Gryphe committed on
Commit 8dff223 · verified · 1 Parent(s): b0d5a9a

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -26,7 +26,7 @@ We plan to continue improving and open-sourcing similar models, so please share
 
  Muse 12B was trained using Mistral Nemo 12B as its foundation, with training occurring in three stages: SFT (supervised fine-tuning), followed by two distinct DPO (direct preference optimization) phases.
 
- **SFT** - Various multi-turn datasets from a multitude of sources, combining text adventures of the kind used to finetune our Wayfarer 12B model, long emotional narratives and general roleplay, each carefully balanced and rewritten to be free of common AI cliches. A small single-turn instruct dataset was included to send a stronger signal during finetuning.
+ **SFT** - Various multi-turn datasets from a multitude of sources, combining text adventures of the kind used to finetune [our Wayfarer 12B model](https://huggingface.co/LatitudeGames/Wayfarer-12B), long emotional narratives and general roleplay, each carefully balanced and rewritten to be free of common AI cliches. A small single-turn instruct dataset was included to send a stronger signal during finetuning.
 
  **DPO 1** - Gutenberg DPO, [credit to Jon Durbin](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) - This stage introduces human writing techniques, significantly enhancing the model's potential outputs, albeit trading some intelligence for the stylistic benefits of human-created text.
 
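For context on the README passage shown in the diff: it describes an SFT stage followed by two DPO phases, the first of which uses the linked Gutenberg DPO preference dataset. As a rough illustration of how such a DPO phase is commonly set up, here is a minimal sketch using the Hugging Face TRL library; it is not Latitude's actual training code, and the checkpoint name, hyperparameters, and trainer arguments are assumptions.

```python
# Minimal, hypothetical sketch of one DPO phase of the pipeline described above,
# assuming the Hugging Face TRL library (not Latitude's actual training code).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Stand-in checkpoint: in the real pipeline this would be the SFT-stage model
# built on Mistral Nemo 12B, not the raw base model.
checkpoint = "mistralai/Mistral-Nemo-Base-2407"
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Gutenberg DPO supplies prompt/chosen/rejected preference pairs.
prefs = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

# Hyperparameters are illustrative only.
config = DPOConfig(output_dir="dpo-stage-1", beta=0.1, num_train_epochs=1)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=prefs,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()
```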