Gryphe committed on
Commit 8dff223 · verified · 1 Parent(s): b0d5a9a

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -26,7 +26,7 @@ We plan to continue improving and open-sourcing similar models, so please share
 
  Muse 12B was trained using Mistral Nemo 12B as its foundation, with training occurring in three stages: SFT (supervised fine-tuning), followed by two distinct DPO (direct preference optimization) phases.
 
- **SFT** - Various multi-turn datasets from a multitude of sources, combining text adventures of the kind used to finetune our Wayfarer 12B model, long emotional narratives and general roleplay, each carefully balanced and rewritten to be free of common AI cliches. A small single-turn instruct dataset was included to send a stronger signal during finetuning.
+ **SFT** - Various multi-turn datasets from a multitude of sources, combining text adventures of the kind used to finetune [our Wayfarer 12B model](https://huggingface.co/LatitudeGames/Wayfarer-12B), long emotional narratives and general roleplay, each carefully balanced and rewritten to be free of common AI cliches. A small single-turn instruct dataset was included to send a stronger signal during finetuning.
 
  **DPO 1** - Gutenberg DPO, [credit to Jon Durbin](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) - This stage introduces human writing techniques, significantly enhancing the model's potential outputs, albeit trading some intelligence for the stylistic benefits of human-created text.
 
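For context on the README passage shown in the diff: it describes an SFT stage followed by two DPO phases, the first of which uses the linked Gutenberg DPO preference dataset. As a rough illustration of how such a DPO phase is commonly set up, here is a minimal sketch using the Hugging Face TRL library; it is not Latitude's actual training code, and the checkpoint name, hyperparameters, and trainer arguments are assumptions.

```python
# Minimal, hypothetical sketch of one DPO phase of the pipeline described above,
# assuming the Hugging Face TRL library (not Latitude's actual training code).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Stand-in checkpoint: in the real pipeline this would be the SFT-stage model
# built on Mistral Nemo 12B, not the raw base model.
checkpoint = "mistralai/Mistral-Nemo-Base-2407"
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Gutenberg DPO supplies prompt/chosen/rejected preference pairs.
prefs = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

# Hyperparameters are illustrative only.
config = DPOConfig(output_dir="dpo-stage-1", beta=0.1, num_train_epochs=1)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=prefs,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()
```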