hamishivi/tulu-2-multitask-rrmax-326k-sft

Safetensors

English

llama


hamishivi committed on Feb 28

Commit 56aac55 (verified) · 1 Parent(s): 3a25fd7

Create README.md

Files changed (1)

README.md ADDED +82 -0

	@@ -0,0 +1,82 @@

+# RDS+ Multitask Tulu 2 326k
+This model was trained on 326k samples that RDS+ selected for multiple tasks at once from the [Tulu 2 unfiltered dataset](https://huggingface.co/datasets/hamishivi/tulu-2-unfiltered).
+For more details, please see the paper [Practical Large-Scale Data Selection for Instruction Tuning](todo) and [associated codebase](https://github.com/hamishivi/automated-instruction-selection).
+This model outperforms the original [Tulu 2 SFT checkpoint]() on average across our evaluations, because its training data was selected in a more targeted way from the same original pool of data.
+## Model description
+- **Model type:** A model instruction-tuned on data selected from [Tulu 2 unfiltered](https://huggingface.co/datasets/hamishivi/tulu-2-unfiltered).
+- **Language(s) (NLP):** English
+- **License:** Apache 2.0.
+- **Finetuned from model:** [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
+### Model Sources
+- **Repository:** https://github.com/hamishivi/automated-instruction-selection
+- **Dataset:** Data used to train this model can be found [here](https://huggingface.co/datasets/hamishivi/rds-sels-multitask-rrmax-top326k).
+- **Model Family:** The collection of related models can be found [here](https://huggingface.co/collections/hamishivi/large-scale-data-selection-for-instruction-tuning-677d7e8ca0295426c1915930).
+## Results
+For more results and analysis, please see [our paper](todo).
+| Method                | MMLU  | GSM8k | BBH  | TydiQA | Codex | Squad | AlpacaEval | Average |
+|-----------------------|------:|------:|-----:|-------:|------:|------:|-----------:|--------:|
+| Rand. (unbal)    | **52.2** | 18.0  | 44.5 | 55.3   | 25.7  | 81.5  | 33.9       | 44.5    |
+| Rand. (bal)      | 51.5  | 21.8  | 45.1 | 50.7   | 32.2  | 87.9  | 43.2       | 47.5    |
+| Top-PPL         | 49.1  | 10.5  | 39.4 | 49.4   | 21.6  | 80.3  | 5.6        | 36.6    |
+| Mid-PPL         | 53.1  | 13.3  | 42.8 | 52.4   | 20.3  | 86.2  | 20.7       | 41.3    |
+| Embed (GTR)     | 49.9  | 32.8  | 44.6 | 54.4   | 30.4  | 88.4  | 35.7       | 48.0    |
+| Embed (NV)      | 50.6  | 28.7  | 44.4 | 56.0   | 30.4  | 89.1  | 17.9       | 45.3    |
+| IFD             | 45.7  | 21.8  | 41.2 | 39.5   | 27.7  | 17.0  | 57.4       | 35.7    |
+| Tulu 2       | 50.0  | 22.7  | 45.1 | 54.0   | 33.1  | 86.9  | 54.4       | 49.5    |
+| **RDS+ (this model)**           | 50.2  | 35.2  | 44.7 | 56.3   | **35.1** | **89.0** | 45.6       | **50.9** |
+| RDS+ - Wildchat | 50.9  | 24.8  | 43.6 | **57.3** | 31.1  | 87.3  | 39.3       | 47.8    |
+| RDS+ - Arena Hard | 48.1  | **36.2** | 43.9 | 51.8   | 31.8  | 81.3  | **59.4**  | 50.4    |
+## Input Format
+The model is trained to use the following format (note the newlines):
+```
+<|user|>
+Your message here!
+<|assistant|>
+```
+For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`; omitting it can noticeably affect generation quality.**
+We have included a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer implementing this template.
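As a minimal sketch of the format described above, the snippet below builds a single-turn prompt by hand. The helper name `build_prompt` and the two tag constants are hypothetical and exist only for illustration; multi-turn conversations are not covered, since the format above only shows a single turn.

```python
# Tags copied from the input format shown above.
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(user_message: str) -> str:
    """Wrap one user message in the single-turn format from this card.

    Hypothetical helper: each tag sits on its own line, and the prompt
    ends with a newline after the assistant tag.
    """
    return f"{USER_TAG}\n{user_message}\n{ASSISTANT_TAG}\n"


prompt = build_prompt("Your message here!")
print(repr(prompt))
```

In practice, the chat template bundled with the tokenizer is likely the safer route, since it is maintained alongside the model; the hand-rolled version is mainly useful for checking that the trailing newline is present.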
+## Bias, Risks, and Limitations
+The Tulu models have not been aligned to generate safe completions during an RLHF phase, nor are they deployed with in-the-loop filtering of responses as ChatGPT is, so the model can produce problematic outputs (especially when prompted to do so).
+The size and composition of the corpus used to train the base Llama 2 models are also unknown; however, it likely included a mix of web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.
+### Training hyperparameters
+The following hyperparameters were used during SFT training:
+- learning_rate: 2e-05
+- total_train_batch_size: 128
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.03
+- num_epochs: 2.0
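To give a rough sense of what these settings imply, the sketch below estimates the number of optimizer steps and the learning rate at a given step. It rests on several assumptions not stated in this card: the dataset is taken to be exactly 326,000 samples (the card only says 326k), the last partial batch of each epoch counts as a step, warmup steps are rounded up, and the linear schedule decays to zero. The function name `lr_at` is hypothetical.

```python
import math

# Values from the hyperparameter list above.
LEARNING_RATE = 2e-05
BATCH_SIZE = 128
WARMUP_RATIO = 0.03
NUM_EPOCHS = 2

# Assumption: "326k" is treated as exactly 326,000 samples.
NUM_SAMPLES = 326_000

# Assumption: a partial final batch still counts as one step.
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS
# Assumption: warmup steps are rounded up.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup from 0, then linear decay to 0 (assumed schedule shape)."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Under these assumptions the peak learning rate of 2e-05 is reached at the end of warmup and falls back to zero at the final step; the exact step counts in the actual training run may differ.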
+## Citation
+If you find this model or data useful in your work, please cite it with:
+```
+@misc{ivison2025data,
+      title={{Practical Large-Scale Data Selection for Instruction Tuning}},
+      author={Hamish Ivison and Muru Zhang and Faeze Brahman and Pang Wei Koh and Pradeep Dasigi},
+      year={2025},
+      eprint={todo},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```