---
datasets:
- hamishivi/rds-sels-tulu-3-multitask-rrmax-939k
language:
- en
base_model:
- meta-llama/Llama-3.1-8B
license: llama3.1
---

# Llama 3.1 RDS+ Tulu 3 Multitask 939k

This is a model trained on 939k samples selected by RDS+ from the [Tulu 3 unfiltered dataset](https://huggingface.co/datasets/hamishivi/tulu-3-unfiltered).
For more details, please see the paper [Practical Large-Scale Data Selection for Instruction Tuning](https://arxiv.org/abs/2503.01807) and the [associated codebase](https://github.com/hamishivi/automated-instruction-selection).
## Model description

- **Model type:** A model instruction-tuned on data selected from [Tulu 3 unfiltered](https://huggingface.co/datasets/hamishivi/tulu-3-unfiltered).
- **Language(s) (NLP):** English
- **License:** Llama 3.1 Community License Agreement
- **Finetuned from model:** [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)

### Model Sources

- **Repository:** https://github.com/hamishivi/automated-instruction-selection
- **Dataset:** Data used to train this model can be found [here](https://huggingface.co/datasets/hamishivi/rds-sels-tulu-3-multitask-rrmax-939k).
- **Model Family:** The collection of related models can be found [here](https://huggingface.co/collections/hamishivi/large-scale-data-selection-for-instruction-tuning-677d7e8ca0295426c1915930).

## Results

For more results and analysis, please see [our paper](https://arxiv.org/abs/2503.01807).

| Method                | MMLU | GSM8k | BBH  | TydiQA | Codex | Squad | AlpacaEval | Average |
|-----------------------|-----:|------:|-----:|-------:|------:|------:|-----------:|--------:|
| Random (unbal.)       | 61.6 | **81.2** | 66.8 | 71.1 | 76.4 | 89.7 | 75.6 | 74.6 |
| Random (bal.)         | 62.1 | 76.0 | **68.6** | 68.8 | **87.2** | 87.4 | 72.4 | 74.7 |
| Tulu 3 SFT            | 62.2 | 74.3 | 68.2 | 67.4 | 83.8 | 85.5 | 71.9 | 73.3 |
| **RDS+ (this model)** | **62.5** | 77.6 | 66.6 | **72.1** | 83.8 | **90.2** | 80.2 | **76.1** |
| RDS+ - Arena Hard     | 57.0 | 78.7 | 59.7 | 49.4 | 75.7 | 66.3 | **84.5** | 67.3 |

## Input Format

The model is trained to use the following format (note the newlines):

```
<|user|>
Your message here!
<|assistant|>
```

For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`, as this can affect generation quality quite a bit.**
We have included a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer implementing this format; a usage sketch is included at the end of this card.

## Bias, Risks, and Limitations

The Tulu models have not been aligned to generate safe completions, so the model can produce problematic outputs (especially when prompted to do so).

### Training hyperparameters

- **Learning Rate:** 5E-6
- **Effective Batch Size:** 128
- **Max. Sequence Length:** 4096
- **Loss Accumulation:** Sum (see https://unsloth.ai/blog/gradient)
- **Learning Rate Schedule:** Linear
- **LR Warmup Ratio:** 0.03
- **Num. Epochs:** 2

## Citation

If you find this model or data useful in your work, please cite it with:

```
@misc{ivison2025data,
  title={{Practical Large-Scale Data Selection for Instruction Tuning}},
  author={Hamish Ivison and Muru Zhang and Faeze Brahman and Pang Wei Koh and Pradeep Dasigi},
  year={2025},
  url={https://arxiv.org/abs/2503.01807},
  eprint={2503.01807},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
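## Usage sketch

As a quick illustration of the input format described above, here is a minimal inference sketch using the `transformers` library and the bundled chat template. The repository ID below is an assumption for illustration only; substitute the actual Hugging Face ID of this model, and treat the generation settings as placeholders.

```python
# Minimal inference sketch, assuming `transformers` and a recent PyTorch are installed.
# NOTE: the model ID below is hypothetical; replace it with this model's actual repo ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hamishivi/llama-3.1-rds-tulu-3-multitask-939k"  # hypothetical ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The tokenizer's chat template implements the <|user|>/<|assistant|> format,
# including the required newline after <|assistant|>.
messages = [{"role": "user", "content": "Your message here!"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Equivalently, you can build the prompt string by hand exactly as shown in the Input Format section; the chat template simply automates that formatting.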