---
datasets:
- hamishivi/rds-sels-tulu-3-multitask-rrmax-939k
language:
- en
base_model:
- meta-llama/Llama-3.1-8B
license: llama3.1
---
# Llama 3.1 RDS+ Tulu 3 Multitask 939k
This is a model trained on 939k samples selected by RDS+ from the [Tulu 3 unfiltered dataset](https://huggingface.co/datasets/hamishivi/tulu-3-unfiltered).
For more details, please see the paper [Practical Large-Scale Data Selection for Instruction Tuning](https://arxiv.org/abs/2503.01807) and [associated codebase](https://github.com/hamishivi/automated-instruction-selection).
<center>
<img src="https://huggingface.co/hamishivi/tulu-2-multitask-rrmax-326k-sft/resolve/main/image.png" alt="Practical Large-Scale Data Selection for Instruction Tuning logo" width="200px"/>
</center>
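At a high level, RDS+ embeds candidate instructions with the base model, ranks them by similarity to a small set of target-task examples, and (in the multitask "rrmax" setting) aggregates selections across tasks. The snippet below is a highly simplified, hypothetical sketch of that idea with placeholder embeddings; the actual embedding construction, weighting, and aggregation are defined in the paper and codebase linked above.

```python
import numpy as np

# Highly simplified, hypothetical sketch of representation-based selection in the
# spirit of RDS+: rank candidate examples by cosine similarity of their embeddings
# to embeddings of target-task examples, then pick top candidates round-robin
# across tasks. Embedding construction, weighting, and aggregation details differ
# in the actual method; see the paper and codebase linked above.

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T  # shape: (num_candidates, num_task_examples)

def round_robin_select(cand_emb: np.ndarray, task_embs: list[np.ndarray], k: int) -> list[int]:
    # One descending ranking of candidates per task, by max similarity to that task.
    rankings = [list(np.argsort(-cosine_sim(cand_emb, q).max(axis=1))) for q in task_embs]
    selected, seen = [], set()
    while len(selected) < k and any(rankings):
        for ranking in rankings:
            # Skip candidates already chosen for another task.
            while ranking and ranking[0] in seen:
                ranking.pop(0)
            if ranking and len(selected) < k:
                idx = ranking.pop(0)
                seen.add(idx)
                selected.append(int(idx))
    return selected
```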
## Model description
- **Model type:** A model instruction-tuned on data selected from [Tulu 3 unfiltered](https://huggingface.co/datasets/hamishivi/tulu-3-unfiltered).
- **Language(s) (NLP):** English
- **License:** Llama 3.1 Community License Agreement
- **Finetuned from model:** [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
### Model Sources
- **Repository:** https://github.com/hamishivi/automated-instruction-selection
- **Dataset:** Data used to train this model can be found [here](https://huggingface.co/datasets/hamishivi/rds-sels-tulu-3-multitask-rrmax-939k).
- **Model Family:** The collection of related models can be found [here](https://huggingface.co/collections/hamishivi/large-scale-data-selection-for-instruction-tuning-677d7e8ca0295426c1915930).
## Results
For more results and analysis, please see [our paper](https://arxiv.org/abs/2503.01807).
| Method | MMLU | GSM8k | BBH | TydiQA | Codex | Squad | AlpacaEval | Average |
|-----------------------|------:|------:|-----:|-------:|------:|------:|-----------:|--------:|
| Random (unbal.) | 61.6 | **81.2** | 66.8 | 71.1 | 76.4 | 89.7 | 75.6 | 74.6 |
| Random (bal.) | 62.1 | 76.0 | **68.6** | 68.8 | **87.2** | 87.4 | 72.4 | 74.7 |
| Tulu 3 SFT | 62.2 | 74.3 | 68.2 | 67.4 | 83.8 | 85.5 | 71.9 | 73.3 |
| **RDS+ (this model)** | **62.5** | 77.6 | 66.6 | **72.1** | 83.8 | **90.2** | 80.2 | **76.1** |
| RDS+ - Arena Hard | 57.0 | 78.7 | 59.7 | 49.4 | 75.7 | 66.3 | **84.5** | 67.3 |
## Input Format
The model is trained to use the following format (note the newlines):
```
<|user|>
Your message here!
<|assistant|>
```
For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`, as this can noticeably affect generation quality.**
We include a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer that implements this format.
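For example, here is a minimal generation sketch using `transformers` and the bundled chat template (the repository id below is a placeholder; substitute this model's actual path):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id: replace with this model's actual Hugging Face path.
model_name = "hamishivi/llama-3.1-rds-tulu-3-multitask-939k"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "Your message here!"}]
# add_generation_prompt=True should append the assistant turn marker (and trailing
# newline) defined by the bundled chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```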
## Bias, Risks, and Limitations
The Tulu models have not been aligned to generate safe completions, so the model can produce problematic outputs (especially when prompted to do so).
### Training hyperparameters
- **Learning Rate:** 5e-6
- **Effective Batch Size:** 128
- **Max. Sequence Length:** 4096
- **Loss Accumulation:** Sum (see https://unsloth.ai/blog/gradient)
- **Learning Rate Schedule:** Linear
- **LR Warmup Ratio:** 0.03
- **Num. Epochs:** 2
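For reference, these settings map roughly onto a Hugging Face `TrainingArguments` configuration like the sketch below. The per-device batch size / gradient-accumulation split and the precision flag are assumptions, and the sum-based loss accumulation is handled by the training codebase rather than by a standard argument:
```python
from transformers import TrainingArguments

# Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
# Effective batch size 128 = per_device_batch * grad_accumulation * num_devices;
# the 1 x 16 x 8 split shown here is an assumption, not the actual training setup.
args = TrainingArguments(
    output_dir="rds-tulu3-multitask-sft",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # with 8 GPUs -> effective batch size of 128
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    bf16=True,  # assumption; common for Llama 3.1 finetuning
)
# The max sequence length (4096) is applied at tokenization/packing time, and
# sum-style loss accumulation is implemented in the training loop itself.
```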
## Citation
If you find this model or its data useful in your work, please cite:
```
@misc{ivison2025data,
  title={{Practical Large-Scale Data Selection for Instruction Tuning}},
  author={Hamish Ivison and Muru Zhang and Faeze Brahman and Pang Wei Koh and Pradeep Dasigi},
  year={2025},
  eprint={2503.01807},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.01807}
}
```