---
datasets:
- hamishivi/rds-sels-tulu-3-multitask-rrmax-939k
language:
- en
base_model:
- meta-llama/Llama-3.1-8B
license: llama3.1
---

# Llama 3.1 RDS+ Tulu 3 Multitask 939k

This is a model trained on 939k samples selected by RDS+ from the [Tulu 3 unfiltered dataset](https://huggingface.co/datasets/hamishivi/tulu-3-unfiltered).
For more details, please see the paper [Practical Large-Scale Data Selection for Instruction Tuning](https://arxiv.org/abs/2503.01807) and the [associated codebase](https://github.com/hamishivi/automated-instruction-selection).
## Model description

- **Model type:** A model instruction-tuned on data selected from [Tulu 3 unfiltered](https://huggingface.co/datasets/hamishivi/tulu-3-unfiltered).
- **Language(s) (NLP):** English
- **License:** Llama 3.1 Community License Agreement
- **Finetuned from model:** [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)

### Model Sources

- **Repository:** https://github.com/hamishivi/automated-instruction-selection
- **Dataset:** Data used to train this model can be found [here](https://huggingface.co/datasets/hamishivi/rds-sels-tulu-3-multitask-rrmax-939k).
- **Model Family:** The collection of related models can be found [here](https://huggingface.co/collections/hamishivi/large-scale-data-selection-for-instruction-tuning-677d7e8ca0295426c1915930).

## Results

For more results and analysis, please see [our paper](https://arxiv.org/abs/2503.01807).

| Method                | MMLU | GSM8k | BBH  | TydiQA | Codex | Squad | AlpacaEval | Average |
|-----------------------|-----:|------:|-----:|-------:|------:|------:|-----------:|--------:|
| Random (unbal.)       | 61.6 | **81.2** | 66.8 | 71.1 | 76.4 | 89.7 | 75.6 | 74.6 |
| Random (bal.)         | 62.1 | 76.0 | **68.6** | 68.8 | **87.2** | 87.4 | 72.4 | 74.7 |
| Tulu 3 SFT            | 62.2 | 74.3 | 68.2 | 67.4 | 83.8 | 85.5 | 71.9 | 73.3 |
| **RDS+ (this model)** | **62.5** | 77.6 | 66.6 | **72.1** | 83.8 | **90.2** | 80.2 | **76.1** |
| RDS+ - Arena Hard     | 57.0 | 78.7 | 59.7 | 49.4 | 75.7 | 66.3 | **84.5** | 67.3 |

## Input Format

The model is trained to use the following format (note the newlines):

```
<|user|>
Your message here!
<|assistant|>
```

For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`, as this can affect generation quality quite a bit.**
We have included a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer implementing this format; a usage sketch is included at the end of this card.

## Bias, Risks, and Limitations

The Tulu models have not been aligned to generate safe completions, so the model can produce problematic outputs (especially when prompted to do so).

### Training hyperparameters

- **Learning Rate:** 5E-6
- **Effective Batch Size:** 128
- **Max. Sequence Length:** 4096
- **Loss Accumulation:** Sum (see https://unsloth.ai/blog/gradient)
- **Learning Rate Schedule:** Linear
- **LR Warmup Ratio:** 0.03
- **Num. Epochs:** 2

## Citation

If you find this model or data useful in your work, please cite it with:

```
@misc{ivison2025data,
  title={{Practical Large-Scale Data Selection for Instruction Tuning}},
  author={Hamish Ivison and Muru Zhang and Faeze Brahman and Pang Wei Koh and Pradeep Dasigi},
  year={2025},
  url={https://arxiv.org/abs/2503.01807},
  eprint={2503.01807},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
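## Usage sketch

As a quick illustration of the input format described above, here is a minimal inference sketch using the `transformers` library and the bundled chat template. The repository ID below is an assumption for illustration only; substitute the actual Hugging Face ID of this model, and treat the generation settings as placeholders.

```python
# Minimal inference sketch, assuming `transformers` and a recent PyTorch are installed.
# NOTE: the model ID below is hypothetical; replace it with this model's actual repo ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hamishivi/llama-3.1-rds-tulu-3-multitask-939k"  # hypothetical ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The tokenizer's chat template implements the <|user|>/<|assistant|> format,
# including the required newline after <|assistant|>.
messages = [{"role": "user", "content": "Your message here!"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Equivalently, you can build the prompt string by hand exactly as shown in the Input Format section; the chat template simply automates that formatting.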