---
datasets:
- hamishivi/rds-sels-tulu-3-multitask-rrmax-939k
language:
- en
base_model:
- meta-llama/Llama-3.1-8B
license: llama3.1
---
# Llama 3.1 RDS+ Tulu 3 Multitask 939k

This is a model trained on 939k samples selected by RDS+ from the [Tulu 3 unfiltered dataset](https://huggingface.co/datasets/hamishivi/tulu-3-unfiltered).

For more details, please see the paper [Practical Large-Scale Data Selection for Instruction Tuning](https://arxiv.org/abs/2503.01807) and the [associated codebase](https://github.com/hamishivi/automated-instruction-selection).

<center>
<img src="https://huggingface.co/hamishivi/tulu-2-multitask-rrmax-326k-sft/resolve/main/image.png" alt="Practical Large-Scale Data Selection for Instruction Tuning logo" width="200px"/>
</center>
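To give a concrete sense of what representation-based selection involves, here is a minimal, illustrative sketch: embed each candidate instance with mean-pooled hidden states from the base model and rank candidates by cosine similarity to a handful of target-task examples. This is not the paper's implementation (the exact prompt formatting, position-weighted pooling, and round-robin balancing across tasks live in the codebase linked above), and all names and texts below are placeholders.

```python
# Minimal sketch of embedding-similarity data selection in the spirit of RDS+.
# Assumptions: the base model doubles as the selection encoder, plain mean pooling,
# and a single target task; the paper's recipe differs in these details.
import torch
from transformers import AutoTokenizer, AutoModel

encoder_name = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(encoder_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
encoder = AutoModel.from_pretrained(encoder_name, torch_dtype=torch.bfloat16, device_map="auto")

@torch.no_grad()
def embed(texts):
    """Mean-pool the final hidden layer over non-padding tokens, then L2-normalise."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True,
                      max_length=2048).to(encoder.device)
    hidden = encoder(**batch).last_hidden_state            # [batch, seq, hidden]
    mask = batch["attention_mask"].unsqueeze(-1)           # [batch, seq, 1]
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(pooled.float(), dim=-1)

pool = ["<candidate instruction + response 1>", "<candidate instruction + response 2>"]  # unfiltered pool
queries = ["<few-shot example from a target task>"]                                      # task representatives

scores = embed(pool) @ embed(queries).T                    # cosine similarities (rows are unit-normalised)
ranking = scores.max(dim=1).values.argsort(descending=True)
selected = [pool[int(i)] for i in ranking[:1]]             # keep the top-k highest-scoring candidates
```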
## Model description

- **Model type:** A model instruction-tuned on data selected from [Tulu 3 unfiltered](https://huggingface.co/datasets/hamishivi/tulu-3-unfiltered).
- **Language(s) (NLP):** English
- **License:** Llama 3.1 Community License Agreement
- **Finetuned from model:** [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
### Model Sources

- **Repository:** https://github.com/hamishivi/automated-instruction-selection
- **Dataset:** Data used to train this model can be found [here](https://huggingface.co/datasets/hamishivi/rds-sels-tulu-3-multitask-rrmax-939k).
- **Model Family:** The collection of related models can be found [here](https://huggingface.co/collections/hamishivi/large-scale-data-selection-for-instruction-tuning-677d7e8ca0295426c1915930).
## Results

For more results and analysis, please see [our paper](https://arxiv.org/abs/2503.01807).
| Method | MMLU | GSM8k | BBH | TydiQA | Codex | Squad | AlpacaEval | Average |
|-----------------------|------:|------:|-----:|-------:|------:|------:|-----------:|--------:|
| Random (unbal.) | 61.6 | **81.2** | 66.8 | 71.1 | 76.4 | 89.7 | 75.6 | 74.6 |
| Random (bal.) | 62.1 | 76.0 | **68.6** | 68.8 | **87.2** | 87.4 | 72.4 | 74.7 |
| Tulu 3 SFT | 62.2 | 74.3 | 68.2 | 67.4 | 83.8 | 85.5 | 71.9 | 73.3 |
| **RDS+ (this model)** | **62.5** | 77.6 | 66.6 | **72.1** | 83.8 | **90.2** | 80.2 | **76.1** |
| RDS+ - Arena Hard | 57.0 | 78.7 | 59.7 | 49.4 | 75.7 | 66.3 | **84.5** | 67.3 |
## Input Format

The model is trained to use the following format (note the newlines):

```
<|user|>
Your message here!
<|assistant|>
```

For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`; this can affect generation quality quite a bit.**

We have included a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer implementing this format.
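For example, with `transformers` you can either build the prompt string by hand or rely on the bundled chat template. The snippet below is a minimal sketch: the repository id is a placeholder (substitute this model's Hub id) and the generation settings are illustrative.

```python
# Minimal usage sketch (placeholder repo id, illustrative generation settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-model-repo>"  # replace with this repository's id on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Option 1: build the prompt by hand -- note the trailing newline after <|assistant|>.
prompt = "<|user|>\nWrite a haiku about data selection.\n<|assistant|>\n"

# Option 2: let the bundled chat template produce the same format.
messages = [{"role": "user", "content": "Write a haiku about data selection."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```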
## Bias, Risks, and Limitations

The Tulu models have not been aligned to generate safe completions, so the model can produce problematic outputs (especially when prompted to do so).
### Training hyperparameters

- **Learning Rate:** 5E-6
- **Effective Batch Size:** 128
- **Max. Sequence Length:** 4096
- **Loss Accumulation:** Sum (see https://unsloth.ai/blog/gradient)
- **Learning Rate Schedule:** Linear
- **LR Warmup Ratio:** 0.03
- **Num. Epochs:** 2
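These settings map roughly onto a standard Hugging Face `Trainer` configuration. The sketch below only illustrates that mapping and is not the training script actually used: the per-device batch size and gradient-accumulation split are assumptions, the 4096-token cutoff is applied during tokenization, and sum-style loss accumulation is handled in the training loop rather than by `TrainingArguments`.

```python
# Illustrative mapping of the hyperparameters above onto transformers.TrainingArguments.
# Assumption: 8 GPUs x per-device batch 2 x grad-accum 8 = effective batch size 128.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="rds-sels-tulu-3-sft",   # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,       # 8 GPUs x 2 x 8 = 128 effective
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    bf16=True,                           # assumption; common for Llama 3.1 finetuning
    logging_steps=10,
)
```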
## Citation

If you find this model or data useful in your work, please cite it with:

```
@misc{ivison2025data,
    title={{Practical Large-Scale Data Selection for Instruction Tuning}},
    author={Hamish Ivison and Muru Zhang and Faeze Brahman and Pang Wei Koh and Pradeep Dasigi},
    year={2025},
    url={https://arxiv.org/abs/2503.01807},
    eprint={2503.01807},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```