---
datasets:
- hamishivi/rds-sels-tulu-3-multitask-rrmax-939k
language:
- en
base_model:
- meta-llama/Llama-3.1-8B
license: llama3.1
---
# Llama 3.1 RDS+ Tulu 3 Multitask 939k

This is a model trained on 939k samples selected by RDS+ from the [Tulu 3 unfiltered dataset](https://huggingface.co/datasets/hamishivi/tulu-3-unfiltered).

For more details, please see the paper [Practical Large-Scale Data Selection for Instruction Tuning](https://arxiv.org/abs/2503.01807) and the [associated codebase](https://github.com/hamishivi/automated-instruction-selection).

<center>
<img src="https://huggingface.co/hamishivi/tulu-2-multitask-rrmax-326k-sft/resolve/main/image.png" alt="Practical Large-Scale Data Selection for Instruction Tuning logo" width="200px"/>
</center>
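To give a concrete sense of what representation-based selection involves, here is a minimal, illustrative sketch: embed each candidate instance with mean-pooled hidden states from the base model and rank candidates by cosine similarity to a handful of target-task examples. This is not the paper's implementation (the exact prompt formatting, position-weighted pooling, and round-robin balancing across tasks live in the codebase linked above), and all names and texts below are placeholders.

```python
# Minimal sketch of embedding-similarity data selection in the spirit of RDS+.
# Assumptions: the base model doubles as the selection encoder, plain mean pooling,
# and a single target task; the paper's recipe differs in these details.
import torch
from transformers import AutoTokenizer, AutoModel

encoder_name = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(encoder_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
encoder = AutoModel.from_pretrained(encoder_name, torch_dtype=torch.bfloat16, device_map="auto")

@torch.no_grad()
def embed(texts):
    """Mean-pool the final hidden layer over non-padding tokens, then L2-normalise."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True,
                      max_length=2048).to(encoder.device)
    hidden = encoder(**batch).last_hidden_state            # [batch, seq, hidden]
    mask = batch["attention_mask"].unsqueeze(-1)           # [batch, seq, 1]
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(pooled.float(), dim=-1)

pool = ["<candidate instruction + response 1>", "<candidate instruction + response 2>"]  # unfiltered pool
queries = ["<few-shot example from a target task>"]                                      # task representatives

scores = embed(pool) @ embed(queries).T                    # cosine similarities (rows are unit-normalised)
ranking = scores.max(dim=1).values.argsort(descending=True)
selected = [pool[int(i)] for i in ranking[:1]]             # keep the top-k highest-scoring candidates
```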
## Model description

- **Model type:** A model instruction-tuned on data selected from [Tulu 3 unfiltered](https://huggingface.co/datasets/hamishivi/tulu-3-unfiltered).
- **Language(s) (NLP):** English
- **License:** Llama 3.1 Community License Agreement
- **Finetuned from model:** [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
### Model Sources

- **Repository:** https://github.com/hamishivi/automated-instruction-selection
- **Dataset:** Data used to train this model can be found [here](https://huggingface.co/datasets/hamishivi/rds-sels-tulu-3-multitask-rrmax-939k).
- **Model Family:** The collection of related models can be found [here](https://huggingface.co/collections/hamishivi/large-scale-data-selection-for-instruction-tuning-677d7e8ca0295426c1915930).
## Results

For more results and analysis, please see [our paper](https://arxiv.org/abs/2503.01807).
| Method | MMLU | GSM8k | BBH | TydiQA | Codex | Squad | AlpacaEval | Average |
|-----------------------|------:|------:|-----:|-------:|------:|------:|-----------:|--------:|
| Random (unbal.) | 61.6 | **81.2** | 66.8 | 71.1 | 76.4 | 89.7 | 75.6 | 74.6 |
| Random (bal.) | 62.1 | 76.0 | **68.6** | 68.8 | **87.2** | 87.4 | 72.4 | 74.7 |
| Tulu 3 SFT | 62.2 | 74.3 | 68.2 | 67.4 | 83.8 | 85.5 | 71.9 | 73.3 |
| **RDS+ (this model)** | **62.5** | 77.6 | 66.6 | **72.1** | 83.8 | **90.2** | 80.2 | **76.1** |
| RDS+ - Arena Hard | 57.0 | 78.7 | 59.7 | 49.4 | 75.7 | 66.3 | **84.5** | 67.3 |
## Input Format

The model is trained to use the following format (note the newlines):

```
<|user|>
Your message here!
<|assistant|>
```

For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`; this can affect generation quality quite a bit.**

We have included a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer implementing this format.
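For example, with `transformers` you can either build the prompt string by hand or rely on the bundled chat template. The snippet below is a minimal sketch: the repository id is a placeholder (substitute this model's Hub id) and the generation settings are illustrative.

```python
# Minimal usage sketch (placeholder repo id, illustrative generation settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-model-repo>"  # replace with this repository's id on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Option 1: build the prompt by hand -- note the trailing newline after <|assistant|>.
prompt = "<|user|>\nWrite a haiku about data selection.\n<|assistant|>\n"

# Option 2: let the bundled chat template produce the same format.
messages = [{"role": "user", "content": "Write a haiku about data selection."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```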
## Bias, Risks, and Limitations

The Tulu models have not been aligned to generate safe completions, so the model can produce problematic outputs (especially when prompted to do so).
### Training hyperparameters

- **Learning Rate:** 5E-6
- **Effective Batch Size:** 128
- **Max. Sequence Length:** 4096
- **Loss Accumulation:** Sum (see https://unsloth.ai/blog/gradient)
- **Learning Rate Schedule:** Linear
- **LR Warmup Ratio:** 0.03
- **Num. Epochs:** 2
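These settings map roughly onto a standard Hugging Face `Trainer` configuration. The sketch below only illustrates that mapping and is not the training script actually used: the per-device batch size and gradient-accumulation split are assumptions, the 4096-token cutoff is applied during tokenization, and sum-style loss accumulation is handled in the training loop rather than by `TrainingArguments`.

```python
# Illustrative mapping of the hyperparameters above onto transformers.TrainingArguments.
# Assumption: 8 GPUs x per-device batch 2 x grad-accum 8 = effective batch size 128.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="rds-sels-tulu-3-sft",   # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,       # 8 GPUs x 2 x 8 = 128 effective
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    bf16=True,                           # assumption; common for Llama 3.1 finetuning
    logging_steps=10,
)
```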
## Citation

If you find this model or data useful in your work, please cite it with:

```
@misc{ivison2025data,
    title={{Practical Large-Scale Data Selection for Instruction Tuning}},
    author={Hamish Ivison and Muru Zhang and Faeze Brahman and Pang Wei Koh and Pradeep Dasigi},
    year={2025},
    url={https://arxiv.org/abs/2503.01807},
    eprint={2503.01807},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```