tulu-2-multitask-rrmax-326k-sft / README.md

Update README.md

db14560 verified about 2 months ago

4.49 kB

	---
	datasets:
	- hamishivi/rds-sels-multitask-rrmax-top326k
	language:
	- en
	base_model:
	- meta-llama/Llama-2-7b-hf
	---
	# RDS+ Multitask Tulu 2 326k

	This is a model trained on 326k samples selected by RDS+ for multiple tasks at once from the [Tulu 2 unfiltered dataset](https://huggingface.co/datasets/hamishivi/tulu-2-unfiltered).
	For more details, please see the paper [Practical Large-Scale Data Selection for Instruction Tuning](https://arxiv.org/abs/2503.01807) and [associated codebase](https://github.com/hamishivi/automated-instruction-selection).

	This model outperforms the original [Tulu 2 SFT model](https://huggingface.co/allenai/tulu-2-7b) by selecting more targeted data from the same original pool of data.

	<center>
	<img src="https://huggingface.co/hamishivi/tulu-2-multitask-rrmax-326k-sft/resolve/main/image.png" alt="Practical Large-Scale Data Selection for Instruction Tuning logo" width="200px"/>
	</center>

	## .Model description

	- Model type: A model instruction-tuned on data selected from [Tulu 2 unfiltered](https://huggingface.co/datasets/hamishivi/tulu-2-unfiltered).
	- Language(s) (NLP): English
	- License: Llama 2 models are licensed under the Llama 2 license. A copy of this and a notice file can be found in this repository.
	- Finetuned from model: [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)

	### Model Sources

	- Repository: https://github.com/hamishivi/automated-instruction-selection
	- Dataset: Data used to train this model can be found [here](https://huggingface.co/datasets/hamishivi/rds-sels-multitask-rrmax-top326k).
	- Model Family: The collection of related models can be found [here](https://huggingface.co/collections/hamishivi/large-scale-data-selection-for-instruction-tuning-677d7e8ca0295426c1915930).

	## Results

	For more results and analysis, please see [our paper](https://arxiv.org/abs/2503.01807).


	\| Method \| MMLU \| GSM8k \| BBH \| TydiQA \| Codex \| Squad \| AlpacaEval \| Average \|
	\|-----------------------\|------:\|------:\|-----:\|-------:\|------:\|------:\|-----------:\|--------:\|
	\| Rand. (unbal) \| 52.2 \| 18.0 \| 44.5 \| 55.3 \| 25.7 \| 81.5 \| 33.9 \| 44.5 \|
	\| Rand. (bal) \| 51.5 \| 21.8 \| 45.1 \| 50.7 \| 32.2 \| 87.9 \| 43.2 \| 47.5 \|
	\| Top-PPL \| 49.1 \| 10.5 \| 39.4 \| 49.4 \| 21.6 \| 80.3 \| 5.6 \| 36.6 \|
	\| Mid-PPL \| 53.1 \| 13.3 \| 42.8 \| 52.4 \| 20.3 \| 86.2 \| 20.7 \| 41.3 \|
	\| Embed (GTR) \| 49.9 \| 32.8 \| 44.6 \| 54.4 \| 30.4 \| 88.4 \| 35.7 \| 48.0 \|
	\| Embed (NV) \| 50.6 \| 28.7 \| 44.4 \| 56.0 \| 30.4 \| 89.1 \| 17.9 \| 45.3 \|
	\| IFD \| 45.7 \| 21.8 \| 41.2 \| 39.5 \| 27.7 \| 17.0 \| 57.4 \| 35.7 \|
	\| Tulu 2 \| 50.0 \| 22.7 \| 45.1 \| 54.0 \| 33.1 \| 86.9 \| 54.4 \| 49.5 \|
	\| RDS+ (this model) \| 50.2 \| 35.2 \| 44.7 \| 56.3 \| 35.1 \| 89.0 \| 45.6 \| 50.9 \|
	\| RDS+ - Wildchat \| 50.9 \| 24.8 \| 43.6 \| 57.3 \| 31.1 \| 87.3 \| 39.3 \| 47.8 \|
	\| RDS+ - Arena Hard \| 48.1 \| 36.2 \| 43.9 \| 51.8 \| 31.8 \| 81.3 \| 59.4 \| 50.4 \|

	## Input Format

	The model is trained to use the following format (note the newlines):
	```
	<\|user\|>
	Your message here!
	<\|assistant\|>
	```

	For best results, format all inputs in this manner. Make sure to include a newline after `<\|assistant\|>`, this can affect generation quality quite a bit.
	We have included a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer implementing this template.

	## Bias, Risks, and Limitations

	These models have not been aligned to generate safe completions, so the model can produce problematic outputs (especially when prompted to do so).

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- total_train_batch_size: 128
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.03
	- num_epochs: 2.0

	## Citation

	If you find this model or data is useful in your work, please cite it with:

	```
	@misc{ivison2025data,
	title={{Practical Large-Scale Data Selection for Instruction Tuning}},
	author={{Hamish Ivison and Muru Zhang and Faeze Brahman and Pang Wei Koh and Pradeep Dasigi}}
	year={2025},
	eprint={2503.01807},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2503.01807}
	}
	```