hamishivi committed · verified · Commit 56aac55 · Parent(s): 3a25fd7

Create README.md

Files changed (1): README.md ADDED (+82 -0)
# RDS+ Multitask Tulu 2 326k

This is a model trained on 326k samples selected by RDS+ for multiple tasks at once from the [Tulu 2 unfiltered dataset](https://huggingface.co/datasets/hamishivi/tulu-2-unfiltered).
For more details, please see the paper [Practical Large-Scale Data Selection for Instruction Tuning](todo) and the [associated codebase](https://github.com/hamishivi/automated-instruction-selection).

This model outperforms the original [Tulu 2 SFT checkpoint]() by selecting more targeted data from the same original pool of data.

## Model description

- **Model type:** A model instruction-tuned on data selected from [Tulu 2 unfiltered](https://huggingface.co/datasets/hamishivi/tulu-2-unfiltered).
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)

### Model Sources

- **Repository:** https://github.com/hamishivi/automated-instruction-selection
- **Dataset:** The data used to train this model can be found [here](https://huggingface.co/datasets/hamishivi/rds-sels-multitask-rrmax-top326k).
- **Model Family:** The collection of related models can be found [here](https://huggingface.co/collections/hamishivi/large-scale-data-selection-for-instruction-tuning-677d7e8ca0295426c1915930).

## Results

For more results and analysis, please see [our paper](todo).

| Method | MMLU | GSM8k | BBH | TydiQA | Codex | Squad | AlpacaEval | Average |
|-----------------------|------:|------:|-----:|-------:|------:|------:|-----------:|--------:|
| Rand. (unbal) | **52.2** | 18.0 | 44.5 | 55.3 | 25.7 | 81.5 | 33.9 | 44.5 |
| Rand. (bal) | 51.5 | 21.8 | 45.1 | 50.7 | 32.2 | 87.9 | 43.2 | 47.5 |
| Top-PPL | 49.1 | 10.5 | 39.4 | 49.4 | 21.6 | 80.3 | 5.6 | 36.6 |
| Mid-PPL | 53.1 | 13.3 | 42.8 | 52.4 | 20.3 | 86.2 | 20.7 | 41.3 |
| Embed (GTR) | 49.9 | 32.8 | 44.6 | 54.4 | 30.4 | 88.4 | 35.7 | 48.0 |
| Embed (NV) | 50.6 | 28.7 | 44.4 | 56.0 | 30.4 | 89.1 | 17.9 | 45.3 |
| IFD | 45.7 | 21.8 | 41.2 | 39.5 | 27.7 | 17.0 | 57.4 | 35.7 |
| Tulu 2 | 50.0 | 22.7 | 45.1 | 54.0 | 33.1 | 86.9 | 54.4 | 49.5 |
| **RDS+ (this model)** | 50.2 | 35.2 | 44.7 | 56.3 | **35.1** | **89.0** | 45.6 | **50.9** |
| RDS+ - Wildchat | 50.9 | 24.8 | 43.6 | **57.3** | 31.1 | 87.3 | 39.3 | 47.8 |
| RDS+ - Arena Hard | 48.1 | **36.2** | 43.9 | 51.8 | 31.8 | 81.3 | **59.4** | 50.4 |

## Input Format

The model is trained to use the following format (note the newlines):
```
<|user|>
Your message here!
<|assistant|>
```

For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`; this can affect generation quality quite a bit.**
We have included a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer implementing this template.

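If you are building prompts by hand rather than through the tokenizer's chat template, the format above can be assembled with a small helper; this is a minimal sketch (the function name is ours, not part of the released code):

```python
def format_prompt(user_message: str) -> str:
    """Build a single-turn prompt in the Tulu chat format.

    Note the trailing newline after <|assistant|>, which the model
    card calls out as important for generation quality.
    """
    return f"<|user|>\n{user_message}\n<|assistant|>\n"

prompt = format_prompt("Your message here!")
```

For multi-turn use, prefer the bundled chat template so turn boundaries match training exactly.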
## Bias, Risks, and Limitations

The Tulu models have not been aligned to generate safe completions within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
The size and composition of the corpus used to train the base Llama 2 models are unknown, but it likely included a mix of web data and technical sources such as books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.

### Training hyperparameters

The following hyperparameters were used during fine-tuning:
- learning_rate: 2e-05
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2.0

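As a sketch of the schedule these settings imply (linear warmup over the first 3% of steps, then linear decay to zero, matching the usual shape of a `linear` scheduler with a warmup ratio; the step counts below are hypothetical):

```python
def lr_at(step: int, total_steps: int,
          base_lr: float = 2e-05, warmup_ratio: float = 0.03) -> float:
    """Learning rate under linear warmup followed by linear decay.

    Warmup: 0 -> base_lr over the first warmup_ratio * total_steps steps.
    Decay: base_lr -> 0 over the remaining steps.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

# e.g. with 1000 total steps: lr ramps up over the first 30 steps,
# peaks at 2e-05, and decays linearly to 0 at step 1000.
```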
## Citation

If you find this model or data useful in your work, please cite it with:

```
@misc{ivison2025data,
  title={{Practical Large-Scale Data Selection for Instruction Tuning}},
  author={Hamish Ivison and Muru Zhang and Faeze Brahman and Pang Wei Koh and Pradeep Dasigi},
  year={2025},
  eprint={todo},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```