SELM-Zephyr
See our paper at https://huggingface.co/papers/2405.19332.
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment.
This model is a fine-tuned version of HuggingFaceH4/mistral-7b-sft-beta, trained on synthetic data derived from the HuggingFaceH4/ultrafeedback_binarized dataset.
| Model | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|---|---|---|
| SELM-Zephyr-7B-iter-3 | 24.00 | 7.48 |
| SELM-Zephyr-7B-iter-2 | 23.40 | 7.72 |
| SELM-Zephyr-7B-iter-1 | 20.28 | 7.42 |
| DPO-Zephyr-7B | 14.45 | 7.28 |
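Since the model is fine-tuned from a Zephyr-lineage SFT checkpoint, prompts presumably follow the Zephyr chat format. The sketch below builds such a prompt by hand; the `<|role|>`/`</s>` markup is an assumption based on the Zephyr family's template, not something stated in this card.

```python
def build_zephyr_prompt(messages):
    # Zephyr-style chat format (assumed from the HuggingFaceH4 Zephyr
    # lineage): each turn is wrapped in <|role|> markers and closed with
    # </s>, and the prompt ends with <|assistant|> to cue generation.
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}</s>\n")
    parts.append("<|assistant|>\n")  # the model continues from here
    return "".join(parts)

prompt = build_zephyr_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is preference optimization?"},
])
print(prompt)
```

In practice you would let the tokenizer shipped with the checkpoint apply its own template via `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` rather than hard-coding the format.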
The following hyperparameters were used during training: