File size: 3,295 Bytes
fe11eec 350ed4e fe11eec 1c43867 fe11eec 350ed4e 5ce08c1 1c43867 350ed4e 1c43867 ef75cef 5ce08c1 32ea1fa 350ed4e 1c43867 350ed4e 1c43867 0e81072 350ed4e e2f3929 350ed4e 0e81072 58fed7c 350ed4e 1c43867 0e81072 1c43867 350ed4e 0c2e721 5ce08c1 1c43867 350ed4e 5ce08c1 350ed4e 5ce08c1 1c43867 5ce08c1 1c43867 5ce08c1 1c43867 350ed4e |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 |
---
datasets:
- jondurbin/gutenberg-dpo-v0.1
- Qwen/Qwen2.5-14B-Instruct
- HuggingFaceH4/ultrafeedback_binarized
base_model:
- Qwen/Qwen2.5-14B-Instruct
- v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- tanliboy/lambda-qwen2.5-14b-dpo-test
library_name: transformers
tags:
- qwen
- qwen2.5
- finetune
- dpo
- qwen2
- chat
- conversational
- instruct
- storywriting
- roleplay
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---
# Qwen2.5-Lumen-14B
* *Qwen direct preference optimization finetuned for 3 epochs.*

<b>A qwen2.5 preference finetune, targeting prompt adherence, storywriting and roleplay.</b>
-------------------------------------------------------------------------------
## Training Notes
Trained [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for 2 epochs on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1), saving different checkpoints along the way.
[Tanliboy](https://huggingface.co/tanliboy) trained [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for 1 epoch on [HuggingFaceH4/ultrafeedback_binarized](HuggingFaceH4/ultrafeedback_binarized), (Credit to Tanliboy! *Check out his model [here](https://huggingface.co/tanliboy/lambda-qwen2.5-14b-dpo-test)*)
*Mass checkpoint merged, Based on Qwen2.5-14B-Instruct (Base Model).*
## Merge
* Merged with a sophosympatheia's <b>SLERP</b> gradient *"Ultrafeedback-Binarized DPO"* and *"Gutenberg DPO"*
* Merged with a sophosympatheia's <b>SLERP</b> gradient *"Qwen2.5-14B-Instruct"* and *"Gutenberg DPO"*
* Merged all <b>DPO checkpoints</b> and <b>SLERP</b> variations with <b>MODEL_STOCK</b> to analyze geometric properties and get the most performant aspects of all runs/merges.
## Recipe
```yaml
models:
- model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- model: v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
- model: v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
- model: v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
- model: v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
- model: v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
- model: v000000/Qwen2.5-14B-Gutenberg-1e-Theta
- model: tanliboy/lambda-qwen2.5-14b-dpo-test
- model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- model: tanliboy/lambda-qwen2.5-14b-dpo-test
- model: v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
- model: v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
base_model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
merge_method: model_stock
dtype: bfloat16
```
### Finetune and merge
This is a merge and finetune of pre-trained language models.
### Models Merged
[Arxiv 2403.19522](https://arxiv.org/abs/2403.19522)
The following models were included in the merge:
* v000000/Qwen2.5-14B-Gutenberg-1e-Delta
* v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
* v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
* v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
* v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
* v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
* v000000/Qwen2.5-14B-Gutenberg-1e-Theta
* v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
* v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
* tanliboy/lambda-qwen2.5-14b-dpo-test
|