trollek committed
Commit 97b3cee · verified · 1 Parent(s): 29eecd0

Update README.md

Files changed (1):
  1. README.md +65 -3
README.md CHANGED
@@ -1,3 +1,65 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - jondurbin/airoboros-3.2
+ language:
+ - en
+ library_name: transformers
+ base_model: h2oai/h2o-danube2-1.8b-base
+ ---
+
+ # h2o-danube2 with ChatML template
+
+ This is a [BAdam fine-tuned](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") danube2 base model. It uses the ChatML template and was trained on the [Airoboros-3.2](https://huggingface.co/datasets/jondurbin/airoboros-3.2) dataset from [jondurbin](https://huggingface.co/jondurbin).
+
+ LLaMA-Factory was used with the config below:
+
+ ```yaml
+ ### model
+ model_name_or_path: /home/trolle/Documents/Projects/trollek/danube2/base/danube2-base-chatml
+
+ ### method
+ stage: sft
+ do_train: true
+ finetuning_type: full
+ use_badam: true
+ badam_switch_mode: ascending
+ badam_switch_interval: 50
+ badam_verbose: 1
+ badam_start_block: 13
+ badam_mask_mode: scatter
+ seed: 314
+
+ ### dataset
+ dataset: airoboros32
+ template: ninja_chatml
+ cutoff_len: 8192
+ overwrite_cache: false
+ preprocessing_num_workers: 12
+
+ ### output
+ output_dir: /home/trolle/Documents/Projects/trollek/danube2/base/airoboros32-chatml-badam
+ logging_steps: 5
+ save_steps: 1
+ save_strategy: epoch
+ plot_loss: true
+ overwrite_output_dir: false
+
+ ### train
+ per_device_train_batch_size: 2
+ gradient_accumulation_steps: 8
+ learning_rate: 0.00001
+ num_train_epochs: 2
+ lr_scheduler_type: cosine
+ warmup_ratio: 0.01
+ pure_bf16: true
+ flash_attn: fa2
+
+ ### eval
+ val_size: 0.01
+ per_device_eval_batch_size: 1
+ eval_strategy: steps
+ eval_steps: 1000
+ ```
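
For intuition about the `badam_*` settings in the config above, here is a minimal toy sketch of the block-coordinate idea behind BAdam: only one block of parameters is trainable at a time, and the active block advances in ascending order every `badam_switch_interval` optimizer steps. This is an illustration only, not LLaMA-Factory's implementation, and the tiny model here stands in for the full danube2 stack.

```python
# Toy sketch of BAdam-style block-coordinate training (illustrative only).
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self, n_blocks: int = 4, dim: int = 32):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_blocks))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = torch.relu(block(x))
        return x.mean()

def set_active_block(model: ToyModel, active: int) -> None:
    """Freeze every block except the one currently being optimized."""
    for i, block in enumerate(model.blocks):
        for p in block.parameters():
            p.requires_grad_(i == active)

model = ToyModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
switch_interval = 50  # mirrors badam_switch_interval above
start_block = 0       # the real run starts from block 13 (badam_start_block)

for step in range(200):
    # Ascending switch mode: cycle through blocks in order, one at a time.
    active = (start_block + step // switch_interval) % len(model.blocks)
    set_active_block(model, active)
    loss = model(torch.randn(8, 32))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```

In the actual run, each "block" is a full transformer layer of the danube2 base model, which is what keeps the memory footprint close to that of training a single layer while still touching all parameters over time.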
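
Since the card declares `library_name: transformers` and a ChatML template, a minimal inference sketch would look like the following. The repo id is a placeholder (the README only lists a local output path), and it assumes the published tokenizer ships with the ChatML chat template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/danube2-chatml-airoboros"  # placeholder; substitute the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Build a ChatML-formatted prompt through the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize what BAdam fine-tuning does in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With a ChatML template, `apply_chat_template` wraps each turn in `<|im_start|>` / `<|im_end|>` markers, which matches the format the model saw during this fine-tune.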