---
license: apache-2.0
datasets:
- jondurbin/airoboros-3.2
language:
- en
library_name: transformers
base_model: h2oai/h2o-danube2-1.8b-base
---

# h2o-danube2 with ChatML template

This is a [BAdam fine-tuned](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") danube2 base model. It uses the ChatML template and was trained on the [Airoboros-3.2](https://huggingface.co/datasets/jondurbin/airoboros-3.2) dataset from [jondurbin](https://huggingface.co/jondurbin).

LLaMA-Factory was used with the config below:

```yaml
### model
model_name_or_path: /home/trolle/Documents/Projects/trollek/danube2/base/danube2-base-chatml

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 1
badam_start_block: 13
badam_mask_mode: scatter
seed: 314

### dataset
dataset: airoboros32
template: ninja_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: /home/trolle/Documents/Projects/trollek/danube2/base/airoboros32-chatml-badam
logging_steps: 5
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 0.00001
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.01
pure_bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```
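
A config like this is typically launched with `llamafactory-cli train <config>.yaml`. For context on the `badam_*` options: BAdam is a block coordinate descent method that keeps Adam state for only one transformer block at a time and rotates the active block as training proceeds. Below is a minimal sketch of the switching schedule the settings above imply, assuming danube2's 24 transformer blocks (the block count is not stated on this card):

```python
# Sketch of the BAdam block rotation implied by badam_switch_mode: ascending,
# badam_switch_interval: 50, and badam_start_block: 13.
# Assumption (not from this card): the model has 24 transformer blocks.

def active_block(step: int, num_blocks: int = 24,
                 start_block: int = 13, switch_interval: int = 50) -> int:
    """Index of the block whose parameters are updated at a given step."""
    return (start_block + step // switch_interval) % num_blocks

# Block 13 trains for steps 0-49, block 14 for steps 50-99, and the
# rotation wraps around to block 0 after the last block.
assert [active_block(s) for s in (0, 49, 50, 550)] == [13, 13, 14, 0]
```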