End of training
- README.md +19 -19
- pytorch_model-00001-of-00004.bin +1 -1
- pytorch_model-00002-of-00004.bin +1 -1
- pytorch_model-00003-of-00004.bin +1 -1
- pytorch_model-00004-of-00004.bin +1 -1
README.md
CHANGED
@@ -1,10 +1,11 @@
 ---
-license:
-base_model:
+license: llama3
+base_model: meta-llama/Meta-Llama-3-8B
 tags:
+- axolotl
 - generated_from_trainer
 model-index:
-- name:
+- name: Llama-3-8B-NuminaCoT
   results: []
 ---
 
@@ -16,7 +17,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 axolotl version: `0.4.1`
 ```yaml
-base_model:
+base_model: meta-llama/Meta-Llama-3-8B
 model_type: LlamaForCausalLM
 tokenizer_type: AutoTokenizer
 
@@ -27,22 +28,23 @@ strict: false
 datasets:
   - path: AI-MO/NuminaMath-CoT
     type: sharegpt.load_ultrachat
+
+chat_template: llama3
 dataset_prepared_path: /scratch/bf996/axolotl/datasets/numina
 val_set_size: 0.001
 output_dir: /scratch/bf996/axolotl/outputs/numina
-chat_template: llama3
 sequence_len: 8192
 sample_packing: true
 eval_sample_packing: false
 pad_to_sequence_len: true
 
-wandb_project:
+wandb_project: lm-evals
 wandb_entity:
 wandb_watch:
-wandb_name:
+wandb_name: Llama-3-8B-NuminaCoT
 wandb_log_model:
+hub_model_id: penfever/Llama-3-8B-NuminaCoT
 
-shuffle_merged_datasets: true
 
 gradient_accumulation_steps: 8
 micro_batch_size: 1
@@ -50,7 +52,7 @@ num_epochs: 2
 optimizer: paged_adamw_8bit
 lr_scheduler: cosine
 learning_rate: 2e-5
-max_steps:
+max_steps: 10000
 
 train_on_inputs: false
 group_by_length: false
@@ -68,10 +70,10 @@ xformers_attention:
 flash_attention: true
 
 warmup_steps: 100
-evals_per_epoch:
+evals_per_epoch: 0
 eval_table_size:
 save_strategy: steps
-save_steps:
+save_steps: 500
 save_total_limit: 5
 debug:
 deepspeed:
@@ -85,11 +87,12 @@ special_tokens:
 
 </details><br>
 
-
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/nyu-dice-lab/lm-evals/runs/ghe48g78)
+# Llama-3-8B-NuminaCoT
 
-This model is a fine-tuned version of [
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.3943
 
 ## Model description
 
@@ -126,11 +129,8 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 0.
-| 0.
-| 0.47 | 0.9597 | 1730 | 0.4013 |
-| 0.3877 | 1.4265 | 2595 | 0.3950 |
-| 0.3924 | 1.9064 | 3460 | 0.3942 |
+| 0.4379 | 1.0130 | 1826 | 0.3994 |
+| 0.3928 | 1.9064 | 3460 | 0.3943 |
 
 
 ### Framework versions
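The regenerated README pins `hub_model_id: penfever/Llama-3-8B-NuminaCoT` and `chat_template: llama3`. As a hedged illustration only (the prompt, dtype, generation settings, and the presence of a saved chat template are assumptions, not part of this commit), loading the published checkpoint with `transformers` might look like this:

```python
# Minimal sketch (not from this repo): load the published checkpoint and
# prompt it, assuming the saved tokenizer carries the llama3 chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "penfever/Llama-3-8B-NuminaCoT"  # hub_model_id from the config above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder math prompt; the actual evaluation prompts are not part of this commit.
messages = [{"role": "user", "content": "Solve step by step: what is 12 * 17?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```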
pytorch_model-00001-of-00004.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:5e0cc9f6527c9af1389c4c041f4740ff6b6bed5353b7f21e2015cbec77ed78d4
 size 4976718466
pytorch_model-00002-of-00004.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:91c6851988a44e6d4a31111486ddbe18a14e4f376981b7650c49791ecf838515
 size 4999827718
pytorch_model-00003-of-00004.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:0b0669b36ff3a762555ac39b4edd3242fb42ff73bb81c366fed0b7b802a44596
 size 4915940170
pytorch_model-00004-of-00004.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:5337e1c352bea82087ced3063c1df19dd6dd69fc631d9aaaa4e296cadcfb6e0b
 size 1168140873
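Each shard diff only swaps the `oid sha256` digest inside the Git LFS pointer file; the recorded sizes are unchanged. A minimal sketch (the local filename is an assumption) for checking a downloaded shard against the digest recorded for shard 1 above:

```python
# Minimal sketch: hash a downloaded shard and compare it with the sha256
# from its Git LFS pointer (value shown for shard 1 of 4 in the diff above).
import hashlib

expected = "5e0cc9f6527c9af1389c4c041f4740ff6b6bed5353b7f21e2015cbec77ed78d4"

h = hashlib.sha256()
with open("pytorch_model-00001-of-00004.bin", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        h.update(chunk)

print("OK" if h.hexdigest() == expected else "MISMATCH")
```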