Dans-DiscountModels/7b-m-dans-personalityengine-v1.2.1-rc-5

@@ -7,7 +7,7 @@ tags:
 datasets:
 - Dans-DiscountModels/pretokenization-test-2
 model-index:
-- name: 7b-m-dans-personalityengine-v1.2.1-rc-4
+- name: 7b-m-dans-personalityengine-v1.2.1-rc-5
   results: []
 ---
@@ -29,11 +29,11 @@ trust_remote_code:
 wandb_project: 7b-m-dans-personalityengine
 wandb_watch:
-wandb_run_id: V1.2.1-3-1 # V{Version}-{Run Number}-{Attempt Number}
+wandb_run_id: V1.2.1-4-1 # V{Version}-{Run Number}-{Attempt Number}
 wandb_log_model:
 # push checkpoints to hub
-hub_model_id: Dans-DiscountModels/7b-m-dans-personalityengine-v1.2.1-rc-4
+hub_model_id: Dans-DiscountModels/7b-m-dans-personalityengine-v1.2.1-rc-5
 # how to push checkpoints to hub
 # https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
 hub_strategy: "every_save"
@@ -87,7 +87,7 @@ micro_batch_size: 2
 num_epochs: 1
 optimizer: ademamix_8bit
-optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=5"
+optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=10"
 lr_scheduler: rex
 learning_rate: 0.00000015
@@ -135,11 +135,11 @@ special_tokens:
 </details><br>
-# 7b-m-dans-personalityengine-v1.2.1-rc-4
+# 7b-m-dans-personalityengine-v1.2.1-rc-5
 This model is a fine-tuned version of [Dans-DiscountModels/mistral-7b-v0.3-ChatML](https://huggingface.co/Dans-DiscountModels/mistral-7b-v0.3-ChatML) on the Dans-DiscountModels/pretokenization-test-2 dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.4136
+- Loss: 1.4047
 ## Model description
@@ -168,7 +168,7 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 32
 - total_eval_batch_size: 16
 - optimizer: Use ademamix_8bit and the args are:
-beta1=0.9,beta2=0.999,beta3=0.999,alpha=5
+beta1=0.9,beta2=0.999,beta3=0.999,alpha=10
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 43
 - num_epochs: 1.0
@@ -178,30 +178,30 @@ beta1=0.9,beta2=0.999,beta3=0.999,alpha=5
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 | 1.5957        | 0.0007 | 1    | 1.5418          |
-| 1.4896        | 0.0417 | 61   | 1.5008          |
-| 1.5882        | 0.0833 | 122  | 1.4755          |
-| 1.3739        | 0.125  | 183  | 1.4632          |
-| 1.5317        | 0.1667 | 244  | 1.4558          |
-| 1.4852        | 0.2083 | 305  | 1.4504          |
-| 1.3851        | 0.25   | 366  | 1.4460          |
-| 1.514         | 0.2917 | 427  | 1.4423          |
-| 1.5015        | 0.3333 | 488  | 1.4390          |
-| 1.5083        | 0.375  | 549  | 1.4361          |
-| 1.3896        | 0.4167 | 610  | 1.4336          |
-| 1.4243        | 0.4583 | 671  | 1.4313          |
-| 1.3101        | 0.5    | 732  | 1.4291          |
-| 1.5724        | 0.5417 | 793  | 1.4271          |
-| 1.4305        | 0.5833 | 854  | 1.4253          |
-| 1.4534        | 0.625  | 915  | 1.4235          |
-| 1.4756        | 0.6667 | 976  | 1.4219          |
-| 1.4429        | 0.7083 | 1037 | 1.4205          |
-| 1.4753        | 0.75   | 1098 | 1.4191          |
-| 1.473         | 0.7917 | 1159 | 1.4179          |
-| 1.4314        | 0.8333 | 1220 | 1.4167          |
-| 1.3473        | 0.875  | 1281 | 1.4157          |
-| 1.4458        | 0.9167 | 1342 | 1.4148          |
-| 1.4309        | 0.9583 | 1403 | 1.4140          |
-| 1.4304        | 1.0    | 1464 | 1.4136          |
+| 1.487         | 0.0417 | 61   | 1.4982          |
+| 1.5851        | 0.0833 | 122  | 1.4720          |
+| 1.3702        | 0.125  | 183  | 1.4596          |
+| 1.5285        | 0.1667 | 244  | 1.4519          |
+| 1.4809        | 0.2083 | 305  | 1.4461          |
+| 1.3806        | 0.25   | 366  | 1.4414          |
+| 1.5097        | 0.2917 | 427  | 1.4373          |
+| 1.497         | 0.3333 | 488  | 1.4338          |
+| 1.503         | 0.375  | 549  | 1.4306          |
+| 1.384         | 0.4167 | 610  | 1.4278          |
+| 1.4191        | 0.4583 | 671  | 1.4252          |
+| 1.3042        | 0.5    | 732  | 1.4228          |
+| 1.5669        | 0.5417 | 793  | 1.4206          |
+| 1.4239        | 0.5833 | 854  | 1.4185          |
+| 1.4472        | 0.625  | 915  | 1.4165          |
+| 1.4692        | 0.6667 | 976  | 1.4147          |
+| 1.4358        | 0.7083 | 1037 | 1.4130          |
+| 1.4676        | 0.75   | 1098 | 1.4114          |
+| 1.4657        | 0.7917 | 1159 | 1.4099          |
+| 1.424         | 0.8333 | 1220 | 1.4085          |
+| 1.3385        | 0.875  | 1281 | 1.4072          |
+| 1.4373        | 0.9167 | 1342 | 1.4061          |
+| 1.4226        | 0.9583 | 1403 | 1.4052          |
+| 1.4225        | 1.0    | 1464 | 1.4047          |
 ### Framework versions
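The only optimizer change in this diff is `alpha`, raised from 5 to 10. As a sketch of what that parameter controls, here is the AdEMAMix update as described in the AdEMAMix paper (Pagliardini, Ablin and Grangier, 2024). This is not stated in the model card itself: it omits the optional warmup schedulers for $\alpha$ and $\beta_3$, and the bitsandbytes-backed `ademamix_8bit` optimizer may differ in details such as 8-bit storage of the optimizer state. With gradient $g_t$ at step $t$, learning rate $\eta$ and weight decay $\lambda$:

$$
\begin{aligned}
m_1 &\leftarrow \beta_1 m_1 + (1-\beta_1)\, g_t, \\
m_2 &\leftarrow \beta_3 m_2 + (1-\beta_3)\, g_t, \\
v &\leftarrow \beta_2 v + (1-\beta_2)\, g_t^2, \\
\theta &\leftarrow \theta - \eta \left( \frac{\hat m_1 + \alpha\, m_2}{\sqrt{\hat v} + \epsilon} + \lambda\, \theta \right),
\end{aligned}
$$

where $\hat m_1 = m_1 / (1-\beta_1^t)$ and $\hat v = v / (1-\beta_2^t)$ are bias-corrected and the slow average $m_2$ is not. With the config's `beta1=0.9` and `beta3=0.999`, doubling $\alpha$ doubles the weight of the slow, long-memory average $m_2$ relative to the fast average $\hat m_1$ in the numerator of the step.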
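To quantify how the rc-5 run differs from rc-4, the following minimal sketch lines up the two validation-loss columns by step. The values are transcribed by hand from the rc-4 (`-`) and rc-5 (`+`) table rows; the names `VAL_RC4`, `VAL_RC5` and `loss_gaps` are hypothetical and not part of any training script in this repository.

```python
# Hypothetical helper: compares the rc-4 and rc-5 validation losses from the diff.
# Evaluation ran at step 1 and then every 61 steps up to step 1464.
STEPS = [1] + list(range(61, 1465, 61))

# Validation Loss column of the rc-4 ("-") rows, plus the shared step-1 row.
VAL_RC4 = [
    1.5418, 1.5008, 1.4755, 1.4632, 1.4558, 1.4504, 1.4460, 1.4423, 1.4390,
    1.4361, 1.4336, 1.4313, 1.4291, 1.4271, 1.4253, 1.4235, 1.4219, 1.4205,
    1.4191, 1.4179, 1.4167, 1.4157, 1.4148, 1.4140, 1.4136,
]

# Validation Loss column of the rc-5 ("+") rows, plus the shared step-1 row.
VAL_RC5 = [
    1.5418, 1.4982, 1.4720, 1.4596, 1.4519, 1.4461, 1.4414, 1.4373, 1.4338,
    1.4306, 1.4278, 1.4252, 1.4228, 1.4206, 1.4185, 1.4165, 1.4147, 1.4130,
    1.4114, 1.4099, 1.4085, 1.4072, 1.4061, 1.4052, 1.4047,
]


def loss_gaps(old, new):
    """Return old - new per evaluation, rounded to the table's 4 decimals.

    A positive gap means the rc-5 validation loss is lower at that step.
    """
    if len(old) != len(new):
        raise ValueError("both tables must have the same number of rows")
    return [round(o - n, 4) for o, n in zip(old, new)]


gaps = loss_gaps(VAL_RC4, VAL_RC5)
final_gap = gaps[-1]
# Final gap as a percentage of the rc-4 final validation loss.
relative = final_gap / VAL_RC4[-1] * 100

if __name__ == "__main__":
    for step, gap in zip(STEPS, gaps):
        print(f"step {step:>4}: rc-4 minus rc-5 = {gap:+.4f}")
    print(f"final gap: {final_gap:.4f} ({relative:.2f}% of the rc-4 loss)")
```

Running it as a script prints the per-step gap. The gap widens steadily from 0.0026 at step 61 to 0.0089 at step 1464, about 0.63% of the rc-4 final validation loss of 1.4136.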