---
license: mit
datasets:
- Replete-AI/code_bagel
language:
- en
tags:
- code
---

### Base_model

[microsoft/Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)

### Datasets

[Replete-AI/code_bagel](https://huggingface.co/datasets/Replete-AI/code_bagel)

### Train Loss

![image/png](https://cdn-uploads.huggingface.co/production/uploads/636f54b95d2050767e4a6317/tOBahj5rDAJzqCmftVdkX.png)

### Train State

Trainable params: 27852800 || all params: 13988090880 || trainable%: 0.1991

Total Training Duration: 69h18m17s

```json
{
  "epoch": 0.9999679800589659,
  "total_flos": 1.446273483573748e+20,
  "train_loss": 0.44412665014957775,
  "train_runtime": 249497.725,
  "train_samples_per_second": 13.018,
  "train_steps_per_second": 0.102
}
```

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1200
- num_epochs: 1.0
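
The derived figures reported above (trainable percentage, total training duration, and total train batch size) can be cross-checked from the raw values. The following is a minimal sketch in plain Python, not part of the original training code; the function names are hypothetical, and it assumes `train_batch_size` is the per-device batch size, as is usual for this kind of training summary.

```python
# Raw values copied from the training summary and hyperparameter list.
trainable_params = 27_852_800
all_params = 13_988_090_880
train_runtime_s = 249497.725

train_batch_size = 1              # assumed to be the per-device batch size
num_devices = 8
gradient_accumulation_steps = 16


def trainable_percent(trainable: int, total: int) -> float:
    """Share of parameters that receive gradients, in percent."""
    return 100.0 * trainable / total


def effective_batch_size(per_device: int, devices: int, accumulation: int) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device * devices * accumulation


def format_duration(seconds: float) -> str:
    """Format a runtime in seconds as XhYYmZZs, dropping fractional seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


if __name__ == "__main__":
    print(f"trainable%: {trainable_percent(trainable_params, all_params):.4f}")
    print("total_train_batch_size:",
          effective_batch_size(train_batch_size, num_devices,
                               gradient_accumulation_steps))
    print("Total Training Duration:", format_duration(train_runtime_s))
```

Running the script should reproduce the reported `trainable%`, `total_train_batch_size`, and training duration, which is a quick way to confirm that the summary values are mutually consistent.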