---
license: mit
pipeline_tag: text-generation
library_name: transformers
language: [
'en', 'am', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el',
'eo', 'es', 'et', 'eu', 'fa', 'ff', 'fi', 'fr', 'fy', 'ga', 'gd', 'gl', 'gn', 'gu', 'ha', 'he',
'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'ig', 'is', 'it', 'ja', 'jv', 'ka', 'kk', 'km', 'kn', 'ko',
'ku', 'ky', 'la', 'lg', 'li', 'ln', 'lo', 'lt', 'lv', 'mg', 'mk', 'ml', 'mn', 'mr', 'ms', 'my',
'ne', 'nl', 'no', 'ns', 'om', 'or', 'pa', 'pl', 'ps', 'pt', 'qu', 'rm', 'ro', 'ru', 'sa', 'si',
'sc', 'sd', 'sk', 'sl', 'so', 'sq', 'sr', 'ss', 'su', 'sv', 'sw', 'ta', 'te', 'th', 'tl', 'tn',
'tr', 'ug', 'uk', 'ur', 'uz', 'vi', 'wo', 'xh', 'yi', 'yo', 'zu',
]
datasets:
# core - base
- ontocord/fineweb-permissive-multilingual-2m
- distily/c4_multilingual_1M
- data-silence/sumnews
- xu-song/cc100-samples
- badrex/llm-emoji-dataset
- fblgit/simple-math
- Gusarich/math-expressions-1m
- neuralwork/arxiver
- christopher/rosetta-code
- nampdn-ai/tiny-codes
- JeanKaddour/minipile
# core - instruct
- NousResearch/hermes-function-calling-v1
- simplescaling/s1K-1.1
# base - instruct
- mlabonne/open-perfectblend
- allenai/tulu-3-sft-mixture
- rombodawg/Everything_Instruct_Multilingual
# base - reason
- open-r1/OpenR1-Math-220k
- open-thoughts/OpenThoughts-114k
- cognitivecomputations/dolphin-r1
- simplescaling/s1K-1.1
tags:
- chat
- core
- base
- instruct
- reason
---
# tangled-alpha-0.5-core

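Prepare the core pretraining dataset: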
```bash
time python -B prepare_core_datasets.py
```
```
i=0, min_len=0, max_len=1073741824, block_size=4097, chunk_size=16388000, len(dataset)=1287403, len(dataset) * block_size=5274490091
Total number of tokens in the optimized dataset '../core-data-0-0-1073741824-4097-4000' is 5274490091
```
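The preparation script itself is not reproduced here. Judging from the log, it tokenizes the source corpora into fixed 4,097-token blocks and writes them out as a `litdata`-style optimized dataset (16,388,000 = 4,097 × 4,000 tokens per chunk). Below is a minimal sketch of such a pipeline; the `litdata` `optimize()` call, the example corpus, and the tokenizer path are assumptions, not the actual script:

```python
# Hypothetical sketch, not the actual prepare_core_datasets.py.
# Assumes the datasets, transformers, litdata, and torch packages;
# the example corpus and tokenizer path are illustrative only.
import torch
from datasets import load_dataset
from litdata import optimize
from transformers import AutoTokenizer

BLOCK_SIZE = 4097  # block_size reported in the log
DATASET = load_dataset("JeanKaddour/minipile", split="train")  # one of the listed corpora
TOKENIZER = AutoTokenizer.from_pretrained("../tokenizer")      # hypothetical tokenizer path

def tokenize_fn(index: int):
    # Tokenize one document and yield fixed-size blocks of token ids.
    ids = TOKENIZER(DATASET[index]["text"])["input_ids"]
    for i in range(0, len(ids) - BLOCK_SIZE + 1, BLOCK_SIZE):
        yield torch.tensor(ids[i : i + BLOCK_SIZE])

if __name__ == "__main__":
    optimize(
        fn=tokenize_fn,
        inputs=list(range(len(DATASET))),
        output_dir="../core-data-0-0-1073741824-4097-4000",  # output dir from the log
        chunk_size=BLOCK_SIZE * 4000,  # 16,388,000, the chunk_size in the log
    )
```

Pretrain the core model: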
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain_core_model.yaml
```
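`pretrain_core_model.yaml` is not reproduced here. The following is a minimal sketch of the shape a litgpt pretraining config takes; the seed, data path, and output directory come from the logs below, while the model name and batch settings are purely illustrative:

```yaml
# Illustrative sketch only, not the actual pretrain_core_model.yaml.
model_name: pythia-160m          # hypothetical; the real ~201M-parameter config differs
out_dir: ../out/pretrain-core    # checkpoint directory seen in the log
data:
  class_path: litgpt.data.LitData
  init_args:
    data_path: ../core-data-0-0-1073741824-4097-4000  # dataset prepared above
train:
  micro_batch_size: 4            # illustrative
  global_batch_size: 1024        # illustrative
  max_seq_length: 4096           # illustrative
eval:
  interval: 1000                 # illustrative
seed: 23                         # "Seed set to 23" in the log
```

One constraint is visible in the log: each optimizer step spans 256 iterations, i.e. a gradient-accumulation factor of `global_batch_size / micro_batch_size = 256` on a single device, which the illustrative batch values above happen to satisfy.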
```
Seed set to 23
Time to instantiate model: 0.31 seconds.
Total parameters: 201,359,872
Verifying settings ...
Measured TFLOPs: 7072.06
Epoch 1 | iter 256 step 1 | loss train: 11.961, val: n/a | iter time: 406.23 ms (step) remaining time: 3 days, 13:55:33
Epoch 1 | iter 512 step 2 | loss train: 11.953, val: n/a | iter time: 358.84 ms (step) remaining time: 3 days, 0:49:32
Epoch 1 | iter 768 step 3 | loss train: 11.943, val: n/a | iter time: 357.16 ms (step) remaining time: 2 days, 20:38:36
Epoch 1 | iter 1024 step 4 | loss train: 11.907, val: n/a | iter time: 355.69 ms (step) remaining time: 2 days, 18:31:54
Epoch 1 | iter 1280 step 5 | loss train: 11.854, val: n/a | iter time: 358.32 ms (step) remaining time: 2 days, 17:13:13
Epoch 1 | iter 1536 step 6 | loss train: 11.789, val: n/a | iter time: 355.59 ms (step) remaining time: 2 days, 16:18:25
Epoch 1 | iter 1792 step 7 | loss train: 11.703, val: n/a | iter time: 354.88 ms (step) remaining time: 2 days, 15:37:56
Epoch 1 | iter 2048 step 8 | loss train: 11.586, val: n/a | iter time: 354.07 ms (step) remaining time: 2 days, 15:06:45
Epoch 1 | iter 2304 step 9 | loss train: 11.451, val: n/a | iter time: 352.89 ms (step) remaining time: 2 days, 14:41:54
Epoch 1 | iter 2560 step 10 | loss train: 11.347, val: n/a | iter time: 355.58 ms (step) remaining time: 2 days, 14:21:38
Epoch 1 | iter 2816 step 11 | loss train: 11.271, val: n/a | iter time: 351.01 ms (step) remaining time: 2 days, 14:04:43
Epoch 1 | iter 3072 step 12 | loss train: 11.194, val: n/a | iter time: 351.91 ms (step) remaining time: 2 days, 13:50:26
Epoch 1 | iter 3328 step 13 | loss train: 11.151, val: n/a | iter time: 353.02 ms (step) remaining time: 2 days, 13:38:04
Epoch 1 | iter 3584 step 14 | loss train: 11.097, val: n/a | iter time: 353.75 ms (step) remaining time: 2 days, 13:27:21
Epoch 1 | iter 3840 step 15 | loss train: 11.064, val: n/a | iter time: 358.31 ms (step) remaining time: 2 days, 13:17:48
Epoch 1 | iter 4096 step 16 | loss train: 11.008, val: n/a | iter time: 351.95 ms (step) remaining time: 2 days, 13:09:17
Epoch 1 | iter 4352 step 17 | loss train: 10.997, val: n/a | iter time: 352.26 ms (step) remaining time: 2 days, 13:01:35
Epoch 1 | iter 4608 step 18 | loss train: 10.951, val: n/a | iter time: 352.57 ms (step) remaining time: 2 days, 12:54:35
Epoch 1 | iter 4864 step 19 | loss train: 10.902, val: n/a | iter time: 354.73 ms (step) remaining time: 2 days, 12:48:13
Epoch 1 | iter 5120 step 20 | loss train: 10.877, val: n/a | iter time: 354.47 ms (step) remaining time: 2 days, 12:43:19
Epoch 1 | iter 5376 step 21 | loss train: 10.830, val: n/a | iter time: 353.78 ms (step) remaining time: 2 days, 12:37:49
Epoch 1 | iter 5632 step 22 | loss train: 10.809, val: n/a | iter time: 355.03 ms (step) remaining time: 2 days, 12:32:44
Epoch 1 | iter 5888 step 23 | loss train: 10.727, val: n/a | iter time: 351.49 ms (step) remaining time: 2 days, 12:27:56
Epoch 1 | iter 6144 step 24 | loss train: 10.707, val: n/a | iter time: 351.58 ms (step) remaining time: 2 days, 12:23:24
Epoch 1 | iter 6400 step 25 | loss train: 10.643, val: n/a | iter time: 350.84 ms (step) remaining time: 2 days, 12:19:10
Epoch 1 | iter 6656 step 26 | loss train: 10.649, val: n/a | iter time: 355.14 ms (step) remaining time: 2 days, 12:15:07
Epoch 1 | iter 6912 step 27 | loss train: 10.580, val: n/a | iter time: 352.60 ms (step) remaining time: 2 days, 12:11:12
Epoch 1 | iter 7168 step 28 | loss train: 10.554, val: n/a | iter time: 351.57 ms (step) remaining time: 2 days, 12:07:27
Epoch 1 | iter 7424 step 29 | loss train: 10.526, val: n/a | iter time: 350.36 ms (step) remaining time: 2 days, 12:03:55
Epoch 1 | iter 7680 step 30 | loss train: 10.496, val: n/a | iter time: 353.19 ms (step) remaining time: 2 days, 12:00:34
Epoch 1 | iter 7936 step 31 | loss train: 10.496, val: n/a | iter time: 350.95 ms (step) remaining time: 2 days, 11:57:21
Epoch 1 | iter 8192 step 32 | loss train: 10.421, val: n/a | iter time: 352.71 ms (step) remaining time: 2 days, 11:54:18
Epoch 1 | iter 8448 step 33 | loss train: 10.379, val: n/a | iter time: 354.15 ms (step) remaining time: 2 days, 11:51:21
Epoch 1 | iter 8704 step 34 | loss train: 10.343, val: n/a | iter time: 353.95 ms (step) remaining time: 2 days, 11:48:29
Epoch 1 | iter 8960 step 35 | loss train: 10.353, val: n/a | iter time: 351.04 ms (step) remaining time: 2 days, 11:45:44
Epoch 1 | iter 9216 step 36 | loss train: 10.323, val: n/a | iter time: 354.76 ms (step) remaining time: 2 days, 11:43:05
Epoch 1 | iter 9472 step 37 | loss train: 10.258, val: n/a | iter time: 353.18 ms (step) remaining time: 2 days, 11:40:29
Epoch 1 | iter 9728 step 38 | loss train: 10.260, val: n/a | iter time: 353.86 ms (step) remaining time: 2 days, 11:37:57
Epoch 1 | iter 9984 step 39 | loss train: 10.257, val: n/a | iter time: 356.14 ms (step) remaining time: 2 days, 11:35:50
Epoch 1 | iter 10240 step 40 | loss train: 10.179, val: n/a | iter time: 353.73 ms (step) remaining time: 2 days, 11:33:23
Epoch 1 | iter 10496 step 41 | loss train: 10.163, val: n/a | iter time: 350.49 ms (step) remaining time: 2 days, 11:30:59
Epoch 1 | iter 10752 step 42 | loss train: 10.156, val: n/a | iter time: 354.15 ms (step) remaining time: 2 days, 11:28:40
Epoch 1 | iter 11008 step 43 | loss train: 10.150, val: n/a | iter time: 350.99 ms (step) remaining time: 2 days, 11:26:24
Epoch 1 | iter 11264 step 44 | loss train: 10.089, val: n/a | iter time: 354.28 ms (step) remaining time: 2 days, 11:24:09
Epoch 1 | iter 11520 step 45 | loss train: 10.096, val: n/a | iter time: 352.46 ms (step) remaining time: 2 days, 11:21:56
Epoch 1 | iter 11776 step 46 | loss train: 10.021, val: n/a | iter time: 356.80 ms (step) remaining time: 2 days, 11:19:45
Epoch 1 | iter 12032 step 47 | loss train: 10.002, val: n/a | iter time: 355.30 ms (step) remaining time: 2 days, 11:17:36
Epoch 1 | iter 12288 step 48 | loss train: 10.021, val: n/a | iter time: 355.12 ms (step) remaining time: 2 days, 11:15:32
Epoch 1 | iter 12544 step 49 | loss train: 10.017, val: n/a | iter time: 353.81 ms (step) remaining time: 2 days, 11:13:29
Epoch 1 | iter 12800 step 50 | loss train: 9.966, val: n/a | iter time: 354.70 ms (step) remaining time: 2 days, 11:11:26
# ...
Epoch 1 | iter 640256 step 2501 | loss train: 3.419, val: 3.366 | iter time: 351.28 ms (step) remaining time: 0:20:06
Epoch 1 | iter 640512 step 2502 | loss train: 3.425, val: 3.366 | iter time: 351.02 ms (step) remaining time: 0:18:40
Epoch 1 | iter 640768 step 2503 | loss train: 3.396, val: 3.366 | iter time: 351.61 ms (step) remaining time: 0:17:14
Epoch 1 | iter 641024 step 2504 | loss train: 3.466, val: 3.366 | iter time: 351.42 ms (step) remaining time: 0:15:48
Epoch 1 | iter 641280 step 2505 | loss train: 3.426, val: 3.366 | iter time: 351.72 ms (step) remaining time: 0:14:23
Epoch 1 | iter 641536 step 2506 | loss train: 3.410, val: 3.366 | iter time: 351.04 ms (step) remaining time: 0:12:57
Epoch 1 | iter 641792 step 2507 | loss train: 3.523, val: 3.366 | iter time: 352.67 ms (step) remaining time: 0:11:31
Epoch 1 | iter 642048 step 2508 | loss train: 3.518, val: 3.366 | iter time: 352.04 ms (step) remaining time: 0:10:06
Epoch 1 | iter 642304 step 2509 | loss train: 3.533, val: 3.366 | iter time: 350.88 ms (step) remaining time: 0:08:40
Epoch 1 | iter 642560 step 2510 | loss train: 3.541, val: 3.366 | iter time: 351.22 ms (step) remaining time: 0:07:14
Epoch 1 | iter 642816 step 2511 | loss train: 3.564, val: 3.366 | iter time: 352.00 ms (step) remaining time: 0:05:48
Epoch 1 | iter 643072 step 2512 | loss train: 3.462, val: 3.366 | iter time: 351.88 ms (step) remaining time: 0:04:23
Epoch 1 | iter 643328 step 2513 | loss train: 3.530, val: 3.366 | iter time: 351.49 ms (step) remaining time: 0:02:57
Epoch 1 | iter 643584 step 2514 | loss train: 3.484, val: 3.366 | iter time: 351.11 ms (step) remaining time: 0:01:31
Epoch 2 | iter 643840 step 2515 | loss train: 3.375, val: 3.366 | iter time: 352.07 ms (step) remaining time: 0:00:06
Validating ...
Final evaluation | val loss: 3.366 | val ppl: 28.963
Saving checkpoint to '../out/pretrain-core/final/lit_model.pth'
----------------------------------------
| Performance
| - Total tokens : 5,274,484,736
| - Training Time : 215640.67 s
| - Tok/sec : 16453.94 tok/s
| ----------------------------------------
| Memory Usage
| - Memory Used : 20.44 GB
----------------------------------------
```
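The reported perplexity is simply the exponential of the validation loss, which is easy to sanity-check:

```python
import math

# Perplexity is exp(cross-entropy loss):
print(math.exp(3.366))  # ≈ 28.96, matching the reported val ppl of 28.963
```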
Back up the `wandb` run directory:
```bash
mv wandb wandb-pretrain-core
```
Chat with the model:
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final
```
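The checkpoint can also be driven programmatically. A minimal sketch using litgpt's Python API; the prompt and sampling settings are illustrative:

```python
# Minimal generation sketch; assumes the litgpt package.
from litgpt import LLM

llm = LLM.load("../out/pretrain-core/final")  # checkpoint directory from above
print(llm.generate("The capital of France is", max_new_tokens=32))
```

Evaluate on the Open LLM Leaderboard task suite: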
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
```
```
Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard | N/A| | | | | | | |
| - leaderboard_bbh | N/A| | | | | | | |
| - leaderboard_bbh_boolean_expressions | 1|none | 3|acc_norm |↑ |0.5640|± |0.0314|
| - leaderboard_bbh_causal_judgement | 1|none | 3|acc_norm |↑ |0.5187|± |0.0366|
| - leaderboard_bbh_date_understanding | 1|none | 3|acc_norm |↑ |0.2000|± |0.0253|
| - leaderboard_bbh_disambiguation_qa | 1|none | 3|acc_norm |↑ |0.2960|± |0.0289|
| - leaderboard_bbh_formal_fallacies | 1|none | 3|acc_norm |↑ |0.4680|± |0.0316|
| - leaderboard_bbh_geometric_shapes | 1|none | 3|acc_norm |↑ |0.0880|± |0.0180|
| - leaderboard_bbh_hyperbaton | 1|none | 3|acc_norm |↑ |0.5160|± |0.0317|
| - leaderboard_bbh_logical_deduction_five_objects | 1|none | 3|acc_norm |↑ |0.1920|± |0.0250|
| - leaderboard_bbh_logical_deduction_seven_objects | 1|none | 3|acc_norm |↑ |0.1320|± |0.0215|
| - leaderboard_bbh_logical_deduction_three_objects | 1|none | 3|acc_norm |↑ |0.3360|± |0.0299|
| - leaderboard_bbh_movie_recommendation | 1|none | 3|acc_norm |↑ |0.2520|± |0.0275|
| - leaderboard_bbh_navigate | 1|none | 3|acc_norm |↑ |0.5520|± |0.0315|
| - leaderboard_bbh_object_counting | 1|none | 3|acc_norm |↑ |0.0760|± |0.0168|
| - leaderboard_bbh_penguins_in_a_table | 1|none | 3|acc_norm |↑ |0.1918|± |0.0327|
| - leaderboard_bbh_reasoning_about_colored_objects | 1|none | 3|acc_norm |↑ |0.0680|± |0.0160|
| - leaderboard_bbh_ruin_names | 1|none | 3|acc_norm |↑ |0.2080|± |0.0257|
| - leaderboard_bbh_salient_translation_error_detection | 1|none | 3|acc_norm |↑ |0.1880|± |0.0248|
| - leaderboard_bbh_snarks | 1|none | 3|acc_norm |↑ |0.4607|± |0.0375|
| - leaderboard_bbh_sports_understanding | 1|none | 3|acc_norm |↑ |0.4600|± |0.0316|
| - leaderboard_bbh_temporal_sequences | 1|none | 3|acc_norm |↑ |0.2720|± |0.0282|
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1|none | 3|acc_norm |↑ |0.2080|± |0.0257|
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects| 1|none | 3|acc_norm |↑ |0.1520|± |0.0228|
| - leaderboard_bbh_tracking_shuffled_objects_three_objects| 1|none | 3|acc_norm |↑ |0.3320|± |0.0298|
| - leaderboard_bbh_web_of_lies | 1|none | 3|acc_norm |↑ |0.4880|± |0.0317|
| - leaderboard_gpqa | N/A| | | | | | | |
| - leaderboard_gpqa_diamond | 1|none | 0|acc_norm |↑ |0.2020|± |0.0286|
| - leaderboard_gpqa_extended | 1|none | 0|acc_norm |↑ |0.2656|± |0.0189|
| - leaderboard_gpqa_main | 1|none | 0|acc_norm |↑ |0.2567|± |0.0207|
| - leaderboard_ifeval | 3|none | 0|inst_level_loose_acc |↑ |0.2350|± | N/A|
| | |none | 0|inst_level_strict_acc |↑ |0.2242|± | N/A|
| | |none | 0|prompt_level_loose_acc |↑ |0.1109|± |0.0135|
| | |none | 0|prompt_level_strict_acc|↑ |0.1054|± |0.0132|
| - leaderboard_math_hard | N/A| | | | | | | |
| - leaderboard_math_algebra_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_counting_and_prob_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_geometry_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_intermediate_algebra_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_num_theory_hard | 2|none | 4|exact_match |↑ |0.0019|± |0.0019|
| - leaderboard_math_prealgebra_hard | 2|none | 4|exact_match |↑ |0.0011|± |0.0011|
| - leaderboard_math_precalculus_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_mmlu_pro | 0.1|none | 5|acc |↑ |0.1177|± |0.0029|
| - leaderboard_musr | N/A| | | | | | | |
| - leaderboard_musr_murder_mysteries | 1|none | 0|acc_norm |↑ |0.4880|± |0.0317|
| - leaderboard_musr_object_placements | 1|none | 0|acc_norm |↑ |0.2266|± |0.0262|
| - leaderboard_musr_team_allocation | 1|none | 0|acc_norm |↑ |0.2560|± |0.0277|
```
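As expected for a ~201M-parameter model after core pretraining only, scores sit at or near chance on most multiple-choice tasks and near zero on the MATH-hard exact-match splits.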