Taishi-N324 committed
Commit 7858b8c · verified · 1 Parent(s): 3cdc850

Update README.md

Files changed (1):
  1. README.md (+42 -42)
README.md CHANGED
@@ -9,9 +9,9 @@ license:
  - llama3.3
  ---

- # Gemma-2-Llama Swallow
+ # Gemma-2-Llama-Swallow

- Gemma-2-Llama Swallow series was built by continual pre-training on the [gemma-2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) models.
+ Gemma-2-Llama-Swallow series was built by continual pre-training on the [gemma-2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) models.
  Gemma 2 Swallow enhanced the Japanese language capabilities of the original Gemma 2 while retaining the English language capabilities.
  We use approximately 200 billion tokens that were sampled from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and mathematical and coding contents, etc (see the Training Datasets section of the base model) for continual pre-training.
  The instruction-tuned models (it) were built by supervised fine-tuning (SFT) on the synthetic data specially built for Japanese.
@@ -50,49 +50,49 @@ The website [https://swallow-llm.github.io/](https://swallow-llm.github.io/index

  ### Japanese tasks

- | Model | JCom. | JEMHopQA | NIILC | JSQuAD | XL-Sum | MGSM | WMT20-en-ja | WMT20-ja-en | JMMLU | JHumanEval | Ja Avg |
- | ----------------------------------------------- | ------ | -------- | ------- | ------- | ------- | ------ | ----------- | ----------- | ------ | ---------- | ------ |
- | | 4-shot | 4-shot | 4-shot | 4-shot | 1-shot | 4-shot | 4-shot | 4-shot | 5-shot | 0-shot | |
- | | EM acc | Char-F1 | Char-F1 | Char-F1 | ROUGE-2 | EM acc | BLEU | BLEU | EM acc | pass@1 | |
- | google/gemma-3-1b-pt | 0.237 | 0.410 | 0.252 | 0.631 | 0.079 | 0.024 | 0.150 | 0.136 | 0.239 | 0.073 | 0.223 |
- | Qwen/Qwen2.5-1.5B | 0.800 | 0.383 | 0.241 | 0.849 | 0.143 | 0.292 | 0.132 | 0.134 | 0.438 | 0.308 | 0.372 |
- | google/gemma-2-2b | 0.721 | 0.472 | 0.316 | 0.810 | 0.083 | 0.124 | 0.203 | 0.190 | 0.388 | 0.177 | 0.348 |
- | rinna/gemma-2-baku-2b | 0.760 | 0.475 | 0.443 | 0.843 | 0.121 | 0.124 | 0.255 | 0.187 | 0.376 | 0.137 | 0.372 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1 | 0.830 | 0.509 | 0.549 | 0.863 | 0.119 | 0.172 | 0.261 | 0.195 | 0.461 | 0.251 | 0.421 |
- | Qwen/Qwen2.5-3B | 0.847 | 0.475 | 0.306 | 0.878 | 0.176 | 0.460 | 0.180 | 0.167 | 0.529 | 0.404 | 0.442 |
- | google/gemma-3-4b-pt | 0.851 | 0.432 | 0.410 | 0.887 | 0.139 | 0.248 | 0.230 | 0.205 | 0.499 | 0.273 | 0.417 |
- | Qwen/Qwen2.5-7B | 0.924 | 0.459 | 0.426 | 0.907 | 0.216 | 0.616 | 0.229 | 0.199 | 0.634 | 0.507 | 0.512 |
- | tokyotech-llm/Llama-3.1-Swallow-8B-v0.2 | 0.911 | 0.510 | 0.627 | 0.892 | 0.198 | 0.464 | 0.296 | 0.233 | 0.525 | 0.336 | 0.499 |
- | google/gemma-2-9b | 0.904 | 0.573 | 0.524 | 0.898 | 0.168 | 0.456 | 0.269 | 0.236 | 0.623 | 0.345 | 0.500 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1 | 0.950 | 0.643 | 0.677 | 0.897 | 0.187 | 0.560 | 0.304 | 0.247 | 0.650 | 0.462 | 0.558 |
- | google/gemma-3-12b-pt | 0.787 | 0.563 | 0.569 | 0.911 | 0.194 | 0.584 | 0.288 | 0.244 | 0.659 | 0.385 | 0.518 |
- | google/gemma-2-27b | 0.936 | 0.553 | 0.573 | 0.916 | 0.194 | 0.596 | 0.295 | 0.251 | 0.659 | 0.490 | 0.546 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1 | 0.958 | 0.660 | 0.671 | 0.924 | 0.200 | 0.644 | 0.321 | 0.255 | 0.679 | 0.629 | 0.594 |
- | google/gemma-3-27b-pt | 0.944 | 0.582 | 0.627 | 0.915 | 0.210 | 0.704 | 0.301 | 0.255 | 0.724 | 0.473 | 0.574 |
- | Qwen/Qwen2.5-32B | 0.961 | 0.561 | 0.538 | 0.925 | 0.228 | 0.808 | 0.271 | 0.233 | 0.751 | 0.637 | 0.591 |
+ | Model | JCom. | JEMHopQA | NIILC | JSQuAD | XL-Sum | MGSM | WMT20-en-ja | WMT20-ja-en | JMMLU | JHumanEval | Ja Avg |
+ | --------------------------------------------------- | ------ | -------- | ------- | ------- | ------- | ------ | ----------- | ----------- | ------ | ---------- | ------ |
+ | | 4-shot | 4-shot | 4-shot | 4-shot | 1-shot | 4-shot | 4-shot | 4-shot | 5-shot | 0-shot | |
+ | | EM acc | Char-F1 | Char-F1 | Char-F1 | ROUGE-2 | EM acc | BLEU | BLEU | EM acc | pass@1 | |
+ | google/gemma-3-1b-pt | 0.237 | 0.410 | 0.252 | 0.631 | 0.079 | 0.024 | 0.150 | 0.136 | 0.239 | 0.073 | 0.223 |
+ | Qwen/Qwen2.5-1.5B | 0.800 | 0.383 | 0.241 | 0.849 | 0.143 | 0.292 | 0.132 | 0.134 | 0.438 | 0.308 | 0.372 |
+ | google/gemma-2-2b | 0.721 | 0.472 | 0.316 | 0.810 | 0.083 | 0.124 | 0.203 | 0.190 | 0.388 | 0.177 | 0.348 |
+ | rinna/gemma-2-baku-2b | 0.760 | 0.475 | 0.443 | 0.843 | 0.121 | 0.124 | 0.255 | 0.187 | 0.376 | 0.137 | 0.372 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1** | 0.830 | 0.509 | 0.549 | 0.863 | 0.119 | 0.172 | 0.261 | 0.195 | 0.461 | 0.251 | 0.421 |
+ | Qwen/Qwen2.5-3B | 0.847 | 0.475 | 0.306 | 0.878 | 0.176 | 0.460 | 0.180 | 0.167 | 0.529 | 0.404 | 0.442 |
+ | google/gemma-3-4b-pt | 0.851 | 0.432 | 0.410 | 0.887 | 0.139 | 0.248 | 0.230 | 0.205 | 0.499 | 0.273 | 0.417 |
+ | Qwen/Qwen2.5-7B | 0.924 | 0.459 | 0.426 | 0.907 | 0.216 | 0.616 | 0.229 | 0.199 | 0.634 | 0.507 | 0.512 |
+ | tokyotech-llm/Llama-3.1-Swallow-8B-v0.2 | 0.911 | 0.510 | 0.627 | 0.892 | 0.198 | 0.464 | 0.296 | 0.233 | 0.525 | 0.336 | 0.499 |
+ | google/gemma-2-9b | 0.904 | 0.573 | 0.524 | 0.898 | 0.168 | 0.456 | 0.269 | 0.236 | 0.623 | 0.345 | 0.500 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1** | 0.950 | 0.643 | 0.677 | 0.897 | 0.187 | 0.560 | 0.304 | 0.247 | 0.650 | 0.462 | 0.558 |
+ | google/gemma-3-12b-pt | 0.787 | 0.563 | 0.569 | 0.911 | 0.194 | 0.584 | 0.288 | 0.244 | 0.659 | 0.385 | 0.518 |
+ | google/gemma-2-27b | 0.936 | 0.553 | 0.573 | 0.916 | 0.194 | 0.596 | 0.295 | 0.251 | 0.659 | 0.490 | 0.546 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1** | 0.958 | 0.660 | 0.671 | 0.924 | 0.200 | 0.644 | 0.321 | 0.255 | 0.679 | 0.629 | 0.594 |
+ | google/gemma-3-27b-pt | 0.944 | 0.582 | 0.627 | 0.915 | 0.210 | 0.704 | 0.301 | 0.255 | 0.724 | 0.473 | 0.574 |
+ | Qwen/Qwen2.5-32B | 0.961 | 0.561 | 0.538 | 0.925 | 0.228 | 0.808 | 0.271 | 0.233 | 0.751 | 0.637 | 0.591 |

  ### English tasks

- | Model | OpenBookQA | TriviaQA | HellaSWAG | SQuAD2.0 | XWINO | MMLU | GSM8K | MATH | BBH | HumanEval | En Avg |
- | ----------------------------------------------- | ---------- | -------- | --------- | -------- | ------ | ------ | ------ | ---------- | ---------- | --------- | ------ |
- | | 4-shot | 4-shot | 4-shot | 4-shot | 4-shot | 5-shot | 4-shot | 4-shot | 3-shot | 0-shot | |
- | | Acc | EM acc | Acc | EM acc | Acc | Acc | EM acc | CoT EM Acc | CoT EM Acc | pass@1 | |
- | google/gemma-3-1b-pt | 0.304 | 0.358 | 0.471 | 0.501 | 0.832 | 0.262 | 0.016 | 0.008 | 0.276 | 0.070 | 0.310 |
- | Qwen/Qwen2.5-1.5B | 0.342 | 0.397 | 0.499 | 0.506 | 0.851 | 0.610 | 0.611 | 0.314 | 0.413 | 0.356 | 0.490 |
- | google/gemma-2-2b | 0.342 | 0.552 | 0.552 | 0.501 | 0.890 | 0.530 | 0.249 | 0.176 | 0.415 | 0.188 | 0.439 |
- | rinna/gemma-2-baku-2b | 0.314 | 0.475 | 0.533 | 0.501 | 0.881 | 0.493 | 0.168 | 0.110 | 0.376 | 0.150 | 0.400 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1 | 0.312 | 0.435 | 0.516 | 0.501 | 0.871 | 0.538 | 0.275 | 0.144 | 0.384 | 0.286 | 0.426 |
- | Qwen/Qwen2.5-3B | 0.360 | 0.504 | 0.553 | 0.541 | 0.872 | 0.657 | 0.580 | 0.440 | 0.442 | 0.387 | 0.534 |
- | google/gemma-3-4b-pt | 0.360 | 0.603 | 0.576 | 0.502 | 0.895 | 0.596 | 0.376 | 0.258 | 0.495 | 0.351 | 0.501 |
- | Qwen/Qwen2.5-7B | 0.392 | 0.601 | 0.600 | 0.618 | 0.888 | 0.742 | 0.832 | 0.510 | 0.562 | 0.554 | 0.630 |
- | tokyotech-llm/Llama-3.1-Swallow-8B-v0.2 | 0.382 | 0.651 | 0.596 | 0.513 | 0.904 | 0.622 | 0.521 | 0.228 | 0.605 | 0.366 | 0.539 |
- | google/gemma-2-9b | 0.382 | 0.718 | 0.626 | 0.506 | 0.907 | 0.706 | 0.688 | 0.338 | 0.704 | 0.390 | 0.597 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1 | 0.362 | 0.659 | 0.602 | 0.532 | 0.906 | 0.687 | 0.678 | 0.330 | 0.664 | 0.529 | 0.595 |
- | google/gemma-3-12b-pt | 0.398 | 0.747 | 0.637 | 0.524 | 0.917 | 0.737 | 0.703 | 0.398 | 0.683 | 0.445 | 0.619 |
- | google/gemma-2-27b | 0.412 | 0.780 | 0.675 | 0.549 | 0.921 | 0.754 | 0.757 | 0.438 | 0.760 | 0.508 | 0.655 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1 | 0.414 | 0.756 | 0.652 | 0.597 | 0.915 | 0.749 | 0.732 | 0.416 | 0.765 | 0.658 | 0.665 |
- | google/gemma-3-27b-pt | 0.414 | 0.809 | 0.667 | 0.618 | 0.923 | 0.780 | 0.801 | 0.520 | 0.732 | 0.507 | 0.677 |
- | Qwen/Qwen2.5-32B | 0.406 | 0.664 | 0.656 | 0.668 | 0.913 | 0.832 | 0.718 | 0.600 | 0.717 | 0.523 | 0.670 |
+ | Model | OpenBookQA | TriviaQA | HellaSWAG | SQuAD2.0 | XWINO | MMLU | GSM8K | MATH | BBH | HumanEval | En Avg |
+ | --------------------------------------------------- | ---------- | -------- | --------- | -------- | ------ | ------ | ------ | ---------- | ---------- | --------- | ------ |
+ | | 4-shot | 4-shot | 4-shot | 4-shot | 4-shot | 5-shot | 4-shot | 4-shot | 3-shot | 0-shot | |
+ | | Acc | EM acc | Acc | EM acc | Acc | Acc | EM acc | CoT EM Acc | CoT EM Acc | pass@1 | |
+ | google/gemma-3-1b-pt | 0.304 | 0.358 | 0.471 | 0.501 | 0.832 | 0.262 | 0.016 | 0.008 | 0.276 | 0.070 | 0.310 |
+ | Qwen/Qwen2.5-1.5B | 0.342 | 0.397 | 0.499 | 0.506 | 0.851 | 0.610 | 0.611 | 0.314 | 0.413 | 0.356 | 0.490 |
+ | google/gemma-2-2b | 0.342 | 0.552 | 0.552 | 0.501 | 0.890 | 0.530 | 0.249 | 0.176 | 0.415 | 0.188 | 0.439 |
+ | rinna/gemma-2-baku-2b | 0.314 | 0.475 | 0.533 | 0.501 | 0.881 | 0.493 | 0.168 | 0.110 | 0.376 | 0.150 | 0.400 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1** | 0.312 | 0.435 | 0.516 | 0.501 | 0.871 | 0.538 | 0.275 | 0.144 | 0.384 | 0.286 | 0.426 |
+ | Qwen/Qwen2.5-3B | 0.360 | 0.504 | 0.553 | 0.541 | 0.872 | 0.657 | 0.580 | 0.440 | 0.442 | 0.387 | 0.534 |
+ | google/gemma-3-4b-pt | 0.360 | 0.603 | 0.576 | 0.502 | 0.895 | 0.596 | 0.376 | 0.258 | 0.495 | 0.351 | 0.501 |
+ | Qwen/Qwen2.5-7B | 0.392 | 0.601 | 0.600 | 0.618 | 0.888 | 0.742 | 0.832 | 0.510 | 0.562 | 0.554 | 0.630 |
+ | tokyotech-llm/Llama-3.1-Swallow-8B-v0.2 | 0.382 | 0.651 | 0.596 | 0.513 | 0.904 | 0.622 | 0.521 | 0.228 | 0.605 | 0.366 | 0.539 |
+ | google/gemma-2-9b | 0.382 | 0.718 | 0.626 | 0.506 | 0.907 | 0.706 | 0.688 | 0.338 | 0.704 | 0.390 | 0.597 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1** | 0.362 | 0.659 | 0.602 | 0.532 | 0.906 | 0.687 | 0.678 | 0.330 | 0.664 | 0.529 | 0.595 |
+ | google/gemma-3-12b-pt | 0.398 | 0.747 | 0.637 | 0.524 | 0.917 | 0.737 | 0.703 | 0.398 | 0.683 | 0.445 | 0.619 |
+ | google/gemma-2-27b | 0.412 | 0.780 | 0.675 | 0.549 | 0.921 | 0.754 | 0.757 | 0.438 | 0.760 | 0.508 | 0.655 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1** | 0.414 | 0.756 | 0.652 | 0.597 | 0.915 | 0.749 | 0.732 | 0.416 | 0.765 | 0.658 | 0.665 |
+ | google/gemma-3-27b-pt | 0.414 | 0.809 | 0.667 | 0.618 | 0.923 | 0.780 | 0.801 | 0.520 | 0.732 | 0.507 | 0.677 |
+ | Qwen/Qwen2.5-32B | 0.406 | 0.664 | 0.656 | 0.668 | 0.913 | 0.832 | 0.718 | 0.600 | 0.717 | 0.523 | 0.670 |

  ## Evaluation Benchmarks
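Neither table in the diff defines how the final column is computed; judging from the numbers, "Ja Avg" and "En Avg" appear to be unweighted means of the ten per-task scores. The snippet below is a quick editorial sanity check of that reading (an assumption, since the averaging rule is not stated in this excerpt) against the Gemma-2-Llama-Swallow-2b-pt-v0.1 rows.

```python
# Sanity check (editorial, not part of the commit): treat "Ja Avg" / "En Avg"
# as the unweighted mean of the ten task scores and compare with the table rows.
ja_scores = [0.830, 0.509, 0.549, 0.863, 0.119, 0.172, 0.261, 0.195, 0.461, 0.251]
en_scores = [0.312, 0.435, 0.516, 0.501, 0.871, 0.538, 0.275, 0.144, 0.384, 0.286]

print(round(sum(ja_scores) / len(ja_scores), 3))  # 0.421, matches the "Ja Avg" cell
print(round(sum(en_scores) / len(en_scores), 3))  # 0.426, matches the "En Avg" cell
```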
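The README text in this commit describes pre-trained (pt) checkpoints built by continual pre-training on Gemma 2 and instruction-tuned (it) variants built by SFT. For readers who want to try one of the checkpoints named in the tables, the sketch below shows how such a model would typically be loaded with the Hugging Face `transformers` causal-LM API. It is not part of this commit; the dtype, the `device_map` setting, and the sample prompt are illustrative assumptions rather than settings taken from the model card.

```python
# Minimal usage sketch (editorial, not from this commit): load one of the
# Gemma-2-Llama-Swallow base checkpoints listed above and generate a continuation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1"  # one of the table entries

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on the available GPU
    device_map="auto",           # assumption: accelerate installed; not mandated by the card
)

prompt = "日本の首都は"  # illustrative Japanese continuation prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```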