Update README.md
README.md
CHANGED
@@ -9,9 +9,9 @@ license:
- llama3.3
---

-# Gemma-2-Llama
+# Gemma-2-Llama-Swallow

-Gemma-2-Llama
+Gemma-2-Llama-Swallow series was built by continual pre-training on the [gemma-2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) models.
Gemma 2 Swallow enhanced the Japanese language capabilities of the original Gemma 2 while retaining the English language capabilities.
We use approximately 200 billion tokens that were sampled from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and mathematical and coding contents, etc. (see the Training Datasets section of the base model) for continual pre-training.
The instruction-tuned models (it) were built by supervised fine-tuning (SFT) on the synthetic data specially built for Japanese.
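
The pre-trained (pt) checkpoints referenced in the evaluation tables below load like any other causal language model from the Hugging Face Hub. The following is a minimal sketch, not part of the original card, assuming the `tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1` repository ID shown in the tables and a standard `transformers`/`torch`/`accelerate` setup; the prompt is purely illustrative:

```python
# Minimal usage sketch (not from the original card): load a pre-trained (pt)
# checkpoint listed in the evaluation tables and generate a continuation.
# Assumes `transformers`, `torch`, and `accelerate` are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1"  # 9b/27b variants follow the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 weights are commonly served in bfloat16
    device_map="auto",           # requires accelerate
)

# Base (pt) models expect plain continuation prompts rather than chat templates.
prompt = "日本の首都は"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
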
@@ -50,49 +50,49 @@ The website [https://swallow-llm.github.io/](https://swallow-llm.github.io/index

### Japanese tasks

| Model | JCom. | JEMHopQA | NIILC | JSQuAD | XL-Sum | MGSM | WMT20-en-ja | WMT20-ja-en | JMMLU | JHumanEval | Ja Avg |
| --------------------------------------------------- | ------ | -------- | ------- | ------- | ------- | ------ | ----------- | ----------- | ------ | ---------- | ------ |
| | 4-shot | 4-shot | 4-shot | 4-shot | 1-shot | 4-shot | 4-shot | 4-shot | 5-shot | 0-shot | |
| | EM acc | Char-F1 | Char-F1 | Char-F1 | ROUGE-2 | EM acc | BLEU | BLEU | EM acc | pass@1 | |
| google/gemma-3-1b-pt | 0.237 | 0.410 | 0.252 | 0.631 | 0.079 | 0.024 | 0.150 | 0.136 | 0.239 | 0.073 | 0.223 |
| Qwen/Qwen2.5-1.5B | 0.800 | 0.383 | 0.241 | 0.849 | 0.143 | 0.292 | 0.132 | 0.134 | 0.438 | 0.308 | 0.372 |
| google/gemma-2-2b | 0.721 | 0.472 | 0.316 | 0.810 | 0.083 | 0.124 | 0.203 | 0.190 | 0.388 | 0.177 | 0.348 |
| rinna/gemma-2-baku-2b | 0.760 | 0.475 | 0.443 | 0.843 | 0.121 | 0.124 | 0.255 | 0.187 | 0.376 | 0.137 | 0.372 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1** | 0.830 | 0.509 | 0.549 | 0.863 | 0.119 | 0.172 | 0.261 | 0.195 | 0.461 | 0.251 | 0.421 |
| Qwen/Qwen2.5-3B | 0.847 | 0.475 | 0.306 | 0.878 | 0.176 | 0.460 | 0.180 | 0.167 | 0.529 | 0.404 | 0.442 |
| google/gemma-3-4b-pt | 0.851 | 0.432 | 0.410 | 0.887 | 0.139 | 0.248 | 0.230 | 0.205 | 0.499 | 0.273 | 0.417 |
| Qwen/Qwen2.5-7B | 0.924 | 0.459 | 0.426 | 0.907 | 0.216 | 0.616 | 0.229 | 0.199 | 0.634 | 0.507 | 0.512 |
| tokyotech-llm/Llama-3.1-Swallow-8B-v0.2 | 0.911 | 0.510 | 0.627 | 0.892 | 0.198 | 0.464 | 0.296 | 0.233 | 0.525 | 0.336 | 0.499 |
| google/gemma-2-9b | 0.904 | 0.573 | 0.524 | 0.898 | 0.168 | 0.456 | 0.269 | 0.236 | 0.623 | 0.345 | 0.500 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1** | 0.950 | 0.643 | 0.677 | 0.897 | 0.187 | 0.560 | 0.304 | 0.247 | 0.650 | 0.462 | 0.558 |
| google/gemma-3-12b-pt | 0.787 | 0.563 | 0.569 | 0.911 | 0.194 | 0.584 | 0.288 | 0.244 | 0.659 | 0.385 | 0.518 |
| google/gemma-2-27b | 0.936 | 0.553 | 0.573 | 0.916 | 0.194 | 0.596 | 0.295 | 0.251 | 0.659 | 0.490 | 0.546 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1** | 0.958 | 0.660 | 0.671 | 0.924 | 0.200 | 0.644 | 0.321 | 0.255 | 0.679 | 0.629 | 0.594 |
| google/gemma-3-27b-pt | 0.944 | 0.582 | 0.627 | 0.915 | 0.210 | 0.704 | 0.301 | 0.255 | 0.724 | 0.473 | 0.574 |
| Qwen/Qwen2.5-32B | 0.961 | 0.561 | 0.538 | 0.925 | 0.228 | 0.808 | 0.271 | 0.233 | 0.751 | 0.637 | 0.591 |
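
The Ja Avg column appears to be the unweighted mean of the ten task scores in each row. A small sanity-check sketch, with the scores copied from the 2b-pt row above and a helper name of our own choosing:

```python
# Sanity check (not part of the original card): Ja Avg looks like the plain mean
# of the ten per-task scores, rounded to three decimals.
def row_average(scores: list[float]) -> float:
    return round(sum(scores) / len(scores), 3)

# Gemma-2-Llama-Swallow-2b-pt-v0.1, Japanese tasks (JCom. through JHumanEval)
swallow_2b_ja = [0.830, 0.509, 0.549, 0.863, 0.119, 0.172, 0.261, 0.195, 0.461, 0.251]
print(row_average(swallow_2b_ja))  # 0.421, matching the Ja Avg column
```
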

### English tasks

| Model | OpenBookQA | TriviaQA | HellaSWAG | SQuAD2.0 | XWINO | MMLU | GSM8K | MATH | BBH | HumanEval | En Avg |
| --------------------------------------------------- | ---------- | -------- | --------- | -------- | ------ | ------ | ------ | ---------- | ---------- | --------- | ------ |
| | 4-shot | 4-shot | 4-shot | 4-shot | 4-shot | 5-shot | 4-shot | 4-shot | 3-shot | 0-shot | |
| | Acc | EM acc | Acc | EM acc | Acc | Acc | EM acc | CoT EM Acc | CoT EM Acc | pass@1 | |
| google/gemma-3-1b-pt | 0.304 | 0.358 | 0.471 | 0.501 | 0.832 | 0.262 | 0.016 | 0.008 | 0.276 | 0.070 | 0.310 |
| Qwen/Qwen2.5-1.5B | 0.342 | 0.397 | 0.499 | 0.506 | 0.851 | 0.610 | 0.611 | 0.314 | 0.413 | 0.356 | 0.490 |
| google/gemma-2-2b | 0.342 | 0.552 | 0.552 | 0.501 | 0.890 | 0.530 | 0.249 | 0.176 | 0.415 | 0.188 | 0.439 |
| rinna/gemma-2-baku-2b | 0.314 | 0.475 | 0.533 | 0.501 | 0.881 | 0.493 | 0.168 | 0.110 | 0.376 | 0.150 | 0.400 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1** | 0.312 | 0.435 | 0.516 | 0.501 | 0.871 | 0.538 | 0.275 | 0.144 | 0.384 | 0.286 | 0.426 |
| Qwen/Qwen2.5-3B | 0.360 | 0.504 | 0.553 | 0.541 | 0.872 | 0.657 | 0.580 | 0.440 | 0.442 | 0.387 | 0.534 |
| google/gemma-3-4b-pt | 0.360 | 0.603 | 0.576 | 0.502 | 0.895 | 0.596 | 0.376 | 0.258 | 0.495 | 0.351 | 0.501 |
| Qwen/Qwen2.5-7B | 0.392 | 0.601 | 0.600 | 0.618 | 0.888 | 0.742 | 0.832 | 0.510 | 0.562 | 0.554 | 0.630 |
| tokyotech-llm/Llama-3.1-Swallow-8B-v0.2 | 0.382 | 0.651 | 0.596 | 0.513 | 0.904 | 0.622 | 0.521 | 0.228 | 0.605 | 0.366 | 0.539 |
| google/gemma-2-9b | 0.382 | 0.718 | 0.626 | 0.506 | 0.907 | 0.706 | 0.688 | 0.338 | 0.704 | 0.390 | 0.597 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1** | 0.362 | 0.659 | 0.602 | 0.532 | 0.906 | 0.687 | 0.678 | 0.330 | 0.664 | 0.529 | 0.595 |
| google/gemma-3-12b-pt | 0.398 | 0.747 | 0.637 | 0.524 | 0.917 | 0.737 | 0.703 | 0.398 | 0.683 | 0.445 | 0.619 |
| google/gemma-2-27b | 0.412 | 0.780 | 0.675 | 0.549 | 0.921 | 0.754 | 0.757 | 0.438 | 0.760 | 0.508 | 0.655 |
| **tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1** | 0.414 | 0.756 | 0.652 | 0.597 | 0.915 | 0.749 | 0.732 | 0.416 | 0.765 | 0.658 | 0.665 |
| google/gemma-3-27b-pt | 0.414 | 0.809 | 0.667 | 0.618 | 0.923 | 0.780 | 0.801 | 0.520 | 0.732 | 0.507 | 0.677 |
| Qwen/Qwen2.5-32B | 0.406 | 0.664 | 0.656 | 0.668 | 0.913 | 0.832 | 0.718 | 0.600 | 0.717 | 0.523 | 0.670 |
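
The claim that continual pre-training improves Japanese performance while roughly preserving English performance can be read off the Ja Avg and En Avg columns. A short illustrative sketch, comparing each Gemma-2-Llama-Swallow model with its google/gemma-2 base using the averages from the two tables above (the pairing and variable names are ours):

```python
# Illustration only: compare Ja Avg / En Avg of each Gemma-2-Llama-Swallow model
# against its google/gemma-2 base model, using the averages copied from the tables above.
pairs = {
    "2b":  {"base": (0.348, 0.439), "swallow": (0.421, 0.426)},
    "9b":  {"base": (0.500, 0.597), "swallow": (0.558, 0.595)},
    "27b": {"base": (0.546, 0.655), "swallow": (0.594, 0.665)},
}

for size, scores in pairs.items():
    d_ja = scores["swallow"][0] - scores["base"][0]
    d_en = scores["swallow"][1] - scores["base"][1]
    print(f"{size}: Ja Avg {d_ja:+.3f}, En Avg {d_en:+.3f}")
# 2b: Ja Avg +0.073, En Avg -0.013
# 9b: Ja Avg +0.058, En Avg -0.002
# 27b: Ja Avg +0.048, En Avg +0.010
```
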

## Evaluation Benchmarks