Taishi-N324 committed
Commit 7858b8c · verified · 1 Parent(s): 3cdc850

Update README.md

Files changed (1):
  1. README.md (+42 -42)
README.md CHANGED
@@ -9,9 +9,9 @@ license:
  - llama3.3
  ---

- # Gemma-2-Llama Swallow
+ # Gemma-2-Llama-Swallow

- Gemma-2-Llama Swallow series was built by continual pre-training on the [gemma-2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) models.
+ Gemma-2-Llama-Swallow series was built by continual pre-training on the [gemma-2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315) models.
  Gemma 2 Swallow enhanced the Japanese language capabilities of the original Gemma 2 while retaining the English language capabilities.
  We use approximately 200 billion tokens that were sampled from a large Japanese web corpus (Swallow Corpus Version 2), Japanese and English Wikipedia articles, and mathematical and coding contents, etc (see the Training Datasets section of the base model) for continual pre-training.
  The instruction-tuned models (it) were built by supervised fine-tuning (SFT) on the synthetic data specially built for Japanese.
@@ -50,49 +50,49 @@ The website [https://swallow-llm.github.io/](https://swallow-llm.github.io/index

  ### Japanese tasks

- | Model | JCom. | JEMHopQA | NIILC | JSQuAD | XL-Sum | MGSM | WMT20-en-ja | WMT20-ja-en | JMMLU | JHumanEval | Ja Avg |
- | ----------------------------------------------- | ------ | -------- | ------- | ------- | ------- | ------ | ----------- | ----------- | ------ | ---------- | ------ |
- | | 4-shot | 4-shot | 4-shot | 4-shot | 1-shot | 4-shot | 4-shot | 4-shot | 5-shot | 0-shot | |
- | | EM acc | Char-F1 | Char-F1 | Char-F1 | ROUGE-2 | EM acc | BLEU | BLEU | EM acc | pass@1 | |
- | google/gemma-3-1b-pt | 0.237 | 0.410 | 0.252 | 0.631 | 0.079 | 0.024 | 0.150 | 0.136 | 0.239 | 0.073 | 0.223 |
- | Qwen/Qwen2.5-1.5B | 0.800 | 0.383 | 0.241 | 0.849 | 0.143 | 0.292 | 0.132 | 0.134 | 0.438 | 0.308 | 0.372 |
- | google/gemma-2-2b | 0.721 | 0.472 | 0.316 | 0.810 | 0.083 | 0.124 | 0.203 | 0.190 | 0.388 | 0.177 | 0.348 |
- | rinna/gemma-2-baku-2b | 0.760 | 0.475 | 0.443 | 0.843 | 0.121 | 0.124 | 0.255 | 0.187 | 0.376 | 0.137 | 0.372 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1 | 0.830 | 0.509 | 0.549 | 0.863 | 0.119 | 0.172 | 0.261 | 0.195 | 0.461 | 0.251 | 0.421 |
- | Qwen/Qwen2.5-3B | 0.847 | 0.475 | 0.306 | 0.878 | 0.176 | 0.460 | 0.180 | 0.167 | 0.529 | 0.404 | 0.442 |
- | google/gemma-3-4b-pt | 0.851 | 0.432 | 0.410 | 0.887 | 0.139 | 0.248 | 0.230 | 0.205 | 0.499 | 0.273 | 0.417 |
- | Qwen/Qwen2.5-7B | 0.924 | 0.459 | 0.426 | 0.907 | 0.216 | 0.616 | 0.229 | 0.199 | 0.634 | 0.507 | 0.512 |
- | tokyotech-llm/Llama-3.1-Swallow-8B-v0.2 | 0.911 | 0.510 | 0.627 | 0.892 | 0.198 | 0.464 | 0.296 | 0.233 | 0.525 | 0.336 | 0.499 |
- | google/gemma-2-9b | 0.904 | 0.573 | 0.524 | 0.898 | 0.168 | 0.456 | 0.269 | 0.236 | 0.623 | 0.345 | 0.500 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1 | 0.950 | 0.643 | 0.677 | 0.897 | 0.187 | 0.560 | 0.304 | 0.247 | 0.650 | 0.462 | 0.558 |
- | google/gemma-3-12b-pt | 0.787 | 0.563 | 0.569 | 0.911 | 0.194 | 0.584 | 0.288 | 0.244 | 0.659 | 0.385 | 0.518 |
- | google/gemma-2-27b | 0.936 | 0.553 | 0.573 | 0.916 | 0.194 | 0.596 | 0.295 | 0.251 | 0.659 | 0.490 | 0.546 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1 | 0.958 | 0.660 | 0.671 | 0.924 | 0.200 | 0.644 | 0.321 | 0.255 | 0.679 | 0.629 | 0.594 |
- | google/gemma-3-27b-pt | 0.944 | 0.582 | 0.627 | 0.915 | 0.210 | 0.704 | 0.301 | 0.255 | 0.724 | 0.473 | 0.574 |
- | Qwen/Qwen2.5-32B | 0.961 | 0.561 | 0.538 | 0.925 | 0.228 | 0.808 | 0.271 | 0.233 | 0.751 | 0.637 | 0.591 |
+ | Model | JCom. | JEMHopQA | NIILC | JSQuAD | XL-Sum | MGSM | WMT20-en-ja | WMT20-ja-en | JMMLU | JHumanEval | Ja Avg |
+ | --------------------------------------------------- | ------ | -------- | ------- | ------- | ------- | ------ | ----------- | ----------- | ------ | ---------- | ------ |
+ | | 4-shot | 4-shot | 4-shot | 4-shot | 1-shot | 4-shot | 4-shot | 4-shot | 5-shot | 0-shot | |
+ | | EM acc | Char-F1 | Char-F1 | Char-F1 | ROUGE-2 | EM acc | BLEU | BLEU | EM acc | pass@1 | |
+ | google/gemma-3-1b-pt | 0.237 | 0.410 | 0.252 | 0.631 | 0.079 | 0.024 | 0.150 | 0.136 | 0.239 | 0.073 | 0.223 |
+ | Qwen/Qwen2.5-1.5B | 0.800 | 0.383 | 0.241 | 0.849 | 0.143 | 0.292 | 0.132 | 0.134 | 0.438 | 0.308 | 0.372 |
+ | google/gemma-2-2b | 0.721 | 0.472 | 0.316 | 0.810 | 0.083 | 0.124 | 0.203 | 0.190 | 0.388 | 0.177 | 0.348 |
+ | rinna/gemma-2-baku-2b | 0.760 | 0.475 | 0.443 | 0.843 | 0.121 | 0.124 | 0.255 | 0.187 | 0.376 | 0.137 | 0.372 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1** | 0.830 | 0.509 | 0.549 | 0.863 | 0.119 | 0.172 | 0.261 | 0.195 | 0.461 | 0.251 | 0.421 |
+ | Qwen/Qwen2.5-3B | 0.847 | 0.475 | 0.306 | 0.878 | 0.176 | 0.460 | 0.180 | 0.167 | 0.529 | 0.404 | 0.442 |
+ | google/gemma-3-4b-pt | 0.851 | 0.432 | 0.410 | 0.887 | 0.139 | 0.248 | 0.230 | 0.205 | 0.499 | 0.273 | 0.417 |
+ | Qwen/Qwen2.5-7B | 0.924 | 0.459 | 0.426 | 0.907 | 0.216 | 0.616 | 0.229 | 0.199 | 0.634 | 0.507 | 0.512 |
+ | tokyotech-llm/Llama-3.1-Swallow-8B-v0.2 | 0.911 | 0.510 | 0.627 | 0.892 | 0.198 | 0.464 | 0.296 | 0.233 | 0.525 | 0.336 | 0.499 |
+ | google/gemma-2-9b | 0.904 | 0.573 | 0.524 | 0.898 | 0.168 | 0.456 | 0.269 | 0.236 | 0.623 | 0.345 | 0.500 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1** | 0.950 | 0.643 | 0.677 | 0.897 | 0.187 | 0.560 | 0.304 | 0.247 | 0.650 | 0.462 | 0.558 |
+ | google/gemma-3-12b-pt | 0.787 | 0.563 | 0.569 | 0.911 | 0.194 | 0.584 | 0.288 | 0.244 | 0.659 | 0.385 | 0.518 |
+ | google/gemma-2-27b | 0.936 | 0.553 | 0.573 | 0.916 | 0.194 | 0.596 | 0.295 | 0.251 | 0.659 | 0.490 | 0.546 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1** | 0.958 | 0.660 | 0.671 | 0.924 | 0.200 | 0.644 | 0.321 | 0.255 | 0.679 | 0.629 | 0.594 |
+ | google/gemma-3-27b-pt | 0.944 | 0.582 | 0.627 | 0.915 | 0.210 | 0.704 | 0.301 | 0.255 | 0.724 | 0.473 | 0.574 |
+ | Qwen/Qwen2.5-32B | 0.961 | 0.561 | 0.538 | 0.925 | 0.228 | 0.808 | 0.271 | 0.233 | 0.751 | 0.637 | 0.591 |

  ### English tasks

- | Model | OpenBookQA | TriviaQA | HellaSWAG | SQuAD2.0 | XWINO | MMLU | GSM8K | MATH | BBH | HumanEval | En Avg |
- | ----------------------------------------------- | ---------- | -------- | --------- | -------- | ------ | ------ | ------ | ---------- | ---------- | --------- | ------ |
- | | 4-shot | 4-shot | 4-shot | 4-shot | 4-shot | 5-shot | 4-shot | 4-shot | 3-shot | 0-shot | |
- | | Acc | EM acc | Acc | EM acc | Acc | Acc | EM acc | CoT EM Acc | CoT EM Acc | pass@1 | |
- | google/gemma-3-1b-pt | 0.304 | 0.358 | 0.471 | 0.501 | 0.832 | 0.262 | 0.016 | 0.008 | 0.276 | 0.070 | 0.310 |
- | Qwen/Qwen2.5-1.5B | 0.342 | 0.397 | 0.499 | 0.506 | 0.851 | 0.610 | 0.611 | 0.314 | 0.413 | 0.356 | 0.490 |
- | google/gemma-2-2b | 0.342 | 0.552 | 0.552 | 0.501 | 0.890 | 0.530 | 0.249 | 0.176 | 0.415 | 0.188 | 0.439 |
- | rinna/gemma-2-baku-2b | 0.314 | 0.475 | 0.533 | 0.501 | 0.881 | 0.493 | 0.168 | 0.110 | 0.376 | 0.150 | 0.400 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1 | 0.312 | 0.435 | 0.516 | 0.501 | 0.871 | 0.538 | 0.275 | 0.144 | 0.384 | 0.286 | 0.426 |
- | Qwen/Qwen2.5-3B | 0.360 | 0.504 | 0.553 | 0.541 | 0.872 | 0.657 | 0.580 | 0.440 | 0.442 | 0.387 | 0.534 |
- | google/gemma-3-4b-pt | 0.360 | 0.603 | 0.576 | 0.502 | 0.895 | 0.596 | 0.376 | 0.258 | 0.495 | 0.351 | 0.501 |
- | Qwen/Qwen2.5-7B | 0.392 | 0.601 | 0.600 | 0.618 | 0.888 | 0.742 | 0.832 | 0.510 | 0.562 | 0.554 | 0.630 |
- | tokyotech-llm/Llama-3.1-Swallow-8B-v0.2 | 0.382 | 0.651 | 0.596 | 0.513 | 0.904 | 0.622 | 0.521 | 0.228 | 0.605 | 0.366 | 0.539 |
- | google/gemma-2-9b | 0.382 | 0.718 | 0.626 | 0.506 | 0.907 | 0.706 | 0.688 | 0.338 | 0.704 | 0.390 | 0.597 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1 | 0.362 | 0.659 | 0.602 | 0.532 | 0.906 | 0.687 | 0.678 | 0.330 | 0.664 | 0.529 | 0.595 |
- | google/gemma-3-12b-pt | 0.398 | 0.747 | 0.637 | 0.524 | 0.917 | 0.737 | 0.703 | 0.398 | 0.683 | 0.445 | 0.619 |
- | google/gemma-2-27b | 0.412 | 0.780 | 0.675 | 0.549 | 0.921 | 0.754 | 0.757 | 0.438 | 0.760 | 0.508 | 0.655 |
- | tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1 | 0.414 | 0.756 | 0.652 | 0.597 | 0.915 | 0.749 | 0.732 | 0.416 | 0.765 | 0.658 | 0.665 |
- | google/gemma-3-27b-pt | 0.414 | 0.809 | 0.667 | 0.618 | 0.923 | 0.780 | 0.801 | 0.520 | 0.732 | 0.507 | 0.677 |
- | Qwen/Qwen2.5-32B | 0.406 | 0.664 | 0.656 | 0.668 | 0.913 | 0.832 | 0.718 | 0.600 | 0.717 | 0.523 | 0.670 |
+ | Model | OpenBookQA | TriviaQA | HellaSWAG | SQuAD2.0 | XWINO | MMLU | GSM8K | MATH | BBH | HumanEval | En Avg |
+ | --------------------------------------------------- | ---------- | -------- | --------- | -------- | ------ | ------ | ------ | ---------- | ---------- | --------- | ------ |
+ | | 4-shot | 4-shot | 4-shot | 4-shot | 4-shot | 5-shot | 4-shot | 4-shot | 3-shot | 0-shot | |
+ | | Acc | EM acc | Acc | EM acc | Acc | Acc | EM acc | CoT EM Acc | CoT EM Acc | pass@1 | |
+ | google/gemma-3-1b-pt | 0.304 | 0.358 | 0.471 | 0.501 | 0.832 | 0.262 | 0.016 | 0.008 | 0.276 | 0.070 | 0.310 |
+ | Qwen/Qwen2.5-1.5B | 0.342 | 0.397 | 0.499 | 0.506 | 0.851 | 0.610 | 0.611 | 0.314 | 0.413 | 0.356 | 0.490 |
+ | google/gemma-2-2b | 0.342 | 0.552 | 0.552 | 0.501 | 0.890 | 0.530 | 0.249 | 0.176 | 0.415 | 0.188 | 0.439 |
+ | rinna/gemma-2-baku-2b | 0.314 | 0.475 | 0.533 | 0.501 | 0.881 | 0.493 | 0.168 | 0.110 | 0.376 | 0.150 | 0.400 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1** | 0.312 | 0.435 | 0.516 | 0.501 | 0.871 | 0.538 | 0.275 | 0.144 | 0.384 | 0.286 | 0.426 |
+ | Qwen/Qwen2.5-3B | 0.360 | 0.504 | 0.553 | 0.541 | 0.872 | 0.657 | 0.580 | 0.440 | 0.442 | 0.387 | 0.534 |
+ | google/gemma-3-4b-pt | 0.360 | 0.603 | 0.576 | 0.502 | 0.895 | 0.596 | 0.376 | 0.258 | 0.495 | 0.351 | 0.501 |
+ | Qwen/Qwen2.5-7B | 0.392 | 0.601 | 0.600 | 0.618 | 0.888 | 0.742 | 0.832 | 0.510 | 0.562 | 0.554 | 0.630 |
+ | tokyotech-llm/Llama-3.1-Swallow-8B-v0.2 | 0.382 | 0.651 | 0.596 | 0.513 | 0.904 | 0.622 | 0.521 | 0.228 | 0.605 | 0.366 | 0.539 |
+ | google/gemma-2-9b | 0.382 | 0.718 | 0.626 | 0.506 | 0.907 | 0.706 | 0.688 | 0.338 | 0.704 | 0.390 | 0.597 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1** | 0.362 | 0.659 | 0.602 | 0.532 | 0.906 | 0.687 | 0.678 | 0.330 | 0.664 | 0.529 | 0.595 |
+ | google/gemma-3-12b-pt | 0.398 | 0.747 | 0.637 | 0.524 | 0.917 | 0.737 | 0.703 | 0.398 | 0.683 | 0.445 | 0.619 |
+ | google/gemma-2-27b | 0.412 | 0.780 | 0.675 | 0.549 | 0.921 | 0.754 | 0.757 | 0.438 | 0.760 | 0.508 | 0.655 |
+ | **tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1** | 0.414 | 0.756 | 0.652 | 0.597 | 0.915 | 0.749 | 0.732 | 0.416 | 0.765 | 0.658 | 0.665 |
+ | google/gemma-3-27b-pt | 0.414 | 0.809 | 0.667 | 0.618 | 0.923 | 0.780 | 0.801 | 0.520 | 0.732 | 0.507 | 0.677 |
+ | Qwen/Qwen2.5-32B | 0.406 | 0.664 | 0.656 | 0.668 | 0.913 | 0.832 | 0.718 | 0.600 | 0.717 | 0.523 | 0.670 |

  ## Evaluation Benchmarks
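Neither table in the diff defines how the final column is computed; judging from the numbers, "Ja Avg" and "En Avg" appear to be unweighted means of the ten per-task scores. The snippet below is a quick editorial sanity check of that reading (an assumption, since the averaging rule is not stated in this excerpt) against the Gemma-2-Llama-Swallow-2b-pt-v0.1 rows.

```python
# Sanity check (editorial, not part of the commit): treat "Ja Avg" / "En Avg"
# as the unweighted mean of the ten task scores and compare with the table rows.
ja_scores = [0.830, 0.509, 0.549, 0.863, 0.119, 0.172, 0.261, 0.195, 0.461, 0.251]
en_scores = [0.312, 0.435, 0.516, 0.501, 0.871, 0.538, 0.275, 0.144, 0.384, 0.286]

print(round(sum(ja_scores) / len(ja_scores), 3))  # 0.421, matches the "Ja Avg" cell
print(round(sum(en_scores) / len(en_scores), 3))  # 0.426, matches the "En Avg" cell
```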
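The README text in this commit describes pre-trained (pt) checkpoints built by continual pre-training on Gemma 2 and instruction-tuned (it) variants built by SFT. For readers who want to try one of the checkpoints named in the tables, the sketch below shows how such a model would typically be loaded with the Hugging Face `transformers` causal-LM API. It is not part of this commit; the dtype, the `device_map` setting, and the sample prompt are illustrative assumptions rather than settings taken from the model card.

```python
# Minimal usage sketch (editorial, not from this commit): load one of the
# Gemma-2-Llama-Swallow base checkpoints listed above and generate a continuation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1"  # one of the table entries

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on the available GPU
    device_map="auto",           # assumption: accelerate installed; not mandated by the card
)

prompt = "日本の首都は"  # illustrative Japanese continuation prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```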