### Japanese tasks

|Model|JCom.|JEMHopQA|NIILC|JSQuAD|XL-Sum|MGSM|WMT20-en-ja|WMT20-ja-en|JMMLU|JHumanEval|Ja Avg|
|---|---|---|---|---|---|---|---|---|---|---|---|
| |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot| |
| |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1| |
| Qwen2-72B-Instruct | 0.9634 | 0.6268 | 0.5418 | 0.9210 | 0.1644 | 0.7840 | 0.2592 | 0.2327 | 0.7713 | 0.6909 | 0.5955 |
| Qwen2.5-72B-Instruct | **0.9696** | 0.5699 | 0.5811 | 0.7381 | 0.1706 | **0.8360** | 0.2269 | 0.2179 | **0.7899** | 0.6256 | 0.5726 |
| Llama 3 70B Instruct | 0.9419 | 0.6114 | 0.5506 | 0.9164 | 0.1912 | 0.7200 | 0.2708 | 0.2350 | 0.6789 | 0.6610 | 0.5777 |

### English tasks

|Model|OpenBookQA|TriviaQA|HellaSWAG|SQuAD2.0|XWINO|MMLU|GSM8K|BBH|HumanEval|En Avg|
|---|---|---|---|---|---|---|---|---|---|---|
| |4-shot|4-shot|4-shot|4-shot|4-shot|5-shot|4-shot|3-shot|0-shot| |
| |Acc|EM acc|Acc|EM acc|Acc|Acc|EM acc|CoT EM Acc|pass@1| |
| Qwen2-72B-Instruct | 0.4360 | 0.7588 | 0.6857 | 0.3913 | 0.9110 | 0.8391 | 0.8499 | 0.2436 | 0.6939 | 0.6455 |
| Qwen2.5-72B-Instruct | **0.4540** | 0.6764 | **0.7064** | 0.3550 | 0.8895 | **0.8478** | **0.9113** | 0.4027 | 0.6165 | 0.6511 |
| Llama 3 70B Instruct | 0.4400 | 0.7999 | 0.6552 | 0.4024 | 0.9127 | 0.7992 | 0.9052 | 0.8326 | 0.7555 | 0.7225 |

## MT-Bench JA

| Model | coding | extraction | humanities | math | reasoning | roleplay | stem | writing | JMTAvg |
|-------|--------|------------|------------|------|-----------|----------|------|---------|--------|
| Qwen2-72B-Instruct | 0.5699 | 0.7858 | 0.8222 | 0.5096 | **0.7032** | 0.7963 | 0.7728 | **0.8223** | 0.7228 |
| Qwen2.5-72B-Instruct | 0.7060 | 0.7866 | 0.8122 | **0.6968** | 0.6536 | **0.8301** | 0.8060 | 0.7841 | 0.7594 |
| Llama 3 70B Instruct | 0.5969 | 0.8410 | 0.7120 | 0.4481 | 0.4884 | 0.7117 | 0.6510 | 0.6900 | 0.6424 |
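
The `Ja Avg`, `En Avg`, and `JMTAvg` columns appear to be the unweighted arithmetic mean of the per-task scores in each row (this is an assumption; the tables do not state the aggregation rule). A quick sanity check against the Qwen2-72B-Instruct rows:

```python
# Per-task scores for Qwen2-72B-Instruct, copied from the three tables above.
ja = [0.9634, 0.6268, 0.5418, 0.9210, 0.1644, 0.7840, 0.2592, 0.2327, 0.7713, 0.6909]
en = [0.4360, 0.7588, 0.6857, 0.3913, 0.9110, 0.8391, 0.8499, 0.2436, 0.6939]
mt = [0.5699, 0.7858, 0.8222, 0.5096, 0.7032, 0.7963, 0.7728, 0.8223]

def mean(scores):
    """Unweighted arithmetic mean (assumed definition of the Avg columns)."""
    return sum(scores) / len(scores)

# Each computed mean should land within rounding distance (4 decimal places)
# of the average reported in the table.
for label, scores, reported in [("Ja Avg", ja, 0.5955),
                                ("En Avg", en, 0.6455),
                                ("JMTAvg", mt, 0.7228)]:
    assert abs(mean(scores) - reported) < 1e-4, label
    print(f"{label}: {mean(scores):.4f} (reported {reported})")
```

All three reported averages match a simple mean to within rounding, which supports the assumption.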