Zhangchen Xu committed
Commit 1051908 · verified · 1 Parent(s): 1b8b63a

Update README.md

Files changed (1):
  1. README.md (+2 −2)
README.md CHANGED
@@ -44,7 +44,7 @@ The overall performance is even better than the official Llama-3-8B-Instruct Mod
 - **Alpaca Eval 2 (vs Llama-3-8B-Instruct): 75.17 (LC), 78.20 (WR)**
 - **Arena Hard: 37.5**
 - **WildBench WB-Score: 42.7**
-- **Zero-Eval MMLU: 46.70**
+- **Zero-Eval GSM: 46.70**
 
 ## 🔥 Model Performance
 
@@ -63,7 +63,7 @@ We compare our Llama-3-8B-Magpie-Align with official and other **open-aligned LL
 +---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
 | NousResearch/Hermes-2-Pro-Llama-3-8B | 8.05 | 7.35 | 7.70 | 15.60 | 12.86 | 36.37 | 30.52 | 11.5 |
 +---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
-| allenai/llama-3-tulu-2-dpo-8b | 7.71 | 7.15 | 7.43 | 14.89 | 14.8 | 35.43 | 35.42 | 11.7 |
+| allenai/llama-3-tulu-2-dpo-8b | 7.71 | 7.15 | 7.43 | 14.89 | 14.80 | 35.43 | 35.42 | 11.7 |
 +---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
 | cognitivecomputations/dolphin-2.9-llama3-8b | 7.97 | 6.98 | 7.47 | 12.50 | 8.79 | 32.67 | 22.80 | 8.2 |
 +---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+