Update README.md
README.md
```diff
@@ -56,20 +56,17 @@ print(generated_text)
 vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
 
 
-## Evaluation
-
-
-
-
-
-
-| | |strict-match | 5|exact_match|↑ |0.9568|± |0.0056|
-```
+## Evaluation
+
+The model was evaluated on popular reasoning tasks (AIME 2024, MATH-500, GPQA-Diamond) via [LightEval](https://github.com/huggingface/open-r1).
+For reasoning evaluations, we estimate pass@1 based on 10 runs with different seeds.
+
+
+### Accuracy
 
-
-
-
-
-
-
-```
+| | Recovery (%) | deepseek/DeepSeek-R1-0528 | RedHatAI/DeepSeek-R1-0528-quantized.w4a16<br>(this model) |
+| --------------------------- | :----------: | :------------------: | :--------------------------------------------------: |
+| AIME 2024<br>pass@1 | 98.50 | 88.66 | 87.33 |
+| MATH-500<br>pass@1 | 99.88 | 97.52 | 97.40 |
+| GPQA Diamond<br>pass@1 | 101.21 | 79.65 | 80.61 |
+| **Reasoning<br>Average Score** | **99.82** | **88.61** | **88.45** |
```
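The new README text says pass@1 is estimated from 10 runs with different seeds. The sketch below shows one common way to compute such an estimate. It assumes that each run produces one scored answer per problem and that pass@1 is the mean per-run accuracy. The function name `pass_at_1` and the toy scores are hypothetical, and LightEval's own implementation may differ in detail.

```python
from statistics import mean


def pass_at_1(runs: list[list[bool]]) -> float:
    """Estimate pass@1 (in %) as the mean accuracy over independent seeded runs.

    runs[i][j] is True if run i answered problem j correctly.
    Assumption: one sample per problem per run, and every run covers
    the same set of problems.
    """
    if not runs:
        raise ValueError("need at least one run")
    n_problems = len(runs[0])
    if n_problems == 0 or any(len(r) != n_problems for r in runs):
        raise ValueError("all runs must cover the same, non-empty problem set")
    # Accuracy of each seeded run, then the average across runs.
    per_run_accuracy = [sum(r) / n_problems for r in runs]
    return 100.0 * mean(per_run_accuracy)


# Hypothetical scores: 10 seeds on a toy 3-problem benchmark.
toy_runs = [[True, True, False]] * 7 + [[True, False, False]] * 3
print(f"pass@1 = {pass_at_1(toy_runs):.2f}")
```

When every run covers the same problems, this value equals total correct answers divided by total attempts (17 of 30 in the toy data). Averaging over seeds reduces the run-to-run variance that sampled decoding introduces on small benchmarks such as AIME 2024.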
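In the Accuracy table, the Recovery column is consistent with the quantized score divided by the baseline score, times 100. The average row is consistent with an unweighted mean over the three tasks, with its recovery taken from the two averages. The README does not spell this out. The sketch below recomputes the table from the per-task pass@1 values under that assumption. `SCORES`, `recovery`, and `summarize` are illustrative names only.

```python
# Per-task pass@1 scores copied from the Accuracy table:
# task -> (baseline deepseek/DeepSeek-R1-0528, RedHatAI/DeepSeek-R1-0528-quantized.w4a16)
SCORES = {
    "AIME 2024": (88.66, 87.33),
    "MATH-500": (97.52, 97.40),
    "GPQA Diamond": (79.65, 80.61),
}


def recovery(baseline: float, quantized: float) -> float:
    """Quantized score expressed as a percentage of the baseline score."""
    return 100.0 * quantized / baseline


def summarize(scores: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float, float]]:
    """Return task -> (recovery %, baseline score, quantized score)."""
    rows = {task: (recovery(b, q), b, q) for task, (b, q) in scores.items()}
    # Assumption: the average row is the unweighted mean over tasks,
    # and its recovery is the ratio of the two averages.
    base_avg = sum(b for b, _ in scores.values()) / len(scores)
    quant_avg = sum(q for _, q in scores.values()) / len(scores)
    rows["Reasoning Average Score"] = (recovery(base_avg, quant_avg), base_avg, quant_avg)
    return rows


for task, (rec, base, quant) in summarize(SCORES).items():
    print(f"{task:<24} {rec:7.2f} {base:6.2f} {quant:6.2f}")
```

This prints recoveries of 98.50, 99.88, 101.21, and 99.82, with averages of 88.61 and 88.45, which matches the table. Averaging the three per-task recoveries instead would give about 99.86. The reported 99.82 therefore fits the ratio-of-averages reading. A recovery above 100 (GPQA Diamond) means the quantized model scored higher than the baseline on that task.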