Update README.md

Files changed (1)

README.md

@@ -56,20 +56,17 @@ print(generated_text)
 vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
-## Evaluation (More evals coming soon)
-- unquantized baseline on GSM8k
-```bash
-|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
-|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
-|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9591|±  |0.0055|
-|     |       |strict-match    |     5|exact_match|↑  |0.9568|±  |0.0056|
-```
-- this INT4 quantized model on GSM8k
-```bash
-|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
-|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
-|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9560|±  |0.0056|
-|     |       |strict-match    |     5|exact_match|↑  |0.9553|±  |0.0057|
-```

+## Evaluation
+The model was evaluated on popular reasoning tasks (AIME 2024, MATH-500, GPQA Diamond) via [LightEval](https://github.com/huggingface/open-r1).
+For reasoning evaluations, we estimate pass@1 based on 10 runs with different seeds.
+### Accuracy
+|                                | Recovery (%) | deepseek-ai/DeepSeek-R1-0528 | RedHatAI/DeepSeek-R1-0528-quantized.w4a16<br>(this model) |
+| ------------------------------ | :----------: | :--------------------------: | :--------------------------------------------------: |
+| AIME 2024<br>pass@1            | 98.50        | 88.66                        | 87.33                                                |
+| MATH-500<br>pass@1             | 99.88        | 97.52                        | 97.40                                                |
+| GPQA Diamond<br>pass@1         | 101.21       | 79.65                        | 80.61                                                |
+| **Reasoning<br>Average Score** | **99.82**    | **88.61**                    | **88.45**                                            |
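
The Recovery (%) column is not defined in the README. The values are consistent with the quantized score divided by the baseline score, times 100, so the sketch below reproduces the column under that assumption. The scores are copied from the table; the names `recovery` and `mean` are illustrative helpers, not part of LightEval or the model card.

```python
# Scores copied from the accuracy table (pass@1, in percent).
baseline = {"AIME 2024": 88.66, "MATH-500": 97.52, "GPQA Diamond": 79.65}
quantized = {"AIME 2024": 87.33, "MATH-500": 97.40, "GPQA Diamond": 80.61}


def recovery(quantized_score, baseline_score):
    """Assumed definition: quantized score as a percentage of the baseline score."""
    return 100.0 * quantized_score / baseline_score


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Per-task recovery, rounded to two decimals as in the table.
recoveries = {
    task: round(recovery(quantized[task], baseline[task]), 2) for task in baseline
}

# Average scores across the three tasks, then recovery of the averages.
baseline_avg = round(mean(baseline.values()), 2)
quantized_avg = round(mean(quantized.values()), 2)
average_recovery = round(recovery(quantized_avg, baseline_avg), 2)

for task, value in recoveries.items():
    print(f"{task}: {value:.2f}%")
print(f"Average: {baseline_avg:.2f} -> {quantized_avg:.2f} ({average_recovery:.2f}%)")
```

Under this definition, a recovery above 100% (as for GPQA Diamond) means the quantized model scored slightly higher than the baseline on that task in these runs.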