ekurtic committed on
Commit cce668c · verified · 1 Parent(s): 9aa823f

Update README.md

Files changed (1)
  1. README.md +13 -16
README.md CHANGED
@@ -56,20 +56,17 @@ print(generated_text)
  vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
 
 
- ## Evaluation (More evals coming soon)
-
- - unquantized baseline on GSM8k
- ```bash
- |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
- |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
- |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9591|±  |0.0055|
- |     |       |strict-match    |     5|exact_match|↑  |0.9568|±  |0.0056|
- ```
 
- - this INT4 quantized model on GSM8k
- ```bash
- |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
- |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
- |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9560|±  |0.0056|
- |     |       |strict-match    |     5|exact_match|↑  |0.9553|±  |0.0057|
- ```
 
+ ## Evaluation
+
+ The model was evaluated on popular reasoning tasks (AIME 2024, MATH-500, GPQA-Diamond) via [LightEval](https://github.com/huggingface/open-r1).
+ For reasoning evaluations, we estimate pass@1 based on 10 runs with different seeds.
+
+
+ ### Accuracy
 
+ | | Recovery (%) | deepseek/DeepSeek-R1-0528 | RedHatAI/DeepSeek-R1-0528-quantized.w4a16<br>(this model) |
+ | --------------------------- | :----------: | :------------------: | :--------------------------------------------------: |
+ | AIME 2024<br>pass@1 | 98.50 | 88.66 | 87.33 |
+ | MATH-500<br>pass@1 | 99.88 | 97.52 | 97.40 |
+ | GPQA Diamond<br>pass@1 | 101.21 | 79.65 | 80.61 |
+ | **Reasoning<br>Average Score** | **99.82** | **88.61** | **88.45** |
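
The Recovery column in the added table appears to be the quantized score expressed as a percentage of the baseline score. The sketch below reproduces the table's numbers under that assumption. The helper names `recovery` and `pass_at_1` are hypothetical, and treating pass@1 as the plain mean of per-seed accuracy is an assumption about the estimator, not something the README states.

```python
from statistics import mean

# Scores copied from the Accuracy table: (baseline, quantized) pass@1 in percent.
SCORES = {
    "AIME 2024": (88.66, 87.33),
    "MATH-500": (97.52, 97.40),
    "GPQA Diamond": (79.65, 80.61),
}


def recovery(baseline: float, quantized: float) -> float:
    """Quantized score as a percentage of the baseline score (assumed definition)."""
    return 100.0 * quantized / baseline


def pass_at_1(per_seed_accuracy: list[float]) -> float:
    """Assumed estimator: the plain mean of per-run accuracy across seeds."""
    return mean(per_seed_accuracy)


# Per-task recovery, rounded to two decimals as in the table.
per_task = {task: round(recovery(b, q), 2) for task, (b, q) in SCORES.items()}

# Unweighted averages over the three tasks, then recovery of the averages.
avg_baseline = mean(b for b, _ in SCORES.values())
avg_quantized = mean(q for _, q in SCORES.values())
avg_recovery = round(recovery(avg_baseline, avg_quantized), 2)

print(per_task)
print(round(avg_baseline, 2), round(avg_quantized, 2), avg_recovery)
```

Under these assumptions the computed values agree with every Recovery and Average cell in the table, including the GPQA Diamond recovery above 100%, which reflects the quantized model scoring slightly higher than the baseline on that task.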