ranarag committed · Commit cefe3ca · verified · 1 Parent(s): e2b849f

Update README.md

Files changed (1)
  1. README.md +105 -107
README.md CHANGED
@@ -183,10 +183,10 @@ By redesigning a common household item like the plastic bottle, we can create a
183
  **Evaluation Results:**
184
  <table>
185
  <thead>
186
- <caption style="text-align:center"><b>Comparison with different models over various benchmarks. Scores of AlpacaEval-2.0 and Arena-Hard are calculated with thinking=True</b><sup id="fnref1"><a href="#fn1">1</a></caption>
187
  <tr>
188
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
189
- <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
190
  <th style="text-align:center; background-color: #001d6c; color: white;">AlpacaEval-2.0</th>
191
  <th style="text-align:center; background-color: #001d6c; color: white;">MMLU</th>
192
  <th style="text-align:center; background-color: #001d6c; color: white;">PopQA</th>
@@ -231,114 +231,114 @@ By redesigning a common household item like the plastic bottle, we can create a
231
  <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">83.23</td>
232
  </tr>
233
  <tr>
234
- <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;"><b>Granite-3.3-2B-Instruct</b></td>
235
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 28.86 </td>
236
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 43.45 </td>
237
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 55.88 </td>
238
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 18.4 </td>
239
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 58.97 </td>
240
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 52.51 </td>
241
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 35.98 </td>
242
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 72.48 </td>
243
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 80.51 </td>
244
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 75.68 </td>
245
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 65.8 </td>
246
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">87.47</td>
247
  </tr>
248
 
249
  <tr>
250
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Llama-3.1-8B-Instruct</td>
251
- <td style="text-align:center; background-color: #DAE8FF; color: black;">36.43</td>
252
- <td style="text-align:center; background-color: #DAE8FF; color: black;">27.22</td>
253
- <td style="text-align:center; background-color: #DAE8FF; color: black;">69.15</td>
254
- <td style="text-align:center; background-color: #DAE8FF; color: black;">28.79</td>
255
- <td style="text-align:center; background-color: #DAE8FF; color: black;">52.79</td>
256
- <td style="text-align:center; background-color: #DAE8FF; color: black;">72.66</td>
257
- <td style="text-align:center; background-color: #DAE8FF; color: black;">61.48</td>
258
- <td style="text-align:center; background-color: #DAE8FF; color: black;">83.24</td>
259
- <td style="text-align:center; background-color: #DAE8FF; color: black;">85.32</td>
260
- <td style="text-align:center; background-color: #DAE8FF; color: black;">80.15</td>
261
- <td style="text-align:center; background-color: #DAE8FF; color: black;">79.10</td>
262
- <td style="text-align:center; background-color: #DAE8FF; color: black;">83.43</td>
263
  </tr>
264
 
265
  <tr>
266
- <td style="text-align:left; background-color: #DAE8FF; color: black;">DeepSeek-R1-Distill-Llama-8B</td>
267
- <td style="text-align:center; background-color: #DAE8FF; color: black;">17.17</td>
268
- <td style="text-align:center; background-color: #DAE8FF; color: black;">21.85</td>
269
- <td style="text-align:center; background-color: #DAE8FF; color: black;">45.80</td>
270
- <td style="text-align:center; background-color: #DAE8FF; color: black;">13.25</td>
271
- <td style="text-align:center; background-color: #DAE8FF; color: black;">47.43</td>
272
- <td style="text-align:center; background-color: #DAE8FF; color: black;">65.71</td>
273
- <td style="text-align:center; background-color: #DAE8FF; color: black;">44.46</td>
274
- <td style="text-align:center; background-color: #DAE8FF; color: black;">72.18</td>
275
- <td style="text-align:center; background-color: #DAE8FF; color: black;">67.54</td>
276
- <td style="text-align:center; background-color: #DAE8FF; color: black;">62.91</td>
277
- <td style="text-align:center; background-color: #DAE8FF; color: black;">66.50</td>
278
- <td style="text-align:center; background-color: #DAE8FF; color: black;">42.87</td>
279
  </tr>
280
 
281
  <tr>
282
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Qwen-2.5-7B-Instruct</td>
283
- <td style="text-align:center; background-color: #DAE8FF; color: black;">25.44</td>
284
- <td style="text-align:center; background-color: #DAE8FF; color: black;">30.34</td>
285
- <td style="text-align:center; background-color: #DAE8FF; color: black;">74.30</td>
286
- <td style="text-align:center; background-color: #DAE8FF; color: black;">18.12</td>
287
- <td style="text-align:center; background-color: #DAE8FF; color: black;">63.06</td>
288
- <td style="text-align:center; background-color: #DAE8FF; color: black;">70.40</td>
289
- <td style="text-align:center; background-color: #DAE8FF; color: black;">54.71</td>
290
- <td style="text-align:center; background-color: #DAE8FF; color: black;">84.46</td>
291
- <td style="text-align:center; background-color: #DAE8FF; color: black;">93.35</td>
292
- <td style="text-align:center; background-color: #DAE8FF; color: black;">89.91</td>
293
- <td style="text-align:center; background-color: #DAE8FF; color: black;">74.90</td>
294
- <td style="text-align:center; background-color: #DAE8FF; color: black;">81.90</td>
295
  </tr>
296
 
297
  <tr>
298
- <td style="text-align:left; background-color: #DAE8FF; color: black;">DeepSeek-R1-Distill-Qwen-7B</td>
299
- <td style="text-align:center; background-color: #DAE8FF; color: black;">10.36</td>
300
- <td style="text-align:center; background-color: #DAE8FF; color: black;">15.35</td>
301
- <td style="text-align:center; background-color: #DAE8FF; color: black;">50.72</td>
302
- <td style="text-align:center; background-color: #DAE8FF; color: black;">9.94</td>
303
- <td style="text-align:center; background-color: #DAE8FF; color: black;">47.14</td>
304
- <td style="text-align:center; background-color: #DAE8FF; color: black;">65.04</td>
305
- <td style="text-align:center; background-color: #DAE8FF; color: black;">42.76</td>
306
- <td style="text-align:center; background-color: #DAE8FF; color: black;">78.47</td>
307
- <td style="text-align:center; background-color: #DAE8FF; color: black;">79.89</td>
308
- <td style="text-align:center; background-color: #DAE8FF; color: black;">78.43</td>
309
- <td style="text-align:center; background-color: #DAE8FF; color: black;">59.10</td>
310
- <td style="text-align:center; background-color: #DAE8FF; color: black;">42.45</td>
311
  </tr>
312
  <tr>
313
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
314
- <td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
315
- <td style="text-align:center; background-color: #DAE8FF; color: black;">30.34</td>
316
- <td style="text-align:center; background-color: #DAE8FF; color: black;">66.77</td>
317
- <td style="text-align:center; background-color: #DAE8FF; color: black;">28.7</td>
318
- <td style="text-align:center; background-color: #DAE8FF; color: black;">65.84</td>
319
- <td style="text-align:center; background-color: #DAE8FF; color: black;">68.55</td>
320
- <td style="text-align:center; background-color: #DAE8FF; color: black;">50.78</td>
321
- <td style="text-align:center; background-color: #DAE8FF; color: black;">79.15</td>
322
- <td style="text-align:center; background-color: #DAE8FF; color: black;">89.63</td>
323
- <td style="text-align:center; background-color: #DAE8FF; color: black;">85.79</td>
324
- <td style="text-align:center; background-color: #DAE8FF; color: black;">73.20</td>
325
- <td style="text-align:center; background-color: #DAE8FF; color: black;">85.73</td>
326
  </tr>
327
 
328
  <tr>
329
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
330
- <td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
331
- <td style="text-align:center; background-color: #DAE8FF; color: black;">61.19</td>
332
- <td style="text-align:center; background-color: #DAE8FF; color: black;">66.79</td>
333
- <td style="text-align:center; background-color: #DAE8FF; color: black;">28.04</td>
334
- <td style="text-align:center; background-color: #DAE8FF; color: black;">66.92</td>
335
- <td style="text-align:center; background-color: #DAE8FF; color: black;">64.77</td>
336
- <td style="text-align:center; background-color: #DAE8FF; color: black;">50.95</td>
337
- <td style="text-align:center; background-color: #DAE8FF; color: black;">81.65</td>
338
- <td style="text-align:center; background-color: #DAE8FF; color: black;">89.35</td>
339
- <td style="text-align:center; background-color: #DAE8FF; color: black;">85.72</td>
340
- <td style="text-align:center; background-color: #DAE8FF; color: black;">74.31</td>
341
- <td style="text-align:center; background-color: #DAE8FF; color: black;">84.7</td>
342
  </tr>
343
  <tr>
344
  <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
@@ -356,15 +356,13 @@ By redesigning a common household item like the plastic bottle, we can create a
356
  <td style="text-align:center; background-color: #DAE8FF; color: black;">88.5</td>
357
  </tr>
358
  </tbody></table>
359
-
360
-
361
  <table>
362
  <caption style="text-align:center"><b>Math Benchmarks</b></caption>
363
  <thead>
364
  <tr>
365
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
366
  <th style="text-align:center; background-color: #001d6c; color: white;">AIME24</th>
367
- <th style="text-align:center; background-color: #001d6c; color: white;">MATH500</th>
368
  </tr></thead>
369
  <tbody>
370
  <tr>
@@ -378,19 +376,19 @@ By redesigning a common household item like the plastic bottle, we can create a
378
  <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 35.54 </td>
379
  </tr>
380
  <tr>
381
- <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;"><b>Granite-3.3-2B-Instruct</b></td>
382
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 3.28 </td>
383
- <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 58.09 </td>
384
  </tr>
385
  <tr>
386
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
387
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 1.97 </td>
388
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 48.73 </td>
389
  </tr>
390
  <tr>
391
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
392
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 2.43 </td>
393
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 52.8 </td>
394
  </tr>
395
  <tr>
396
  <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
@@ -398,7 +396,7 @@ By redesigning a common household item like the plastic bottle, we can create a
398
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 69.02 </td>
399
  </tr>
400
  </tbody></table>
401
-
402
  **Training Data:**
403
  Overall, our training data is largely comprised of two key sources: (1) publicly available datasets with permissive license, (2) internal synthetically generated data targeted to enhance reasoning capabilites.
404
  <!-- A detailed attribution of datasets can be found in [Granite 3.2 Technical Report (coming soon)](#), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf). -->
@@ -415,7 +413,7 @@ Granite-3.3-8B-Instruct builds upon Granite-3.3-8B-Base, leveraging both permiss
415
  - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
416
  - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
417
 
418
- <p><a href="#fnref1" title="Jump back to reference">[1]</a> Evaluated using <a href="https://github.com/allenai/olmes">OLMES</a> (except the AttaQ scores)</p>
419
  <!-- ## Citation
420
  ```
421
  @misc{granite-models,
 
183
  **Evaluation Results:**
184
  <table>
185
  <thead>
186
+ <caption style="text-align:center"><b>Comparison with different models over various benchmarks<sup id="fnref1"><a href="#fn1">1</a></sup>. Scores of AlpacaEval-2.0 and Arena-Hard are calculated with thinking=True</b></caption>
187
  <tr>
188
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
189
+ <th style="text-align:center; background-color: #001d6c; color: white;">Arena-Hard</th>
190
  <th style="text-align:center; background-color: #001d6c; color: white;">AlpacaEval-2.0</th>
191
  <th style="text-align:center; background-color: #001d6c; color: white;">MMLU</th>
192
  <th style="text-align:center; background-color: #001d6c; color: white;">PopQA</th>
 
231
  <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">83.23</td>
232
  </tr>
233
  <tr>
234
+ <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
235
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 28.86 </td>
236
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 43.45 </td>
237
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 55.88 </td>
238
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 18.4 </td>
239
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.97 </td>
240
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 52.51 </td>
241
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.98 </td>
242
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 72.48 </td>
243
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.51 </td>
244
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 75.68 </td>
245
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 65.8 </td>
246
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">87.47</td>
247
  </tr>
248
 
249
  <tr>
250
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Llama-3.1-8B-Instruct</td>
251
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">36.43</td>
252
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">27.22</td>
253
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">69.15</td>
254
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">28.79</td>
255
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">52.79</td>
256
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">72.66</td>
257
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">61.48</td>
258
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">83.24</td>
259
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">85.32</td>
260
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">80.15</td>
261
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">79.10</td>
262
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">83.43</td>
263
  </tr>
264
 
265
  <tr>
266
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">DeepSeek-R1-Distill-Llama-8B</td>
267
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">17.17</td>
268
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">21.85</td>
269
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">45.80</td>
270
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">13.25</td>
271
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">47.43</td>
272
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">65.71</td>
273
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">44.46</td>
274
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">72.18</td>
275
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">67.54</td>
276
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">62.91</td>
277
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">66.50</td>
278
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">42.87</td>
279
  </tr>
280
 
281
  <tr>
282
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Qwen-2.5-7B-Instruct</td>
283
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">25.44</td>
284
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">30.34</td>
285
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">74.30</td>
286
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">18.12</td>
287
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">63.06</td>
288
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">70.40</td>
289
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">54.71</td>
290
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">84.46</td>
291
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">93.35</td>
292
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">89.91</td>
293
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">74.90</td>
294
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">81.90</td>
295
  </tr>
296
 
297
  <tr>
298
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">DeepSeek-R1-Distill-Qwen-7B</td>
299
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">10.36</td>
300
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">15.35</td>
301
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">50.72</td>
302
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">9.94</td>
303
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">47.14</td>
304
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">65.04</td>
305
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">42.76</td>
306
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">78.47</td>
307
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">79.89</td>
308
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">78.43</td>
309
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">59.10</td>
310
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">42.45</td>
311
  </tr>
312
  <tr>
313
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-8B-Instruct</td>
314
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">37.58</td>
315
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">30.34</td>
316
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">66.77</td>
317
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">28.7</td>
318
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">65.84</td>
319
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">68.55</td>
320
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">50.78</td>
321
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">79.15</td>
322
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">89.63</td>
323
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">85.79</td>
324
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">73.20</td>
325
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">85.73</td>
326
  </tr>
327
 
328
  <tr>
329
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.2-8B-Instruct</td>
330
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">55.25</td>
331
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">61.19</td>
332
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">66.79</td>
333
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">28.04</td>
334
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">66.92</td>
335
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">64.77</td>
336
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">50.95</td>
337
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">81.65</td>
338
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">89.35</td>
339
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">85.72</td>
340
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">74.31</td>
341
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">84.7</td>
342
  </tr>
343
  <tr>
344
  <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
 
356
  <td style="text-align:center; background-color: #DAE8FF; color: black;">88.5</td>
357
  </tr>
358
  </tbody></table>
 
 
359
  <table>
360
  <caption style="text-align:center"><b>Math Benchmarks</b></caption>
361
  <thead>
362
  <tr>
363
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
364
  <th style="text-align:center; background-color: #001d6c; color: white;">AIME24</th>
365
+ <th style="text-align:center; background-color: #001d6c; color: white;">MATH-500</th>
366
  </tr></thead>
367
  <tbody>
368
  <tr>
 
376
  <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 35.54 </td>
377
  </tr>
378
  <tr>
379
+ <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
380
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 3.28 </td>
381
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.09 </td>
382
  </tr>
383
  <tr>
384
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-8B-Instruct</td>
385
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 1.97 </td>
386
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 48.73 </td>
387
  </tr>
388
  <tr>
389
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.2-8B-Instruct</td>
390
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 2.43 </td>
391
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 52.8 </td>
392
  </tr>
393
  <tr>
394
  <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
 
396
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 69.02 </td>
397
  </tr>
398
  </tbody></table>
399
+
400
  **Training Data:**
401
  Overall, our training data is largely comprised of two key sources: (1) publicly available datasets with permissive license, (2) internal synthetically generated data targeted to enhance reasoning capabilites.
402
  <!-- A detailed attribution of datasets can be found in [Granite 3.2 Technical Report (coming soon)](#), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf). -->
 
413
  - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
414
  - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
415
 
416
+ <p><a href="#fnref1" title="Jump back to reference">[1]</a> Evaluated using <a href="https://github.com/allenai/olmes">OLMES</a> (except AttaQ and Arena-Hard scores)</p>
417
  <!-- ## Citation
418
  ```
419
  @misc{granite-models,