alexmarques committed · Commit 237a04b · verified · 1 Parent(s): 66616db

Update README.md

Files changed (1): README.md (+10 −3)
@@ -27,13 +27,17 @@ language:
 - **Model Developers:** Neural Magic
 
 This is a code completion AI model obtained by fine-tuning the 2:4 sparse [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4) on the [evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) dataset, followed by quantization.
-On the [HumanEval](https://arxiv.org/abs/2107.03374) benchmark, it achieves a pass@1 of 49.1, compared to 48.5 for the fine-tuned dense model [Llama-3.1-8B-evolcodealpaca](https://huggingface.co/neuralmagic/Llama-3.1-8B-evolcodealpaca) — demonstrating over **100% accuracy recovery**.
+On the [HumanEval](https://arxiv.org/abs/2107.03374) benchmark, it achieves a pass@1 of 50.6, compared to 48.5 for the fine-tuned dense model [Llama-3.1-8B-evolcodealpaca](https://huggingface.co/neuralmagic/Llama-3.1-8B-evolcodealpaca) — demonstrating over **100% accuracy recovery**.
 
 
 ### Model Optimizations
 
-This inherits the optimizations from its parent, [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4).
-Namely, all linear operators within transformer blocks were pruned to the 2:4 sparsity pattern: in each group of four weights, two are retained while two are pruned.
+This model was obtained by quantizing the weights of [Sparse-Llama-3.1-8B-evolcodealpaca-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4) to the INT4 data type.
+This optimization reduces the number of bits per parameter from 16 to 4, reducing disk size and GPU memory requirements by approximately 75%.
+This comes on top of the 50% weight reduction from the 2:4 pruning employed in [Sparse-Llama-3.1-8B-evolcodealpaca-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4).
+
+Only the weights of the linear operators within transformer blocks are quantized. Symmetric per-channel quantization is applied, in which a linear scaling per output dimension maps the INT4 and floating-point representations of the quantized weights.
+The [GPTQ](https://arxiv.org/abs/2210.17323) algorithm is applied for quantization, as implemented in the [llm-compressor](https://github.com/vllm-project/llm-compressor) library.
 
 
 ## Deployment with vLLM
@@ -52,15 +56,18 @@ This model was evaluated on Neural Magic's fork of [EvalPlus](https://github.com
     <td><strong>Metric</strong></td>
     <td style="text-align: center"><strong>Llama-3.1-8B-evolcodealpaca</strong></td>
     <td style="text-align: center"><strong>Sparse-Llama-3.1-8B-evolcodealpaca-2of4</strong></td>
+    <td style="text-align: center"><strong>Sparse-Llama-3.1-8B-evolcodealpaca-2of4-quantized.w4a16</strong></td>
   </tr>
   <tr>
     <td>HumanEval pass@1</td>
     <td style="text-align: center">48.5</td>
     <td style="text-align: center">49.1</td>
+    <td style="text-align: center">50.6</td>
   </tr>
   <tr>
     <td>HumanEval+ pass@1</td>
     <td style="text-align: center">44.2</td>
     <td style="text-align: center">46.3</td>
+    <td style="text-align: center">48.0</td>
   </tr>
 </table>
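The symmetric per-channel INT4 scheme described in the new Model Optimizations text can be illustrated with a minimal sketch. This is plain Python for illustration only, using naive round-to-nearest with one scale per output channel; it is not the GPTQ algorithm or the llm-compressor implementation, which choose quantized values to compensate for layer output error.

```python
def quantize_w4_per_channel(w):
    """Minimal symmetric per-channel INT4 quantization sketch (illustrative).

    w is a list of rows, one per output channel. One scale per output
    dimension maps the largest |weight| in the row onto the symmetric
    INT4 range [-7, 7]; the stored codes need 4 bits instead of 16.
    """
    q_rows, scales = [], []
    for row in w:
        scale = max(abs(x) for x in row) / 7.0 or 1.0  # guard all-zero rows
        q_rows.append([max(-8, min(7, round(x / scale))) for x in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    # The same linear scaling maps the INT4 codes back to floating point.
    return [[code * s for code in row] for row, s in zip(q_rows, scales)]


w = [[0.12, -0.50, 0.33, 0.07],
     [1.50, -0.20, 0.90, -1.10]]
q, s = quantize_w4_per_channel(w)
w_hat = dequantize(q, s)
# Round-to-nearest keeps every reconstructed weight within scale/2 of the
# original; GPTQ improves on this by updating not-yet-quantized weights to
# absorb the error introduced at each step.
```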
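The ~75% figure from INT4 and its interaction with 2:4 pruning follow from simple bit accounting. A rough sketch, illustrative only: real checkpoints also store quantization scales, sparsity metadata, and unquantized layers such as embeddings.

```python
BITS_FP16, BITS_INT4 = 16, 4

def relative_weight_storage(bits, density=1.0):
    """Weight storage relative to a dense FP16 baseline, ignoring metadata."""
    return (bits / BITS_FP16) * density

int4_only = relative_weight_storage(BITS_INT4)         # 0.25, i.e. ~75% smaller
int4_sparse = relative_weight_storage(BITS_INT4, 0.5)  # 0.125 if the 2:4 zeros
                                                       # are compressed away
```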
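For reference on the table above, HumanEval scores are pass@k estimates. Below is a sketch of the standard unbiased estimator from the HumanEval paper (Chen et al., 2021), where n completions are sampled per problem and c of them pass the unit tests; Neural Magic's EvalPlus fork is assumed to follow this standard definition.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: probability that at least one of k
    completions drawn without replacement from n (c correct) passes."""
    if n - c < k:
        return 1.0  # fewer failures than draws, so some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 the estimator reduces to the fraction of correct samples, c/n;
# the benchmark score averages this quantity over all problems.
```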