RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic

Text Generation

text-generation-inference

compressed-tensors

alexmarques committed 7 days ago

Commit 1f1977d (verified) · 1 Parent(s): 7a4ffde

Update README.md

Files changed (1)

README.md +1 -1

@@ -23,7 +23,7 @@ datasets:
 - **Model Developers:** Red Hat (Neural Magic)
 This model is a quantized version of [RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4), which is fine-tuned on the [trl-lib/tldr](https://huggingface.co/datasets/trl-lib/tldr) dataset.
-This sparse-quantized model recovers 100% of the BERTScore (0.366) obtained by the dense model [RedHatAI/Llama-3.1-8B-tldr](https://huggingface.co/RedHatAI/Llama-3.1-8B-tldr).
+This sparse-quantized model recovers 100% of the BERTScore (0.366) obtained by the dense model [RedHatAI/Llama-3.1-8B-tldr](https://huggingface.co/RedHatAI/Llama-3.1-8B-tldr) while providing up to 1.6x speedup.
 ## Deployment