RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic

alexmarques committed 10 days ago

Commit 7a4ffde · verified · 1 Parent(s): d993353

Update README.md

Files changed (1)

README.md +3 -3

@@ -89,13 +89,13 @@ model = AutoModelForCausalLM.from_pretrained(
     model_stub, torch_dtype="auto", device_map="auto"
 )
+tokenizer = AutoTokenizer.from_pretrained(model_stub)
 output_dir = f"./{model_name}-FP8-dynamic"
 oneshot(
     model=model,
     recipe=recipe,
-    output_dir=output_dir,
-    tokenizer=AutoTokenizer.from_pretrained(model_stub),
 )
 model.save_pretrained(output_dir, save_compressed=True, skip_sparsity_compression_stats=False)
@@ -110,7 +110,7 @@ The model was evaluated on the test split of [trl-lib/tldr](https://huggingface.
 One can reproduce these results by using the following command:
 ```bash
-lm_eval --model vllm --model_args "pretrained=RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic,dtype=auto,add_bos_token" --batch-size auto --tasks tldr
+lm_eval --model vllm --model_args "pretrained=RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic,dtype=auto,add_bos_token=True" --batch-size auto --tasks tldr
 ```
 <table>
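
The second hunk changes `add_bos_token` to `add_bos_token=True` inside the `--model_args` string. One plausible reason is that the string is read as comma-separated `key=value` pairs, so a bare key carries no value. The sketch below illustrates that idea with a hypothetical `parse_model_args` helper; it is an assumption for illustration, not lm_eval's actual parser, and how lm_eval treats a bare key is not stated in this commit.

```python
def parse_model_args(arg_string: str) -> dict:
    """Parse 'k1=v1,k2=v2' into a dict of strings.

    Hypothetical helper for illustration only; not part of lm_eval.
    """
    args = {}
    for item in arg_string.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a stray trailing comma
        if "=" not in item:
            # A bare key such as 'add_bos_token' has no value to assign.
            raise ValueError(f"expected key=value, got {item!r}")
        key, value = item.split("=", 1)
        args[key] = value
    return args


MODEL = "RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic"

# New form from the diff: every entry is an explicit key=value pair.
new_args = parse_model_args(f"pretrained={MODEL},dtype=auto,add_bos_token=True")

# Old form from the diff: the last entry is a bare key.
try:
    parse_model_args(f"pretrained={MODEL},dtype=auto,add_bos_token")
    old_form_accepted = True
except ValueError:
    old_form_accepted = False
```

Under this assumed parsing scheme the new string yields an explicit `add_bos_token` entry, while the old string is rejected; writing the value out avoids depending on how a given parser handles bare keys.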