
Official [AQLM](https://arxiv.org/abs/2401.06118) quantization of [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf).

For this quantization, we used 1 codebook of 16 bits.
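
Assuming AQLM's default group size of eight weights per code for this scheme, one 16-bit code per group works out to 16 / 8 = 2 bits per weight, which is the "2Bit" in the repository name.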

Selected evaluation results for this and other models:

| Model | AQLM scheme | WikiText-2 PPL | Model size, GB | Hub link |
|---|---|---|---|---|
| Llama-2-7b | 1x16 | 5.92 | 2.4 | Link |
| Llama-2-7b | 2x8 | 6.69 | 2.2 | Link |
| Llama-2-7b | 8x8 | 6.61 | 2.2 | Link |
| Llama-2-13b | 1x16 | 5.22 | 4.1 | Link |
| Llama-2-70b (THIS) | 1x16 | 3.83 | 18.8 | Link |
| Llama-2-70b | 2x8 | 4.21 | 18.2 | Link |
| Mixtral-8x7b | 1x16 | 3.35 | 12.6 | Link |
| Mixtral-8x7b-Instruct | 1x16 | - | 12.6 | Link |

For more details on inference, as well as instructions for quantizing models yourself, please refer to the official GitHub repo: https://github.com/Vahe1994/AQLM.
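
As a minimal usage sketch (assuming the `aqlm` inference kernels are installed, e.g. `pip install aqlm[gpu]`, and a `transformers` release with AQLM support; older releases may additionally require `trust_remote_code=True`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ISTA-DASLab/Llama-2-70b-AQLM-2Bit-1x16-hf"

# Load the 2-bit AQLM checkpoint; device_map="auto" spreads the
# quantized layers across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Generation works the same as with an unquantized model.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```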
