qaihm-bot committed
Commit 11603c2 · verified · 1 Parent(s): 44681b8

See https://github.com/quic/ai-hub-models/releases/v0.30.2 for changelog.

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -15,7 +15,7 @@ pipeline_tag: text-generation
  ## State-of-the-art large language model useful on a variety of language understanding and generation tasks
 
 
- Llama 3 is a family of LLMs. The "Chat" at the end indicates that the model is optimized for chatbot-like dialogue. The model is quantized to w4a16 (4-bit weights and 16-bit activations) and part of the model is quantized to w8a16 (8-bit weights and 16-bit activations) making it suitable for on-device deployment. For Prompt and output length specified below, the time to first token is Llama-PromptProcessor-Quantized's latency and average time per addition token is Llama-TokenGenerator-Quantized's latency.
+ Llama 3 is a family of LLMs. The model is quantized to w4a16 (4-bit weights and 16-bit activations), and part of the model is quantized to w8a16 (8-bit weights and 16-bit activations), making it suitable for on-device deployment. For the prompt and output lengths specified below, the time to first token is Llama-PromptProcessor-Quantized's latency and the average time per additional token is Llama-TokenGenerator-Quantized's latency.
 
  This model is an implementation of Llama-v3.2-3B-Instruct found [here](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/).
 
@@ -50,7 +50,7 @@ This model is an implementation of Llama-v3.2-3B-Instruct found [here](https://h
  | Llama-v3.2-3B-Chat | w4a16 | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN | 18.4176 | 0.125936 - 4.029952 | -- | -- |
  | Llama-v3.2-3B-Chat | w4a16 | SA8255P ADP | Qualcomm® SA8255P | QNN | 14.02377 | 0.187414 - 5.997257 | -- | -- |
 
- ## Deploying Llama 3.2 on-device
+ ## Deploying Llama 3.2 3B on-device
 
  Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial.
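The description's framing — time to first token (TTFT) from the prompt processor, then a fixed average latency per additional token from the token generator — combines into an end-to-end estimate. A minimal sketch, assuming the table's unlabeled numeric columns are a generation rate in tokens/s and a TTFT range in seconds (the header row is not shown in this diff, so that reading is an assumption):

```python
def total_latency_s(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Estimate end-to-end response time: TTFT for the first token,
    then one per-token interval for each remaining output token."""
    per_token_s = 1.0 / tokens_per_s
    return ttft_s + (output_tokens - 1) * per_token_s

# Using the Snapdragon X Elite row (18.4176 tokens/s, TTFT low end 0.125936 s)
# for a hypothetical 128-token response:
estimate = total_latency_s(ttft_s=0.125936, tokens_per_s=18.4176, output_tokens=128)
print(f"{estimate:.2f} s")  # prints "7.02 s"
```

The long prompts at the high end of the TTFT range shift the total upward by the same per-row TTFT delta, which is why the table reports TTFT as a range rather than a single number.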