v0.30.2
See https://github.com/quic/ai-hub-models/releases/v0.30.2 for changelog.
README.md (CHANGED)

@@ -15,7 +15,7 @@ pipeline_tag: text-generation

 ## State-of-the-art large language model useful on a variety of language understanding and generation tasks

-Llama 3 is a family of LLMs. The
+Llama 3 is a family of LLMs. The model is quantized to w4a16 (4-bit weights and 16-bit activations), with part of the model quantized to w8a16 (8-bit weights and 16-bit activations), making it suitable for on-device deployment. For the prompt and output lengths specified below, the time to first token is the Llama-PromptProcessor-Quantized latency, and the average time per additional token is the Llama-TokenGenerator-Quantized latency.

 This model is an implementation of Llama-v3.2-3B-Instruct found [here](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/).

@@ -50,7 +50,7 @@ This model is an implementation of Llama-v3.2-3B-Instruct found [here](https://h

 | Llama-v3.2-3B-Chat | w4a16 | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN | 18.4176 | 0.125936 - 4.029952 | -- | -- |
 | Llama-v3.2-3B-Chat | w4a16 | SA8255P ADP | Qualcomm® SA8255P | QNN | 14.02377 | 0.187414 - 5.997257 | -- | -- |

-## Deploying Llama 3.2 on-device
+## Deploying Llama 3.2 3B on-device

 Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial.
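The latency description in the README change implies a simple linear model for total response time: time to first token (prompt-processor latency) plus the per-token generator latency for each additional output token. A minimal sketch of that arithmetic follows; the linear model, the helper name, and the 128-token output length are illustrative assumptions, and the figures are taken from the benchmark rows above rather than being new measurements:

```python
def estimate_response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Estimate total generation time as time-to-first-token plus the
    per-token latency for each additional output token (assumed linear model)."""
    per_token_s = 1.0 / tokens_per_s
    return ttft_s + per_token_s * (output_tokens - 1)

# Snapdragon X Elite row above: ~18.42 tokens/s generation, with
# time-to-first-token between ~0.126 s and ~4.03 s depending on prompt length.
best = estimate_response_time(0.125936, 18.4176, output_tokens=128)
worst = estimate_response_time(4.029952, 18.4176, output_tokens=128)
print(f"128-token reply: ~{best:.1f} s (short prompt) to ~{worst:.1f} s (long prompt)")
```

This makes the table's two latency columns concrete: the tokens-per-second figure dominates for long replies, while the time-to-first-token range matters most for long prompts with short replies.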