---
language:
- en
library_name: openvino
pipeline_tag: text-generation
base_model: tiiuae/Falcon3-7B-Instruct
tags:
- openvino
- optimized
- int4
- awq
- falcon
- falcon3
- instruction-tuned
---

# Falcon3-7B-Instruct OpenVINO INT4

This repository contains the [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct) model optimized for inference with Intel's OpenVINO runtime. The weights have been quantized to INT4 with symmetric, per-channel AWQ (Activation-aware Weight Quantization) to reduce memory footprint and improve inference speed while preserving output quality.

## Model Details

* **Original Model**: [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct)
* **Model Type**: Instruction-tuned Large Language Model
* **Parameters**: 7B
* **Quantization**: INT4 Symmetric AWQ (Activation-aware Weight Quantization)
* **Group Size**: -1 (per-channel quantization)

## Optimization Details

This model was converted to OpenVINO format from the original Hugging Face model using the Optimum Intel library. The following export command was used:

```bash
optimum-cli export openvino \
  -m tiiuae/Falcon3-7B-Instruct \
  --weight-format int4 \
  --sym \
  --dataset auto \
  --awq \
  --group-size -1 \
  falcon3-7b-instruct-int4-sym-ov
```
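
The same export can also be expressed through the Optimum Intel Python API. The sketch below mirrors the CLI flags above; it assumes `OVWeightQuantizationConfig` accepts the same `quant_method="awq"` and `dataset="auto"` values as the command line, so treat it as illustrative rather than the exact command used to produce this repository.

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Mirror the CLI flags: INT4, symmetric, per-channel (group_size=-1), AWQ.
quantization_config = OVWeightQuantizationConfig(
    bits=4,
    sym=True,
    group_size=-1,
    quant_method="awq",  # assumption: string form of the AWQ quantization method
    dataset="auto",      # assumption: same calibration-dataset shorthand as the CLI
)

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
model = OVModelForCausalLM.from_pretrained(
    "tiiuae/Falcon3-7B-Instruct",
    export=True,
    quantization_config=quantization_config,
)
model.save_pretrained("falcon3-7b-instruct-int4-sym-ov")
```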

## Usage

### Prerequisites

- OpenVINO 2024.0 or newer
- optimum-intel
- transformers
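
All three are typically pulled in by the `openvino` extra of Optimum:

```bash
pip install optimum[openvino]
```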

### Sample Inference code with Optimum Intel

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Load tokenizer and model
model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Generate text
prompt = "Write a short story about a robot learning to paint:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # enable sampling so temperature/top_p take effect
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
```
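
Because this is an instruction-tuned checkpoint, responses are usually better when the prompt is wrapped in the model's chat template. Continuing from the snippet above, here is a minimal sketch using the standard `apply_chat_template` API from transformers (assuming the tokenizer in this repository ships Falcon3's chat template):

```python
messages = [
    {"role": "user", "content": "Write a short story about a robot learning to paint."}
]
# Render the conversation with the chat template and append the
# generation prompt so the model knows it should respond next.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```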

### Sample Inference code with OpenVINO GenAI

1. Install packages required for using OpenVINO GenAI.

```bash
pip install openvino-genai huggingface_hub
```

2. Download model and run inference.

```python
import huggingface_hub as hf_hub
import openvino_genai as ov_genai

# Download the OpenVINO model files from the Hugging Face Hub
model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
model_path = "falcon3-7b-instruct-int4-sym-ov"
hf_hub.snapshot_download(model_id, local_dir=model_path)

# Build the pipeline on the target device and generate
device = "CPU"
pipe = ov_genai.LLMPipeline(model_path, device)
print(pipe.generate("What is OpenVINO?", max_length=200))
```
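
3. (Optional) Stream tokens as they are generated. A minimal sketch using the callback streamer accepted by `pipe.generate`; the exact return convention of the callback may vary between OpenVINO GenAI releases, so treat this as illustrative.

```python
# Reuses the `pipe` object from step 2.
def streamer(chunk):
    # Called with each newly decoded text chunk; print it immediately.
    print(chunk, end="", flush=True)

pipe.generate("What is OpenVINO?", streamer=streamer, max_new_tokens=200)
```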

## License

This model inherits the license of the original [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct) model.