---
language:
- en
library_name: openvino
pipeline_tag: text-generation
base_model: tiiuae/Falcon3-7B-Instruct
tags:
- openvino
- optimized
- int4
- awq
- falcon
- falcon3
- instruction-tuned
---
# Falcon3-7B-Instruct OpenVINO INT4
This repository contains the [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct) model optimized for inference with Intel's OpenVINO runtime. The weights have been compressed to INT4 with AWQ (Activation-aware Weight Quantization) to reduce the memory footprint and speed up inference while aiming to preserve output quality.
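As a rough, back-of-the-envelope illustration of why INT4 weights help, the sketch below compares weight storage for a nominal 7B-parameter model at 16-bit and 4-bit precision. It assumes exactly 7 × 10⁹ parameters and ignores quantization scales, any layers kept at higher precision, activations, and the KV cache, so actual on-disk and in-memory sizes will differ.

```python
# Back-of-the-envelope weight-storage estimate.
# Assumptions: exactly 7e9 parameters; scales, higher-precision layers,
# activations and KV cache are ignored.
NUM_PARAMS = 7_000_000_000


def weight_gigabytes(num_params: int, bits_per_weight: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


fp16_gb = weight_gigabytes(NUM_PARAMS, 16)
int4_gb = weight_gigabytes(NUM_PARAMS, 4)
print(f"FP16: {fp16_gb:.1f} GB, INT4: {int4_gb:.1f} GB, ratio: {fp16_gb / int4_gb:.0f}x")
```

Under these assumptions the INT4 weights need roughly a quarter of the FP16 storage, which also reduces the memory traffic that often bounds token-generation speed.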
## Model Details
* **Original Model**: [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct)
* **Model Type**: Instruction-tuned Large Language Model
* **Parameters**: 7B
* **Quantization**: INT4 Symmetric AWQ (Activation-aware Weight Quantization)
* **Group Size**: -1 (per-channel quantization)
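To make "symmetric" and "per-channel" concrete, here is a minimal pure-Python sketch of symmetric INT4 round-to-nearest quantization with one scale per output channel (row). It is an illustration under simplifying assumptions, not the implementation used by the export toolchain: it omits the AWQ step (which rescales weights using activation statistics before quantizing) and assumes the scale convention `max|w| / 7` with codes clipped to the signed 4-bit range [-8, 7]; the real conventions may differ.

```python
# Minimal sketch: symmetric, per-channel (group size -1) INT4 quantization.
# Assumptions: one scale per row, scale = max|w| / 7, codes clipped to [-8, 7];
# no AWQ rescaling. Not the exact algorithm used to produce this model.
INT4_MIN, INT4_MAX = -8, 7


def quantize_sym_int4_per_channel(weights):
    """Return (codes, scales): integer codes per row and one float scale per row."""
    codes, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Symmetric: the zero point is fixed at 0, so only a scale is stored.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        codes.append([max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in row])
        scales.append(scale)
    return codes, scales


def dequantize(codes, scales):
    """Reconstruct approximate float weights as code * row_scale."""
    return [[c * s for c in row] for row, s in zip(codes, scales)]


weights = [
    [3.5, -1.0, 0.3, 2.0],
    [-0.875, 0.5, 0.0, 0.25],
]
codes, scales = quantize_sym_int4_per_channel(weights)
print(codes)   # [[7, -2, 1, 4], [-7, 4, 0, 2]]
print(scales)  # [0.5, 0.125]
print(dequantize(codes, scales))
```

Each row keeps its own scale, so a row with small weights (the second one) gets a finer step than a row with large weights; the value 0.3 in the first row is reconstructed as 0.5, showing the rounding error that AWQ tries to reduce for the most important channels.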
## Optimization Details
This model was converted from the original Hugging Face model to OpenVINO format using the Optimum Intel library. The following optimization command was used:
```bash
optimum-cli export openvino \
-m tiiuae/Falcon3-7B-Instruct \
--weight-format int4 \
--sym \
--dataset auto \
--awq \
--group-size -1 \
falcon3-7b-instruct-int4-sym-ov
```
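The `--group-size` flag controls how many scales are stored per weight matrix: with a positive group size g, each row is split into chunks of g input elements that each get their own scale, while `-1` uses a single scale per output channel. The arithmetic below sketches that trade-off for a hypothetical 4096 × 4096 linear layer (the shape is illustrative, not taken from Falcon3-7B), assuming one scale per group; smaller groups usually track the weights more closely at the cost of storing more scales.

```python
# Hypothetical layer shape, chosen only for illustration.
OUT_FEATURES, IN_FEATURES = 4096, 4096


def num_scales(out_features: int, in_features: int, group_size: int) -> int:
    """Count scales for an [out, in] weight; group_size=-1 means one scale per row."""
    if group_size == -1:
        return out_features
    if in_features % group_size != 0:
        raise ValueError("in_features must be divisible by group_size in this sketch")
    return out_features * (in_features // group_size)


for g in (-1, 128, 64):
    print(f"group_size={g:>3}: {num_scales(OUT_FEATURES, IN_FEATURES, g):,} scales")
```

For this hypothetical layer, per-channel quantization stores 4,096 scales, while a group size of 128 stores 131,072.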
## Usage
### Prerequisites
- OpenVINO 2024.0 or newer
- optimum-intel
- transformers
### Sample inference code with Optimum Intel
```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Load the tokenizer and the INT4 OpenVINO model from the Hugging Face Hub
model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Tokenize the prompt (returns input_ids and attention_mask)
prompt = "Write a short story about a robot learning to paint:"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text; do_sample=True is needed for temperature/top_p to take effect
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode the full sequence (prompt + generated text)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
```
### Sample inference code with OpenVINO GenAI
1. Install the packages required for OpenVINO GenAI:
```bash
pip install openvino-genai huggingface_hub
```
2. Download the model and run inference:
```python
import huggingface_hub as hf_hub
import openvino_genai as ov_genai

# Download the model files from the Hub into a local directory
model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
model_path = "falcon3-7b-instruct-int4-sym-ov"
hf_hub.snapshot_download(model_id, local_dir=model_path)

# Create an LLM pipeline on the CPU and generate a response
device = "CPU"
pipe = ov_genai.LLMPipeline(model_path, device)
print(pipe.generate("What is OpenVINO?", max_length=200))
```
## License
This model inherits the license of the original [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct) model.