---
language:
- en
library_name: openvino
pipeline_tag: text-generation
base_model: tiiuae/Falcon3-7B-Instruct
tags:
- openvino
- optimized
- int4
- awq
- falcon
- falcon3
- instruction-tuned
---

# Falcon3-7B-Instruct OpenVINO INT4

This repository contains the [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct) model optimized for inference with Intel's OpenVINO runtime. The model weights have been quantized to INT4 using the AWQ quantization scheme, reducing memory footprint and improving inference speed while preserving output quality.

## Model Details

* **Original Model**: [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct)
* **Model Type**: Instruction-tuned Large Language Model
* **Parameters**: 7B
* **Quantization**: INT4 symmetric AWQ (Activation-aware Weight Quantization)
* **Group Size**: -1 (per-channel quantization)

## Optimization Details

This model was converted from the original Hugging Face model to OpenVINO format with the Optimum Intel library, using the following export command:

```bash
optimum-cli export openvino \
    -m tiiuae/Falcon3-7B-Instruct \
    --weight-format int4 \
    --sym \
    --dataset auto \
    --awq \
    --group-size -1 \
    falcon3-7b-instruct-int4-sym-ov
```

## Usage

### Prerequisites

- OpenVINO 2024.0 or newer
- optimum-intel
- transformers

### Sample inference code with Optimum Intel

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Load tokenizer and model
model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Generate text
prompt = "Write a short story about a robot learning to paint:"
input_ids = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **input_ids,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
```

### Sample inference code with OpenVINO GenAI

1. Install the packages required for OpenVINO GenAI:

   ```bash
   pip install openvino-genai huggingface_hub
   ```

2. Download the model and run inference:

   ```python
   import huggingface_hub as hf_hub

   # Download the OpenVINO model files from the Hub
   model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
   model_path = "falcon3-7b-instruct-int4-sym-ov"
   hf_hub.snapshot_download(model_id, local_dir=model_path)

   import openvino_genai as ov_genai

   # Run generation on CPU; use "GPU" to target an Intel GPU instead
   device = "CPU"
   pipe = ov_genai.LLMPipeline(model_path, device)
   print(pipe.generate("What is OpenVINO?", max_length=200))
   ```

## License

This model inherits the license of the original [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct) model.
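
## Chat Template Usage

Because Falcon3-7B-Instruct is instruction-tuned, conversational prompts generally work best when formatted with the model's chat template rather than passed as raw text. Below is a minimal sketch using the Optimum Intel model loaded as above; the message content and generation settings are illustrative and not part of the original card.

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Format the conversation with the model's chat template
# (the user message here is a hypothetical example)
messages = [{"role": "user", "content": "Explain AWQ quantization in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)

output = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```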