
This is the inceptionai/jais-family-13b model converted to OpenVINO with INT4 weight compression.

Download the model

  • Install huggingface-hub: `pip install huggingface-hub[cli]`
  • Download the model: `huggingface-cli download helenai/jais-family-13b-ov-int4-sym --local-dir helenai/jais-family-13b-ov-int4-sym`
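If you prefer to stay in Python, the same download can be done with the `huggingface_hub` library (a sketch; the `local_dir` target folder here is just an example):

```python
# Download all model files from the Hugging Face Hub into a local folder.
# Requires: pip install huggingface-hub
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="helenai/jais-family-13b-ov-int4-sym",
    local_dir="jais-family-13b-ov-int4-sym",  # example target folder
)
print(local_dir)  # path to the downloaded model files
```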

Run inference

  • Install/upgrade OpenVINO GenAI nightly (this is the only requirement for inference; there is no need to install transformers or PyTorch): `pip install --pre --upgrade openvino-genai openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release`
  • Download a sample inference script (`curl -O` works in Windows Command Prompt and most Linux terminals): `curl -O https://raw.githubusercontent.com/helena-intel/snippets/refs/heads/main/llm_chat/python/llm_test.py`. Note that this is a base model, not a chat/instruct model: it is not finetuned for answering questions, and the script does not keep chat history, so use it only for testing model outputs.
  • Run the script with the model path and the device as parameters: `python llm_test.py jais-family-13b-ov-int4-sym GPU`. Change `GPU` to `CPU` to run on CPU; NPU is not yet supported for this model.
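The steps above can also be condensed into a few lines of Python with the OpenVINO GenAI API (a minimal sketch, not a chat loop; it assumes the model has already been downloaded to the path shown, and the prompt is just an example):

```python
# Minimal OpenVINO GenAI text generation sketch for a base (non-chat) model.
# Requires: pip install openvino-genai
import openvino_genai

# Point at the downloaded model directory; use "GPU" or "CPU" as the device.
pipe = openvino_genai.LLMPipeline("jais-family-13b-ov-int4-sym", "CPU")
print(pipe.generate("The capital of the UAE is", max_new_tokens=30))
```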

Check out the OpenVINO GenAI documentation for more information.

Model compression parameters

openvino_version         : 2025.2.0-18660-3ceeeb52d64

advanced_parameters      : {'statistics_path': None, 'awq_params': {'subset_size': 32, 'percent_to_apply': 0.002, 'alpha_min': 0.0, 'alpha_max': 1.0, 'steps': 100}, 'scale_estimation_params': {'subset_size': 64, 'initial_steps': 5, 'scale_steps': 5, 'weight_penalty': -1.0}, 'gptq_params': {'damp_percent': 0.1, 'block_size': 128, 'subset_size': 128}, 'lora_correction_params': {'adapter_rank': 8, 'num_iterations': 3, 'apply_regularization': True, 'subset_size': 128, 'use_int8_adapters': True}}
all_layers               : False
awq                      : False
backup_mode              : int8_asym
gptq                     : False
group_size               : -1
ignored_scope            : []
lora_correction          : False
mode                     : int4_sym
ratio                    : 1.0
scale_estimation         : False
sensitivity_metric       : weight_quantization_error

optimum_intel_version    : 1.22.0
optimum_version          : 1.24.0
pytorch_version          : 2.5.1+cpu
transformers_version     : 4.48.3
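As a rough back-of-the-envelope check of what `mode: int4_sym` with `ratio: 1.0` implies for model size, the packed weights take about 4 bits per parameter (a sketch; the real on-disk size differs because quantization scales, the `int8_asym` backup layers, and non-weight tensors add overhead):

```python
def estimated_weight_gb(n_params, bits_per_weight=4):
    """Approximate packed weight size in gigabytes, ignoring quantization scales."""
    return n_params * bits_per_weight / 8 / 1e9

# ~13B parameters at 4 bits/weight (INT4) vs 16 bits/weight (FP16)
print(estimated_weight_gb(13e9))      # -> 6.5 (GB)
print(estimated_weight_gb(13e9, 16))  # -> 26.0 (GB)
```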