This is the [inceptionai/jais-family-13b](https://huggingface.co/inceptionai/jais-family-13b) model converted to [OpenVINO](https://docs.openvino.ai/2025/index.html) with symmetric INT4 weight compression.
## Download the model
- Install huggingface-hub
```sh
pip install "huggingface-hub[cli]"
```
- Download the model (a Python alternative is sketched after these steps)
```sh
huggingface-cli download helenai/jais-family-13b-ov-int4-sym --local-dir helenai/jais-family-13b-ov-int4-sym
```
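Alternatively, the model can be downloaded from Python with the `huggingface_hub` library. A minimal sketch using `snapshot_download`, which is the API behind the CLI command above:
```python
# Minimal sketch: download the model repository to a local directory.
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id="helenai/jais-family-13b-ov-int4-sym",
    local_dir="helenai/jais-family-13b-ov-int4-sym",
)
print(model_path)  # local path to the downloaded model files
```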
## Run inference
- Install/upgrade OpenVINO GenAI nightly. These packages are the only requirement for inference; there is no need to install transformers or PyTorch.
```sh
pip install --pre --upgrade openvino-genai openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
```
- Download a sample inference script (`curl -O` works in Windows Command Prompt and most Linux terminals). Note that this is a base model, not a chat/instruct model: it is not finetuned for answering questions, the script does not keep chat history, and it is intended only for testing model outputs.
```sh
curl -O https://raw.githubusercontent.com/helena-intel/snippets/refs/heads/main/llm_chat/python/llm_test.py
```
- Run the script with the path to the model and the device as parameters. Change GPU to CPU to run on CPU; NPU is not yet supported for this model. A standalone alternative to the sample script is sketched after these steps.
```sh
python llm_test.py helenai/jais-family-13b-ov-int4-sym GPU
```
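If you prefer not to use the sample script, the snippet below is a minimal standalone sketch of the same inference flow with the OpenVINO GenAI API. The prompt is only an example; replace `"GPU"` with `"CPU"` to run on CPU.
```python
# Minimal sketch: load the INT4 model and generate a completion.
import openvino_genai

pipe = openvino_genai.LLMPipeline("helenai/jais-family-13b-ov-int4-sym", "GPU")
# max_new_tokens limits the length of the generated continuation.
print(pipe.generate("The capital of France is", max_new_tokens=32))
```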
Check out [OpenVINO GenAI documentation](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai.html) for more information.
## Model compression parameters
```
openvino_version : 2025.2.0-18660-3ceeeb52d64
advanced_parameters : {'statistics_path': None, 'awq_params': {'subset_size': 32, 'percent_to_apply': 0.002, 'alpha_min': 0.0, 'alpha_max': 1.0, 'steps': 100}, 'scale_estimation_params': {'subset_size': 64, 'initial_steps': 5, 'scale_steps': 5, 'weight_penalty': -1.0}, 'gptq_params': {'damp_percent': 0.1, 'block_size': 128, 'subset_size': 128}, 'lora_correction_params': {'adapter_rank': 8, 'num_iterations': 3, 'apply_regularization': True, 'subset_size': 128, 'use_int8_adapters': True}}
all_layers : False
awq : False
backup_mode : int8_asym
gptq : False
group_size : -1
ignored_scope : []
lora_correction : False
mode : int4_sym
ratio : 1.0
scale_estimation : False
sensitivity_metric : weight_quantization_error
optimum_intel_version : 1.22.0
optimum_version : 1.24.0
pytorch_version : 2.5.1+cpu
transformers_version : 4.48.3
```
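For reference, a similar export can be reproduced with optimum-intel. The sketch below mirrors the parameters above (mode `int4_sym`, `group_size` -1, `ratio` 1.0, backup mode `int8_asym`); it is an approximation, not the exact command used to produce this model. Note that jais models require `trust_remote_code`.
```python
# Hedged sketch: an INT4 symmetric weight-compression export with optimum-intel.
# Values mirror the compression parameters listed above.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

quantization_config = OVWeightQuantizationConfig(
    bits=4,
    sym=True,                      # mode: int4_sym
    group_size=-1,                 # per-channel quantization
    ratio=1.0,                     # compress all eligible layers to INT4
    backup_precision="int8_asym",  # backup_mode: int8_asym
)
model = OVModelForCausalLM.from_pretrained(
    "inceptionai/jais-family-13b",
    export=True,
    trust_remote_code=True,
    quantization_config=quantization_config,
)
model.save_pretrained("jais-family-13b-ov-int4-sym")
```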