This is the [inceptionai/jais-family-13b](https://huggingface.co/inceptionai/jais-family-13b) model converted to [OpenVINO](https://docs.openvino.ai/2025/index.html) 
with INT4 weight compression.

## Download the model

- Install huggingface-hub

```sh
pip install "huggingface-hub[cli]"
```

- Download the model

```sh
huggingface-cli download helenai/jais-family-13b-ov-int4-sym --local-dir jais-family-13b-ov-int4-sym
```
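
If you prefer Python over the CLI, the same download can be done with `snapshot_download` from `huggingface_hub`. A minimal sketch (the local directory name matches the one used in the commands on this page):

```python
from huggingface_hub import snapshot_download

# Download all model files to a local directory
snapshot_download(
    "helenai/jais-family-13b-ov-int4-sym",
    local_dir="jais-family-13b-ov-int4-sym",
)
```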

## Run inference

- Install or upgrade the OpenVINO GenAI nightly packages (this is the only requirement for inference; there is no need to install transformers or PyTorch)

```sh
pip install --pre --upgrade openvino-genai openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
```

- Download a sample inference script (`curl -O` works in Windows Command Prompt and most Linux terminals). Note that this is not a chat/instruct model; the script is only for testing model outputs. The model is not fine-tuned for answering questions, and the script does not keep conversation history.

```sh
curl -O https://raw.githubusercontent.com/helena-intel/snippets/refs/heads/main/llm_chat/python/llm_test.py
```

- Run the script with the path to the model and the device as parameters. Change GPU to CPU to run on CPU. NPU is not yet supported for this model. A minimal Python alternative to the sample script is sketched after this step.

```sh
python llm_test.py jais-family-13b-ov-int4-sym GPU
```
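
If you prefer to call OpenVINO GenAI directly instead of using the sample script, a minimal sketch is shown below (the prompt is only an example; pass `"CPU"` instead of `"GPU"` to run on CPU):

```python
import openvino_genai

# Load the downloaded model directory on the selected device
pipe = openvino_genai.LLMPipeline("jais-family-13b-ov-int4-sym", "GPU")

# Plain text completion; this is a base model, not a chat/instruct model
print(pipe.generate("The capital of the UAE is", max_new_tokens=50))
```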

Check out [OpenVINO GenAI documentation](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai.html) for more information.

## Model compression parameters

```
openvino_version         : 2025.2.0-18660-3ceeeb52d64

advanced_parameters      : {'statistics_path': None, 'awq_params': {'subset_size': 32, 'percent_to_apply': 0.002, 'alpha_min': 0.0, 'alpha_max': 1.0, 'steps': 100}, 'scale_estimation_params': {'subset_size': 64, 'initial_steps': 5, 'scale_steps': 5, 'weight_penalty': -1.0}, 'gptq_params': {'damp_percent': 0.1, 'block_size': 128, 'subset_size': 128}, 'lora_correction_params': {'adapter_rank': 8, 'num_iterations': 3, 'apply_regularization': True, 'subset_size': 128, 'use_int8_adapters': True}}
all_layers               : False
awq                      : False
backup_mode              : int8_asym
gptq                     : False
group_size               : -1
ignored_scope            : []
lora_correction          : False
mode                     : int4_sym
ratio                    : 1.0
scale_estimation         : False
sensitivity_metric       : weight_quantization_error

optimum_intel_version    : 1.22.0
optimum_version          : 1.24.0
pytorch_version          : 2.5.1+cpu
transformers_version     : 4.48.3
```
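
For reference, a compression with roughly these settings (INT4 symmetric weights, per-channel quantization with `group_size=-1`, `ratio=1.0`; the `int8_asym` backup mode is the NNCF default) can be reproduced with optimum-intel. This is a hedged sketch, not the exact command used to create this model:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# INT4 symmetric, per-channel (group_size=-1), compress all layers (ratio=1.0)
quantization_config = OVWeightQuantizationConfig(bits=4, sym=True, group_size=-1, ratio=1.0)

model = OVModelForCausalLM.from_pretrained(
    "inceptionai/jais-family-13b",
    export=True,
    trust_remote_code=True,  # jais models use custom modeling code
    quantization_config=quantization_config,
)
model.save_pretrained("jais-family-13b-ov-int4-sym")
```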