|
This is the [inceptionai/jais-family-13b](https://huggingface.co/inceptionai/jais-family-13b) model converted to [OpenVINO](https://docs.openvino.ai/2025/index.html) with INT4 weight compression.

## Download the model |
|
|
|
- Install huggingface-hub (the quotes keep shells such as zsh from expanding the brackets)

```sh
pip install "huggingface-hub[cli]"
```

- Download the model

```sh
huggingface-cli download helenai/jais-family-13b-ov-int4-sym --local-dir helenai/jais-family-13b-ov-int4-sym
```

## Run inference

- Install/upgrade OpenVINO GenAI nightly (this is the only requirement for inference; there is no need to install transformers or PyTorch)

```sh
pip install --pre --upgrade openvino-genai openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
```

- Download a sample inference script (`curl -O` works on Windows Command Prompt and most Linux terminals). Note that this is not a chat/instruct model; the inference script is only for testing model outputs. The model is not finetuned for answering questions, and the script does not keep conversation history.

```sh
curl -O https://raw.githubusercontent.com/helena-intel/snippets/refs/heads/main/llm_chat/python/llm_test.py
```

- Run the script with the path to the model and the device as parameters. Change `GPU` to `CPU` to run on CPU; NPU is not yet supported for this model.

```sh
python llm_test.py helenai/jais-family-13b-ov-int4-sym GPU
```

Check out [OpenVINO GenAI documentation](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai.html) for more information. |
|
|
|
## Model compression parameters

```
openvino_version : 2025.2.0-18660-3ceeeb52d64

advanced_parameters : {'statistics_path': None, 'awq_params': {'subset_size': 32, 'percent_to_apply': 0.002, 'alpha_min': 0.0, 'alpha_max': 1.0, 'steps': 100}, 'scale_estimation_params': {'subset_size': 64, 'initial_steps': 5, 'scale_steps': 5, 'weight_penalty': -1.0}, 'gptq_params': {'damp_percent': 0.1, 'block_size': 128, 'subset_size': 128}, 'lora_correction_params': {'adapter_rank': 8, 'num_iterations': 3, 'apply_regularization': True, 'subset_size': 128, 'use_int8_adapters': True}}
all_layers : False
awq : False
backup_mode : int8_asym
gptq : False
group_size : -1
ignored_scope : []
lora_correction : False
mode : int4_sym
ratio : 1.0
scale_estimation : False
sensitivity_metric : weight_quantization_error

optimum_intel_version : 1.22.0
optimum_version : 1.24.0
pytorch_version : 2.5.1+cpu
transformers_version : 4.48.3
```
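
The parameters above (`int4_sym` mode, `group_size : -1` for per-channel quantization, `ratio : 1.0`) correspond to an `optimum-cli` export. A command along the following lines is a sketch reconstructed from those parameters, not the exact command used to produce this model:

```sh
# Export the source model to OpenVINO with symmetric per-channel INT4
# weight compression applied to all compressible layers (ratio 1.0)
optimum-cli export openvino --model inceptionai/jais-family-13b \
  --weight-format int4 --sym --group-size -1 --ratio 1.0 \
  jais-family-13b-ov-int4-sym
```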