How to run int4?
I saw this paragraph in README.md:
We provide code for on-the-fly int4 quantization which minimizes performance degradation as well.
However, I couldn't find any code or script related to this. Does just loading the model with a bnb config automatically enable int4, or is there something else I need to do?
Transformers integrates with bitsandbytes. When you load the model with .from_pretrained(..., quantization_config=BitsAndBytesConfig(...)), just pass a BitsAndBytesConfig object with the int8 or int4 arguments (load_in_8bit=True or load_in_4bit=True). You can additionally offload some layers to CPU to free up room in VRAM, or pass a custom device_map to assign layers yourself. I'd use BitsAndBytesConfig with device_map="auto" and let transformers pick the device mapping for the layers.