Victor Nogueira

Felladrin

AI & ML interests

Models to run in the web browser

Recent Activity

reacted to eaddario's post with 🔥 1 day ago
Tensor-wise (TWQ) and Layer-wise quantization (LWQ) now available in llama.cpp!

As of version b5125, users can do TWQ, whereby you quantize a whole tensor at a specific level, or perform LWQ by choosing specific layers per tensor(s).

The new --tensor-type option enables llama-quantize to apply user-defined quant levels to any combination of allowed tensors (i.e. tensors with 2 or more dimensions) and layer number, with support for regex patterns.

For example, to TWQ the Attention Value tensor you would use --tensor-type attn_v=q6_k and to perform LWQ you'll use something like --tensor-type "\.([0-9]|1[01257]|31)\.attn_v=q4_k"

In the next few days/weeks I'll update the models in my HF repo (and will add some others), but https://huggingface.co/eaddario/DeepSeek-R1-Distill-Llama-8B-GGUF and https://huggingface.co/eaddario/DeepSeek-R1-Distill-Qwen-7B-GGUF have already been LWQed.

For reference, compared to the naive Q4_K_M model, the LWQ Qwen-7B is almost 11% smaller (4.18GB vs 4.68GB) with only a 0.35% penalty on PPL!

I'll update the https://medium.com/@eaddario/squeezing-tensor-bits-the-quest-for-smaller-llms-86b23bd052ca post to explain the process in detail, but in the meantime the following links will provide some background:
- Changes to llama-quantize: https://github.com/ggml-org/llama.cpp/pull/12511
- TWQ & LWQ tests: https://github.com/ggml-org/llama.cpp/discussions/12741
- Modified llama-imatrix (not yet merged) used to generate imatrix statistics to guide the TWQ and LWQ process: https://github.com/ggml-org/llama.cpp/pull/12718
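To see which layers the LWQ pattern quoted in the post would select, here is a minimal Python sketch. It assumes tensor names of the form blk.<layer>.attn_v.weight, a 32-layer model, and that the pattern is applied as an unanchored regex search; these are illustrative assumptions, not details confirmed by the post. It also rechecks the quoted size reduction from the two file sizes.

```python
import re

# Regex part of the --tensor-type argument quoted in the post
# (the "=q4_k" suffix is the quant level and is not part of the pattern).
LAYER_PATTERN = re.compile(r"\.([0-9]|1[01257]|31)\.attn_v")


def matched_layers(n_layers: int) -> list[int]:
    """Return the layer indices whose tensor name matches the pattern.

    Assumption: tensors are named "blk.<layer>.attn_v.weight" and the
    pattern is applied as a search anywhere in the name.
    """
    return [
        layer
        for layer in range(n_layers)
        if LAYER_PATTERN.search(f"blk.{layer}.attn_v.weight")
    ]


def size_reduction_percent(baseline_gb: float, reduced_gb: float) -> float:
    """Relative size reduction of reduced_gb versus baseline_gb, in percent."""
    return (baseline_gb - reduced_gb) / baseline_gb * 100.0


layers = matched_layers(32)  # 32 layers is an assumed model depth
reduction = size_reduction_percent(4.68, 4.18)  # sizes quoted in the post

print("attn_v layers selected for q4_k:", layers)
print(f"size reduction: {reduction:.1f}%")
```

Under these assumptions the pattern picks layers 0 through 12, plus 15, 17 and 31, and leaves the remaining attn_v tensors at whatever level the base quantization type assigns.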

Organizations

Blog-explorers, MLX Community, Social Post Explorers, M4-ai, ONNX Community, Smol Community

Felladrin's activity

New activity in onnx-community/convert-to-onnx 9 days ago

add-readme

2
#15 opened 10 days ago by PierreMesure
New activity in onnx-community/convert-to-onnx 11 days ago

Upload to same repo

#14 opened 11 days ago by PierreMesure
New activity in Felladrin/MiniSearch 18 days ago

On the previous default model.

3
#1 opened 23 days ago by deleted
New activity in onnx-community/convert-to-onnx about 1 month ago

Update app.py

2
#11 opened about 1 month ago by PierreMesure

Update transformers.js

#10 opened about 1 month ago by PierreMesure
New activity in UUFO-Aigis/Pico-OpenLAiNN-250M about 2 months ago

What's its license?

6
#1 opened about 2 months ago by Felladrin
New activity in onnx-community/convert-to-onnx about 2 months ago

update-transformer.js

3
#9 opened about 2 months ago by PierreMesure
New activity in onnx-community/mms-tts-kan-ONNX 3 months ago

Create README.md

#1 opened 3 months ago by ravi4198
New activity in onnx-community/dictabert-ner-ONNX 3 months ago

Create README.md

#1 opened 3 months ago by dingerstner