
Victor Nogueira

Felladrin

https://felladrin.com

felladrin

AI & ML interests

Models to run in the web browser

Recent Activity

updated a model 1 day ago

onnx-community/Musical-genres-Classification-Hubert-V1-ONNX

published a model 1 day ago

onnx-community/Musical-genres-Classification-Hubert-V1-ONNX

reacted to eaddario's post with 🔥 1 day ago

Tensor-wise quantization (TWQ) and layer-wise quantization (LWQ) are now available in llama.cpp!

As of version b5125, users can do TWQ, whereby a whole tensor is quantized at a specific level, or LWQ, by choosing specific layers per tensor(s).

The new `--tensor-type` option enables llama-quantize to apply user-defined quant levels to any combination of allowed tensors (i.e. tensors with 2 or more dimensions) and layer numbers, with support for regex patterns. For example, to TWQ the Attention Value tensor you would use `--tensor-type attn_v=q6_k`, and to perform LWQ you would use something like `--tensor-type "\.([0-9]|1[01257]|31)\.attn_v=q4_k"`.

In the next few days/weeks I'll update the models in my HF repo (and will add some others), but https://huggingface.co/eaddario/DeepSeek-R1-Distill-Llama-8B-GGUF and https://huggingface.co/eaddario/DeepSeek-R1-Distill-Qwen-7B-GGUF have already been LWQed. For reference, compared to the naive Q4_K_M model, the LWQ Qwen-7B is almost 11% smaller (4.68GB vs 4.18GB) with only a 0.35% penalty on PPL!

I'll update the https://medium.com/@eaddario/squeezing-tensor-bits-the-quest-for-smaller-llms-86b23bd052ca post to explain the process in detail, but in the meantime the following links provide some background:

- Changes to llama-quantize: https://github.com/ggml-org/llama.cpp/pull/12511
- TWQ & LWQ tests: https://github.com/ggml-org/llama.cpp/discussions/12741
- Modified llama-imatrix (not yet merged) used to generate imatrix statistics to guide the TWQ and LWQ process: https://github.com/ggml-org/llama.cpp/pull/12718
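To see which layers the LWQ regex in the post above would select, the pattern can be tried against tensor names in plain Python. This is only a rough sketch, not how llama-quantize itself is implemented: the `blk.<layer>.attn_v.weight` naming scheme, the 32-layer count, the helper name `selected_layers`, and the use of substring (search) matching are all assumptions made for illustration.

```python
import re

# Pattern copied from the post's LWQ example (the part before "=q4_k").
PATTERN = re.compile(r"\.([0-9]|1[01257]|31)\.attn_v")


def selected_layers(n_layers: int) -> list[int]:
    """Return the layer indices whose (assumed) tensor name matches PATTERN."""
    # Assumption: tensors are named "blk.<layer>.attn_v.weight" and the
    # pattern is applied with search (substring) semantics.
    return [
        i for i in range(n_layers)
        if PATTERN.search(f"blk.{i}.attn_v.weight")
    ]


# Hypothetical 32-layer model: list the layers that would get q4_k.
print(selected_layers(32))
```

Under those assumptions the pattern picks layers 0 to 9, then 10, 11, 12, 15, 17, and 31; every other attn_v tensor would keep whatever quant level the base scheme assigns.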



Felladrin's activity

New activity in onnx-community/convert-to-onnx 9 days ago

add-readme

#15 opened 10 days ago by

PierreMesure

New activity in onnx-community/convert-to-onnx 11 days ago

Upload to same repo

#14 opened 11 days ago by

PierreMesure

New activity in Felladrin/MiniSearch 18 days ago

On the previous default model.

#1 opened 23 days ago by deleted

New activity in Felladrin/gguf-Q8_0-bge-reranker-v2-m3 19 days ago

Update model metadata to set pipeline tag to the new `text-ranking` and library name to `sentence-transformers`

#1 opened 19 days ago by

tomaarsen

New activity in Felladrin/gguf-jina-reranker-v1-tiny-en 19 days ago

Update model metadata to set pipeline tag to the new `text-ranking` and tags to `sentence-transformers`

#1 opened 19 days ago by

tomaarsen

New activity in Felladrin/onnx-Llama-160M-Chat-v1 24 days ago

How Did You Convert Felladrin/Llama-160M-Chat-v1 to ONNX Format?

#2 opened 24 days ago by

lakpriya

New activity in onnx-community/convert-to-onnx about 1 month ago

Update app.py

#11 opened about 1 month ago by

PierreMesure

Update transformers.js

#10 opened about 1 month ago by

PierreMesure

New activity in Felladrin/onnx-bloomz-560m-sft-chat about 2 months ago

Transformers.js - Enable external data format in Node.js

#1 opened about 2 months ago by

Xenova

New activity in Felladrin/onnx-gpt2-large-conversational-retrain about 2 months ago

Transformers.js - Enable external data format in Node.js

#2 opened about 2 months ago by

Xenova

New activity in mlx-community/mlx-my-repo about 2 months ago

Error when converting huihui-ai/Llama-3.2-3B-Instruct-abliterated: Received parameters not in model: lm_head.weight.

#36 opened about 2 months ago by

Felladrin

New activity in mlx-community/Phi-4-mini-instruct-8bit about 2 months ago

ValueError: [broadcast_shapes] Shapes (48) and (64) cannot be broadcast.

#1 opened about 2 months ago by

alexcardo

New activity in UUFO-Aigis/Pico-OpenLAiNN-250M about 2 months ago

What's its license?

#1 opened about 2 months ago by

Felladrin

New activity in Felladrin/gguf-sharded-LaMini-Flan-T5-248M about 2 months ago

how to run encoder-decoder models with llama.cpp

#1 opened 8 months ago by

FM-1976

New activity in onnx-community/convert-to-onnx about 2 months ago

update-transformer.js

#9 opened about 2 months ago by

PierreMesure

New activity in mlx-community/mlx-my-repo about 2 months ago

Error converting microsoft/Phi-4-mini-instruct: Shapes (48) and (64) cannot be broadcast.

#35 opened about 2 months ago by

Felladrin

New activity in Felladrin/llama2_xs_460M_experimental_platypus 2 months ago

Adding `safetensors` variant of this model

#1 opened 2 months ago by

SFconvertbot

New activity in onnx-community/mms-tts-kan-ONNX 3 months ago

Create README.md

#1 opened 3 months ago by

ravi4198

New activity in onnx-community/dictabert-ner-ONNX 3 months ago

Create README.md

#1 opened 3 months ago by

dingerstner

New activity in onnx-community/Qwen2.5-0.5B-ONNX 4 months ago

How is this different to onnx-community/Qwen2.5-0.5B?

#1 opened 4 months ago by

Xenova