
Victor Nogueira

Felladrin

AI & ML interests

Models to run in the web browser

Recent Activity

reacted to eaddario's post with 🔥 about 3 hours ago
Tensor-wise (TWQ) and Layer-wise quantization (LWQ) are now available in llama.cpp! As of version b5125, users can perform TWQ, whereby a whole tensor is quantized at a specific level, or LWQ, by choosing specific layers per tensor.

The new --tensor-type option enables llama-quantize to apply user-defined quant levels to any combination of allowed tensors (i.e. tensors with 2 or more dimensions) and layer numbers, with support for regex patterns. For example, to TWQ the Attention Value tensor you would use --tensor-type attn_v=q6_k, and to perform LWQ you would use something like --tensor-type "\.([0-9]|1[01257]|31)\.attn_v=q4_k".

In the next few days/weeks I'll update the models in my HF repo (and add some others), but https://huggingface.co/eaddario/DeepSeek-R1-Distill-Llama-8B-GGUF and https://huggingface.co/eaddario/DeepSeek-R1-Distill-Qwen-7B-GGUF have already been LWQ'ed. For reference, compared to the naive Q4_K_M model, the LWQ Qwen-7B is almost 11% smaller (4.68 GB vs 4.18 GB) with only a 0.35% penalty on PPL!

I'll update the https://medium.com/@eaddario/squeezing-tensor-bits-the-quest-for-smaller-llms-86b23bd052ca post to explain the process in detail, but in the meantime the following links provide some background:
- Changes to llama-quantize: https://github.com/ggml-org/llama.cpp/pull/12511
- TWQ & LWQ tests: https://github.com/ggml-org/llama.cpp/discussions/12741
- Modified llama-imatrix (not yet merged), used to generate imatrix statistics to guide the TWQ and LWQ process: https://github.com/ggml-org/llama.cpp/pull/12718
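To see which layers the LWQ example pattern above actually selects, here is a minimal Python sketch. It assumes tensor names follow the usual GGUF "blk.<layer>.<tensor>" convention (e.g. blk.5.attn_v.weight) and a hypothetical 32-layer model; the regex itself is taken verbatim from the post.

```python
import re

# LWQ example pattern from the post: apply q4_k to attn_v only on the
# layers whose index matches the alternation below.
pattern = re.compile(r"\.([0-9]|1[01257]|31)\.attn_v")

# Assumed GGUF-style tensor names for a hypothetical 32-layer model.
matched = [n for n in range(32) if pattern.search(f"blk.{n}.attn_v.weight")]
print(matched)  # layers 0-12, 15, 17 and 31; all others keep the default quant level
```

Note that the leading and trailing `\.` anchors matter: without them, a pattern like `[0-9]` would also match inside two-digit layer indices such as 25, quantizing far more layers than intended.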

Organizations

Blog-explorers · MLX Community · Social Post Explorers · M4-ai · ONNX Community · Smol Community

Posts 2

Post
I'm curating an AI-powered web search software timeline at https://github.com/felladrin/awesome-ai-web-search

The list covers three main categories:

1. Web Search with LLM summarization and follow-up capabilities
2. LLM chat interfaces with Web Search integration
3. Agent-driven research tools using LLM + Web Search

The timeline helps track the evolution of this space and serves as a reference for anyone looking for alternatives. If you know of any tools that should be included, please contribute by:
- opening a PR to edit the readme: https://github.com/felladrin/awesome-ai-web-search/edit/main/readme.md
- creating an issue in the repository: https://github.com/felladrin/awesome-ai-web-search/issues/new/choose
- or sharing in the comments below.
Post
MiniSearch is celebrating its 1st birthday! 🎉

Exactly one year ago, I shared the initial version of this side project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting to you too!

HF Space: Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space