
Victor Nogueira

Felladrin

AI & ML interests

Models to run in the web browser

Recent Activity

reacted to eaddario's post with 🔥 about 3 hours ago
Tensor-wise (TWQ) and Layer-wise quantization (LWQ) are now available in llama.cpp! As of version b5125, users can perform TWQ, whereby a whole tensor is quantized at a specific level, or LWQ, by choosing specific layers per tensor.

The new --tensor-type option enables llama-quantize to apply user-defined quant levels to any combination of allowed tensors (i.e. tensors with 2 or more dimensions) and layer numbers, with support for regex patterns. For example, to TWQ the Attention Value tensor you would use --tensor-type attn_v=q6_k, and to perform LWQ you would use something like --tensor-type "\.([0-9]|1[01257]|31)\.attn_v=q4_k".

In the next few days/weeks I'll update the models in my HF repo (and add some others), but https://huggingface.co/eaddario/DeepSeek-R1-Distill-Llama-8B-GGUF and https://huggingface.co/eaddario/DeepSeek-R1-Distill-Qwen-7B-GGUF have already been LWQ'ed. For reference, compared to the naive Q4_K_M model, the LWQ Qwen-7B is almost 11% smaller (4.68 GB vs 4.18 GB) with only a 0.35% penalty on PPL!

I'll update the https://medium.com/@eaddario/squeezing-tensor-bits-the-quest-for-smaller-llms-86b23bd052ca post to explain the process in detail, but in the meantime the following links provide some background:
- Changes to llama-quantize: https://github.com/ggml-org/llama.cpp/pull/12511
- TWQ & LWQ tests: https://github.com/ggml-org/llama.cpp/discussions/12741
- Modified llama-imatrix (not yet merged), used to generate imatrix statistics to guide the TWQ and LWQ process: https://github.com/ggml-org/llama.cpp/pull/12718
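To see which layers the LWQ example pattern above actually selects, here is a minimal Python sketch. It assumes tensor names follow the usual GGUF "blk.<layer>.<tensor>" convention (e.g. blk.5.attn_v.weight) and a hypothetical 32-layer model; the regex itself is taken verbatim from the post.

```python
import re

# LWQ example pattern from the post: apply q4_k to attn_v only on the
# layers whose index matches the alternation below.
pattern = re.compile(r"\.([0-9]|1[01257]|31)\.attn_v")

# Assumed GGUF-style tensor names for a hypothetical 32-layer model.
matched = [n for n in range(32) if pattern.search(f"blk.{n}.attn_v.weight")]
print(matched)  # layers 0-12, 15, 17 and 31; all others keep the default quant level
```

Note that the leading and trailing `\.` anchors matter: without them, a pattern like `[0-9]` would also match inside two-digit layer indices such as 25, quantizing far more layers than intended.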

Organizations

Blog-explorers · MLX Community · Social Post Explorers · M4-ai · ONNX Community · Smol Community

Posts 2

Post
I'm curating an AI-powered web search software timeline at https://github.com/felladrin/awesome-ai-web-search

The list covers three main categories:

1. Web Search with LLM summarization and follow-up capabilities
2. LLM chat interfaces with Web Search integration
3. Agent-driven research tools using LLM + Web Search

The timeline helps track the evolution of this space and serves as a reference for anyone looking for alternatives. If you know of any tools that should be included, please contribute by:
- opening a PR to edit the readme: https://github.com/felladrin/awesome-ai-web-search/edit/main/readme.md
- creating an issue in the repository: https://github.com/felladrin/awesome-ai-web-search/issues/new/choose
- or sharing in the comments below.
Post
MiniSearch is celebrating its 1st birthday! 🎉

Exactly one year ago, I shared the initial version of this side project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting to you too!

HF Space: Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space