J C

dark-pen

AI & ML interests

None yet

Recent Activity

liked a dataset 36 minutes ago
eaddario/imatrix-calibration
reacted to eaddario's post with 🔥 36 minutes ago
Tensor-wise (TWQ) and layer-wise quantization (LWQ) are now available in llama.cpp! As of version b5125, users can perform TWQ, whereby you quantize a whole tensor at a specific level, or LWQ, by choosing specific layers per tensor.

The new --tensor-type option enables llama-quantize to apply user-defined quant levels to any combination of allowed tensors (i.e. tensors with two or more dimensions) and layer numbers, with support for regex patterns. For example, to TWQ the Attention Value tensor you would use --tensor-type attn_v=q6_k, and to perform LWQ you'd use something like --tensor-type "\.([0-9]|1[01257]|31)\.attn_v=q4_k"

In the next few days/weeks I'll update the models in my HF repo (and will add some others), but https://huggingface.co/eaddario/DeepSeek-R1-Distill-Llama-8B-GGUF and https://huggingface.co/eaddario/DeepSeek-R1-Distill-Qwen-7B-GGUF have already been LWQed. For reference, compared to the naive Q4_K_M model, the LWQ Qwen-7B is almost 11% smaller (4.68GB vs 4.18GB) with only a 0.35% penalty on PPL!

I'll update the https://medium.com/@eaddario/squeezing-tensor-bits-the-quest-for-smaller-llms-86b23bd052ca post to explain the process in detail, but in the meantime the following links provide some background:
- Changes to llama-quantize: https://github.com/ggml-org/llama.cpp/pull/12511
- TWQ & LWQ tests: https://github.com/ggml-org/llama.cpp/discussions/12741
- Modified llama-imatrix (not yet merged) used to generate imatrix statistics to guide the TWQ and LWQ process: https://github.com/ggml-org/llama.cpp/pull/12718
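A minimal command-line sketch of the two modes, assuming llama.cpp build b5125 or later; the model file names are hypothetical and the --tensor-type values mirror the examples in the post:

# TWQ: quantize every Attention Value (attn_v) tensor at Q6_K;
# all other tensors follow the base Q4_K_M recipe
./llama-quantize --tensor-type attn_v=q6_k model-f16.gguf model-twq.gguf q4_k_m

# LWQ: quantize attn_v at Q4_K only in layers 0-12, 15, 17 and 31
# (the regex matches the layer number in the tensor name)
./llama-quantize --tensor-type "\.([0-9]|1[01257]|31)\.attn_v=q4_k" model-f16.gguf model-lwq.gguf q4_k_m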
liked a dataset about 12 hours ago
yzwang/X2I-in-context-learning

Organizations

None yet

dark-pen's activity

upvoted an article 11 days ago

Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC
