Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. These checkpoints preserve quality comparable to half precision while using roughly 3x less memory. • 19 items • Updated 3 days ago • 18
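The "3x less memory" figure follows from storing weights in ~4 bits instead of 16. A minimal back-of-the-envelope sketch (the parameter count below is an illustrative assumption, not a Gemma 3 spec):

```python
# Rough weight-memory estimate: bf16 baseline vs. an int4 QAT checkpoint.
# The parameter count is a hypothetical example for illustration only.
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """GiB needed for raw weight storage at the given precision."""
    return n_params * bits_per_param / 8 / 2**30

n = 27e9                         # assumed parameter count (illustrative)
bf16 = weight_memory_gib(n, 16)  # half-precision baseline
int4 = weight_memory_gib(n, 4)   # 4-bit quantized weights
print(f"bf16: {bf16:.1f} GiB, int4: {int4:.1f} GiB, ratio: {bf16 / int4:.1f}x")
```

The raw ratio is 4x; in practice some layers (e.g. embeddings and norms) typically stay at higher precision, which is consistent with the ~3x end-to-end saving quoted above.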
Kimi-VL-A3B Collection Moonshot AI's efficient MoE VLMs, excelling at agentic tasks, long context, and reasoning • 6 items • Updated 9 days ago • 61
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 14 days ago • 164
State of open code models (March 2025) Collection The best open code models on Hugging Face as of March 2025 • 7 items • Updated 27 days ago • 2
MoshiVis v0.1 Collection MoshiVis is a vision-speech model, built as a perceptually augmented version of Moshi v0.1, for conversing about image inputs • 8 items • Updated Mar 21 • 22
Training and Inference Efficiency of Encoder-Decoder Speech Models Paper • 2503.05931 • Published Mar 7 • 3
Cosmos Transfer1 Collection Multimodal Conditional World Generation for World2World Transfer • 5 items • Updated 7 days ago • 14
Article Welcome Gemma 3: Google's all-new multimodal, multilingual, long-context open LLM • Mar 12 • 392