SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 15 days ago • 168
FastRTC Custom UIs Collection A collection of FastRTC demos that showcase how to build a custom UI for your server • 4 items • Updated 15 days ago • 2
EchoLLaMA: 3D-to-Speech with Multimodal AI Collection This collection contains the models and datasets used in the paper EchoLLaMA: 3D-to-Speech with Multimodal AI. • 4 items • Updated 15 days ago • 4
Llama 4 Collection Meta's new Llama 4 multimodal models, Scout & Maverick. Includes Dynamic GGUFs, 16-bit & Dynamic 4-bit uploads. Run & fine-tune them with Unsloth! • 15 items • Updated 1 day ago • 43
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published Mar 13 • 22
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated 4 days ago • 157
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos Paper • 2412.09401 • Published Dec 12, 2024 • 3
MambaVision Collection MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes both 1K and 21K pretrained models. • 13 items • Updated 8 days ago • 31
EuroBERT Collection Scaling Multilingual Encoders for European Languages • 4 items • Updated Mar 10 • 11
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 9 items • Updated 5 days ago • 117