InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 6 days ago • 228
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated 20 days ago • 222
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 13 days ago • 163
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published 20 days ago • 60
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement Paper • 2504.03561 • Published 16 days ago • 17
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 39
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published 24 days ago • 59
LeRobot goes to driving school: World’s largest open-source self-driving dataset Article • Published Mar 11 • 76
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents Paper • 2406.10819 • Published Jun 16, 2024 • 1
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices Paper • 2406.08451 • Published Jun 12, 2024 • 26