Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published 27 days ago • 50
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Paper • 2503.21696 • Published 25 days ago • 22
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 119
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published Mar 13 • 22
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding Paper • 2503.02951 • Published Mar 4 • 29
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 142
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published Feb 19 • 26
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking Paper • 2502.02339 • Published Feb 4 • 22
Ovis2 Collection Our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated 27 days ago • 59
VideoLLaMA3 Collection Frontier Multimodal Foundation Models for Video Understanding • 14 items • Updated Mar 11 • 14
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22 • 91
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21 • 86
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 99
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 107
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 48
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated Mar 13 • 68
Inf-CL Collection The corresponding demos/checkpoints/papers/datasets of Inf-CL. • 2 items • Updated Mar 11 • 3