InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 6 days ago • 228
Kimina Prover Preview Collection State-of-the-Art Models for Formal Mathematical Reasoning • 4 items • Updated 7 days ago • 26
Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs Paper • 2504.07866 • Published 10 days ago • 8
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 9 days ago • 119
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis Paper • 2504.04842 • Published 14 days ago • 31
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 9 days ago • 61
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 12 days ago • 144
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published 13 days ago • 43
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 13 days ago • 163
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published 18 days ago • 30
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published 20 days ago • 34
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published 20 days ago • 74
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages Paper • 2503.20212 • Published 26 days ago • 5