ZeroGUI: Automating Online GUI Learning at Zero Human Cost Paper • 2505.23762 • Published 5 days ago • 45
A Simple Aerial Detection Baseline of Multimodal Language Models Paper • 2501.09720 • Published Jan 16 • 2
Scalable Vision Language Model Training via High Quality Data Curation Paper • 2501.05952 • Published Jan 10 • 3
Preference Optimization for Vision Language Models Article • By qgallouedec and 3 others • Jul 10, 2024 • 76
OmniCorpus 🐳 Collection [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text https://github.com/OpenGVLab/OmniCorpus • 5 items • Updated 21 days ago • 1