V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published 15 days ago • 13
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models Paper • 2503.20198 • Published 28 days ago • 4
Automated Movie Generation via Multi-Agent CoT Planning Paper • 2503.07314 • Published Mar 10 • 45
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles Paper • 2503.03651 • Published Mar 5 • 16
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models Paper • 2503.01774 • Published Mar 3 • 44
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published Feb 20 • 42
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation Paper • 2502.08047 • Published Feb 12 • 27
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published Feb 11 • 44