SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement • Paper • 2504.07934 • Published 14 days ago • 18
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models • Paper • 2504.06148 • Published 16 days ago • 13
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models • Paper • 2503.20198 • Published 30 days ago • 4
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback • Paper • 2501.12895 • Published Jan 22 • 61
Negative Token Merging: Image-based Adversarial Feature Guidance • Paper • 2412.01339 • Published Dec 2, 2024 • 23