Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published 7 days ago • 46
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published 16 days ago • 61
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation Paper • 2503.22194 • Published 27 days ago • 24
LEGION: Learning to Ground and Explain for Synthetic Image Detection Paper • 2503.15264 • Published Mar 19 • 21
CoRe^2: Collect, Reflect and Refine to Generate Better and Faster Paper • 2503.09662 • Published Mar 12 • 34
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Paper • 2503.05978 • Published Mar 7 • 35
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation Paper • 2502.18302 • Published Feb 25 • 5
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published Feb 24 • 53
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Paper • 2502.17258 • Published Feb 24 • 79
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 143
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published Feb 13 • 28
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published Feb 11 • 36
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published Feb 6 • 37
Generating Multi-Image Synthetic Data for Text-to-Image Customization Paper • 2502.01720 • Published Feb 3 • 8