ObjectMover: Generative Object Movement with Video Prior
Abstract
Simple as it seems, moving an object to another location within an image is, in fact, a challenging image-editing task that requires re-harmonizing the lighting, adjusting the pose based on perspective, accurately filling occluded regions, and ensuring coherent synchronization of shadows and reflections while maintaining the object identity. In this paper, we present ObjectMover, a generative model that can perform object movement in highly challenging scenes. Our key insight is that we model this task as a sequence-to-sequence problem and fine-tune a video generation model to leverage its knowledge of consistent object generation across video frames. We show that with this approach, our model is able to adjust to complex real-world scenarios, handling extreme lighting harmonization and object effect movement. As large-scale data for object movement are unavailable, we construct a data generation pipeline using a modern game engine to synthesize high-quality data pairs. We further propose a multi-task learning strategy that enables training on real-world video data to improve the model generalization. Through extensive experiments, we demonstrate that ObjectMover achieves outstanding results and adapts well to real-world scenarios.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- OmniEraser: Remove Objects and Their Effects in Images with Paired Video-Frame Data (2025)
- OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting (2025)
- Get In Video: Add Anything You Want to the Video (2025)
- 3D Object Manipulation in a Single Image using Generative Models (2025)
- HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation (2025)
- VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors (2025)
- VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper