RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
Abstract
RefEdit, an instruction-based editing model trained on synthetic data, outperforms baselines in complex scene editing and referring expression tasks.
Despite recent advances in inversion-based and instruction-based image editing, existing approaches primarily excel at editing single, prominent objects but struggle significantly when applied to complex scenes containing multiple entities. To quantify this gap, we first introduce RefEdit-Bench, a rigorous real-world benchmark rooted in RefCOCO, on which even baselines trained on millions of samples perform poorly. To overcome this limitation, we introduce RefEdit, an instruction-based editing model trained on data from our scalable synthetic data generation pipeline. RefEdit, trained on only 20,000 editing triplets, outperforms Flux/SD3-based baselines trained on millions of samples. Extensive evaluations across various benchmarks demonstrate that our model not only excels on referring expression tasks but also improves performance on traditional benchmarks, achieving state-of-the-art results comparable to closed-source methods. We release the data and checkpoints for reproducibility.
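The abstract states that the data and checkpoints are released. As a minimal sketch, assuming the editing triplets are hosted on the Hugging Face Hub (the repo id below is a hypothetical placeholder, since the abstract does not give one), they could be loaded like this:

```python
# Minimal sketch: loading the released RefEdit editing triplets from the Hub.
# "RefEdit/refedit-triplets" is a hypothetical placeholder repo id; substitute
# the actual id published by the authors. A "train" split is also assumed.
from datasets import load_dataset

dataset = load_dataset("RefEdit/refedit-triplets")
# Each triplet pairs a source image, an instruction containing a referring
# expression (e.g. "the man on the left"), and the target edited image.
print(dataset["train"][0])
```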
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions (2025)
- Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions (2025)
- CompBench: Benchmarking Complex Instruction-guided Image Editing (2025)
- What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models (2025)
- SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing (2025)
- In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2025)
- POEM: Precise Object-level Editing via MLLM control (2025)