Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback
Abstract
The generation of high-quality human images with text-to-image (T2I) methods is a significant yet challenging task. Unlike general image generation, human image synthesis must satisfy stringent criteria related to human pose, anatomy, and alignment with textual prompts, making realistic results particularly difficult to achieve. Recent advances in diffusion-based T2I generation have shown promise, yet challenges remain in meeting human-specific preferences. In this paper, we introduce a novel approach tailored to human image generation using Direct Preference Optimization (DPO). Specifically, we present an efficient method for constructing a specialized DPO dataset for training human image generation models without the need for costly human feedback. We also propose a modified loss function that enhances DPO training by minimizing artifacts and improving image fidelity. Our method demonstrates its versatility and effectiveness across human image generation tasks, including personalized text-to-image generation. Through comprehensive evaluations, we show that our approach significantly advances the state of human image generation, achieving superior results in terms of natural anatomy, pose, and text-image alignment.
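For readers unfamiliar with DPO in the diffusion setting, below is a minimal sketch of the standard Diffusion-DPO objective that approaches like this typically build on; the paper's modified loss is not reproduced here. Here $x^{w}$ and $x^{l}$ denote winning and losing images, $\epsilon_{\theta}$ the denoiser being fine-tuned, and $\epsilon_{\mathrm{ref}}$ a frozen reference copy.

```latex
% Sketch of the standard Diffusion-DPO objective (not the paper's modified loss).
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x^{w},\,x^{l}),\,t,\,\epsilon}
    \Big[ \log \sigma\!\big( -\beta\,(\Delta^{w} - \Delta^{l}) \big) \Big],
\qquad
\Delta^{c}
  = \big\| \epsilon - \epsilon_{\theta}(x_{t}^{c}, t) \big\|^{2}
  - \big\| \epsilon - \epsilon_{\mathrm{ref}}(x_{t}^{c}, t) \big\|^{2},
\quad c \in \{w, l\}.
```

Intuitively, minimizing this loss pushes the fine-tuned denoiser to reconstruct winning images better, and losing images worse, than the reference model does.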
Community
This paper proposes an enhanced Direct Preference Optimization method (HG-DPO) that incorporates high-quality real images as winning images and uses a curriculum learning framework to improve the realism and personalization of human image generation.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MGHanD: Multi-modal Guidance for authentic Hand Diffusion (2025)
- Unified Reward Model for Multimodal Understanding and Generation (2025)
- Text-driven 3D Human Generation via Contrastive Preference Optimization (2025)
- Single Image Iterative Subject-driven Generation and Editing (2025)
- Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation (2025)
- IPGO: Indirect Prompt Gradient Optimization on Text-to-Image Generative Models with High Data Efficiency (2025)
- Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias (2025)
The title and abstract displayed on this webpage are from an older version of the paper. We have updated the paper, but the new version does not yet seem to be reflected here. The updated title and abstract, as currently shown on arXiv, are as follows:
Title: Boost Your Human Image Generation Model via Direct Preference Optimization
Abstract:
Human image generation is a key focus in image synthesis due to its broad applications, but even slight inaccuracies in anatomy, pose, or details can compromise realism. To address these challenges, we explore Direct Preference Optimization (DPO), which trains models to generate preferred (winning) images while diverging from non-preferred (losing) ones. However, conventional DPO methods use generated images as winning images, limiting realism. To overcome this limitation, we propose an enhanced DPO approach that incorporates high-quality real images as winning images, encouraging outputs to resemble real images rather than generated ones. However, implementing this concept is not a trivial task. Therefore, our approach, HG-DPO (Human image Generation through DPO), employs a novel curriculum learning framework that gradually improves the output of the model toward greater realism, making training more feasible. Furthermore, HG-DPO effectively adapts to personalized text-to-image tasks, generating high-quality and identity-specific images, which highlights the practical value of our approach.
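To make the winning-vs-losing idea concrete, here is a minimal, hypothetical PyTorch sketch of a Diffusion-DPO-style preference step in which the winning batch consists of real photos and the losing batch of generated samples, as the abstract describes. The function names, the toy noising schedule, and the `beta` value are assumptions for illustration, not the paper's HG-DPO implementation, and the curriculum that gradually shifts the outputs toward realism is omitted.

```python
# Hypothetical sketch of a Diffusion-DPO-style preference loss where the winning
# images are real photos and the losing images are generated samples. Names,
# schedule, and beta are illustrative assumptions, not HG-DPO's released code.
import math

import torch
import torch.nn.functional as F


def denoising_error(net, xt, t, noise):
    """Per-sample squared error of the epsilon prediction."""
    return F.mse_loss(net(xt, t), noise, reduction="none").mean(dim=(1, 2, 3))


def preference_loss(model, ref_model, x_win, x_lose, beta=1000.0):
    """One preference step on a batch of (winning, losing) image pairs.

    model:     epsilon-prediction denoiser being fine-tuned, called as net(xt, t)
    ref_model: frozen copy of the pre-trained denoiser
    x_win:     preferred images (here: high-quality real photos), shape (B, C, H, W)
    x_lose:    non-preferred images (here: model-generated samples), same shape
    beta:      preference strength (illustrative value)
    """
    b = x_win.shape[0]
    t = torch.rand(b, device=x_win.device)                 # diffusion times in [0, 1]
    noise = torch.randn_like(x_win)
    # Toy variance-preserving noising schedule, shared by both images of a pair.
    a = torch.cos(t * math.pi / 2).view(b, 1, 1, 1)        # signal scale
    s = torch.sin(t * math.pi / 2).view(b, 1, 1, 1)        # noise scale
    xt_win = a * x_win + s * noise
    xt_lose = a * x_lose + s * noise

    err_w = denoising_error(model, xt_win, t, noise)
    err_l = denoising_error(model, xt_lose, t, noise)
    with torch.no_grad():  # the reference model receives no gradients
        ref_w = denoising_error(ref_model, xt_win, t, noise)
        ref_l = denoising_error(ref_model, xt_lose, t, noise)

    # Loss shrinks when the fine-tuned model explains the winner better, and the
    # loser worse, than the frozen reference does.
    margin = (err_w - ref_w) - (err_l - ref_l)
    return -F.logsigmoid(-beta * margin).mean()
```

In practice, `model` would be the diffusion U-Net being fine-tuned and `ref_model` a frozen copy of it; HG-DPO's curriculum over training stages is a separate mechanism not shown here.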