2604.19406v1 Apr 21, 2026 cs.CV

HP-Edit: A Human-Preference Post-Training Framework for Image Editing

Chong Wang

Lina Lei

Yuping Qiu

Jiaxiu Jiang

Xinran Qin

Zhikai Chen

Fenglong Song

Renjing Pei

Wangmeng Zuo

Fan Li

Jiaqi Xu

Zhixin Wang

Abstract

Powerful generative diffusion models are the leading paradigm for common real-world image editing tasks. Although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing remains largely unexplored, owing to a lack of scalable human-preference datasets and of frameworks tailored to diverse editing needs. To fill this gap, we propose HP-Edit, a post-training framework for Human Preference-aligned Editing, and introduce RealPref-50K, a real-world dataset balanced across eight common editing tasks and common object edits. Specifically, HP-Edit uses a small amount of human-preference scoring data and a pretrained visual large language model (VLM) to develop HP-Scorer, an automatic evaluator aligned with human preference. We then use HP-Scorer both to efficiently build a scalable preference dataset and as the reward function for post-training the editing model. We also introduce RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments demonstrate that our approach significantly improves models such as Qwen-Image-Edit-2509, aligning their outputs more closely with human preference.
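The abstract gives HP-Scorer two roles: labeling preference data and acting as the reward during post-training. The following is a minimal sketch, under stated assumptions, of how one scalar scorer could feed both a Diffusion-DPO-style pair builder (best vs. worst candidate edit per instruction) and Flow-GRPO-style group-normalized advantages. It is not the paper's implementation: `ScoreFn`, `build_preference_pair`, `group_relative_advantages`, the `min_margin` threshold, and the toy scores are all hypothetical, and the paper's actual scoring scale and filtering rules are not specified here.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Optional, Sequence

# Hypothetical scorer signature: (instruction, candidate_id) -> scalar score.
# Stands in for a preference-aligned evaluator such as HP-Scorer; not its real API.
ScoreFn = Callable[[str, str], float]


@dataclass
class PreferencePair:
    instruction: str
    chosen: str    # candidate with the highest score
    rejected: str  # candidate with the lowest score
    margin: float  # score gap between chosen and rejected


def build_preference_pair(
    instruction: str,
    candidates: Sequence[str],
    score_fn: ScoreFn,
    min_margin: float = 0.5,  # hypothetical filtering threshold
) -> Optional[PreferencePair]:
    """Score every candidate edit once and pair the best with the worst (DPO-style)."""
    scores = [score_fn(instruction, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    worst = min(range(len(candidates)), key=scores.__getitem__)
    margin = scores[best] - scores[worst]
    if margin < min_margin:
        return None  # scorer cannot separate the candidates clearly; drop the sample
    return PreferencePair(instruction, candidates[best], candidates[worst], margin)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group of samples
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy usage with made-up scores for three edits of the same instruction.
toy_scores = {"edit_a": 4.5, "edit_b": 2.0, "edit_c": 3.5}
toy_score_fn: ScoreFn = lambda instruction, c: toy_scores[c]

pair = build_preference_pair("turn the sky sunset orange", list(toy_scores), toy_score_fn)
adv = group_relative_advantages([toy_scores[c] for c in toy_scores])
print(pair.chosen, pair.rejected, pair.margin)  # edit_a edit_b 2.5
```

Pairing the best and worst candidates gives DPO the widest score margin per instruction, while normalizing rewards within a group means each sample is rewarded only for beating its siblings, so the absolute scale of the scorer's outputs matters less.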
