2606.05950v1 Jun 04, 2026 cs.AI

Edit-R2: Context-Aware Reinforcement Learning for Multi-Turn Image Editing

Pengfei Wan
Pengfei Wan
Citations: 306
h-index: 10
Kun Gai
Kun Gai
Citations: 160
h-index: 7
Fangyuan Kong
Fangyuan Kong
Citations: 657
h-index: 6
Yuxiao Ye
Yuxiao Ye
Citations: 20
h-index: 2
Haoran He
Haoran He
Citations: 105
h-index: 6
Xintao Wang
Xintao Wang
Citations: 2,145
h-index: 22
Ling Pan
Ling Pan
Citations: 25
h-index: 3

Text-guided image editing has advanced rapidly with diffusion models and unified multimodal foundation models. However, most existing methods remain confined to single-turn settings, overlooking the more realistic scenario of multi-turn in-context editing, where users iteratively refine an image through a sequence of instructions. In this setting, a model must follow each new instruction while preserving accumulated session-level constraints, challenged by two coupled failure modes: long-context dilution, where sparse textual constraints become difficult to recover from growing interleaved image-text histories, and state contamination, where earlier editing mistakes degrade subsequent generations. We introduce Edit-R2, a novel reinforcement learning post-training framework for unified multimodal models. Edit-R2 reconstructs the operative session intent, which effectively consolidates scattered historical constraints into an explicit reasoning trace before each editing turn. It further enables multi-turn RL over both reasoning and generation through a unified objective that jointly optimizes intent reconstruction generation in discrete text space and flow-matching image generation in continuous latent space, while a trajectory filtering mechanism suppresses corrupted rollouts to stabilize training under state contamination. To support systematic evaluation, we introduce MICE-Bench, a large-scale benchmark for multi-turn in-context editing with automated metrics for instruction following (IF), content consistency (CC), and global awareness (GA) over accumulated session constraints. Experiments show that Edit-R2 substantially improves multi-turn in-context editing and achieves competitive performance compared against strong baselines.

0 Citations
0 Influential
11 Altmetric
55.0 Score
Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

Log in to request an AI analysis.

댓글

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!