2605.26013v1 May 25, 2026 cs.LG

AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models

B. Kveton

Subhojyoti Mukherjee

Adobe Research

V. Lai

K. Singh

Anup Rao

We introduce AdvantageFlow, a forward-process reinforcement learning algorithm for rectified flow models. Unlike Flow-GRPO, which optimizes the reverse process, AdvantageFlow optimizes an advantage-weighted forward-process prediction loss. This optimization problem is unstable when advantages are negative, because the loss then becomes non-convex. We stabilize it with rollout policy regularization, which reduces variance and arises from fitting a local reward-improving target distribution. We evaluate AdvantageFlow on image generation tasks with Stable Diffusion 3.5 Medium, where it outperforms both Flow-GRPO and a state-of-the-art forward-process RL baseline based on negative-aware fine-tuning.
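The convexity argument can be illustrated with a toy per-sample objective. This is a minimal sketch under stated assumptions, not the paper's exact loss: we assume (hypothetically) that the advantage-weighted term is `advantage * ||pred - target||^2`, where `target` is the rectified-flow velocity target, and that rollout policy regularization is a squared distance `lam * ||pred - prev_pred||^2` to the rollout policy's prediction. The names `awls_loss`, `optimal_prediction`, and `lam` are illustrative. Under these assumptions the objective is convex in the prediction exactly when `advantage + lam > 0`, and its minimizer has a closed form.

```python
from typing import List, Sequence


def awls_loss(pred: Sequence[float], target: Sequence[float],
              prev_pred: Sequence[float], advantage: float, lam: float) -> float:
    """Hypothetical advantage-weighted least squares with rollout regularization.

    Assumed form (not taken from the paper):
        advantage * ||pred - target||^2 + lam * ||pred - prev_pred||^2
    With advantage < 0 and lam = 0 this is unbounded below in pred.
    """
    fit = sum((p - u) ** 2 for p, u in zip(pred, target))
    reg = sum((p - q) ** 2 for p, q in zip(pred, prev_pred))
    return advantage * fit + lam * reg


def optimal_prediction(target: Sequence[float], prev_pred: Sequence[float],
                       advantage: float, lam: float) -> List[float]:
    """Closed-form minimizer of awls_loss over pred.

    Setting the gradient 2*A*(p - u) + 2*lam*(p - q) to zero gives
    p = (A*u + lam*q) / (A + lam); the Hessian is 2*(A + lam)*I,
    so a minimum exists only when A + lam > 0.
    """
    denom = advantage + lam
    if denom <= 0:
        raise ValueError("objective is not convex: need advantage + lam > 0")
    return [(advantage * u + lam * q) / denom for u, q in zip(target, prev_pred)]


if __name__ == "__main__":
    target, prev = [1.0, 0.0], [0.0, 0.0]
    # Positive advantage: pulled toward the target.
    print(optimal_prediction(target, prev, advantage=1.0, lam=1.0))
    # Negative advantage, regularized: pushed away from the target, but finite.
    print(optimal_prediction(target, prev, advantage=-0.5, lam=1.0))
    # Negative advantage, no regularization: loss keeps decreasing as pred moves away.
    print(awls_loss([100.0, 0.0], target, prev, advantage=-1.0, lam=0.0))
```

In this toy setting, a positive advantage pulls the prediction toward the target, a negative advantage pushes it away, and the regularizer keeps that push finite by anchoring the prediction to the rollout policy.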