2605.26013v1 May 25, 2026 cs.LG

AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models

B. Kveton
Subhojyoti Mukherjee
Adobe Research
V. Lai
K. Singh
Anup Rao

We introduce AdvantageFlow, a forward-process reinforcement learning algorithm for rectified flow models. Unlike Flow-GRPO, which optimizes the reverse process, we optimize an advantage-weighted forward-process prediction loss. This optimization problem is unstable when advantages are negative and the loss becomes non-convex. We stabilize it by rollout policy regularization, which reduces variance and arises from fitting a local reward-improving target distribution. We evaluate AdvantageFlow on image generation tasks with Stable Diffusion 3.5 Medium. It outperforms both Flow-GRPO and a state-of-the-art forward-process RL baseline based on negative-aware fine-tuning.
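
To illustrate why negative advantages can destabilize an advantage-weighted least-squares objective, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a common rectified-flow convention in which the velocity target is noise minus the clean sample, and the function name `advantage_weighted_flow_loss`, the array shapes, and the advantage values are hypothetical. The rollout policy regularization mentioned in the abstract is not modeled here.

```python
import numpy as np


def advantage_weighted_flow_loss(v_pred, x0, noise, advantages):
    """Mean over samples of A_i * ||v_pred_i - (noise_i - x0_i)||^2.

    Assumption: the regression target is the rectified-flow velocity
    noise - x0. The abstract does not specify the exact loss.
    """
    target = noise - x0
    per_sample = np.sum((v_pred - target) ** 2, axis=-1)
    return float(np.mean(advantages * per_sample))


rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 3))        # hypothetical clean samples
noise = rng.normal(size=(4, 3))     # hypothetical Gaussian noise
advantages = np.array([1.0, 0.5, -0.5, -1.0])  # illustrative values
target = noise - x0

# A perfect prediction gives zero loss for every sample.
loss_at_target = advantage_weighted_flow_loss(target, x0, noise, advantages)

# Moving only the negative-advantage predictions away from their targets
# lowers the loss, and moving them further lowers it without bound.
v_pushed = target.copy()
v_pushed[advantages < 0] += 10.0
loss_pushed = advantage_weighted_flow_loss(v_pushed, x0, noise, advantages)

print(loss_at_target, loss_pushed)
```

In this sketch, each negative-advantage sample contributes a term that is concave in its prediction, so the unregularized objective has no finite minimizer. This is one way to read the instability the abstract describes.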
