2602.13810v1 Feb 14, 2026 cs.LG

Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation

Guojian Zhan (citations: 174, h-index: 6)
Letian Tao (citations: 36, h-index: 4)
S. Li (citations: 15, h-index: 2)
Pengcheng Wang (citations: 25, h-index: 2)
Yixiao Wang (citations: 246, h-index: 7)
Yiheng Li (citations: 67, h-index: 5)
Yuxin Chen (citations: 54, h-index: 4)
Masayoshi Tomizuka (citations: 520, h-index: 8)

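Learning policy functions that are both expressive and efficient is a promising research direction in reinforcement learning (RL). Flow-based policies have recently proven effective at modeling complex action distributions through a fast, deterministic sampling process, but they still face a trade-off between expressiveness and computational cost, which is typically governed by the number of flow steps. In this work, we propose the mean velocity policy (MVP), a new generative policy function. MVP models the mean velocity field to achieve the fastest possible one-step action generation. To ensure high expressiveness, an instantaneous velocity constraint (IVC) is imposed on the mean velocity field during training. We prove theoretically that this design acts as a crucial boundary condition, improving learning accuracy and enhancing the policy's expressiveness. Empirically, MVP achieves state-of-the-art success rates on several challenging robotic manipulation tasks from Robomimic and OGBench. It also delivers substantial improvements in training and inference speed over existing flow-based policies.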

Original Abstract

Learning expressive and efficient policy functions is a promising direction in reinforcement learning (RL). While flow-based policies have recently proven effective in modeling complex action distributions with a fast deterministic sampling process, they still face a trade-off between expressiveness and computational burden, which is typically controlled by the number of flow steps. In this work, we propose mean velocity policy (MVP), a new generative policy function that models the mean velocity field to achieve the fastest one-step action generation. To ensure its high expressiveness, an instantaneous velocity constraint (IVC) is introduced on the mean velocity field during training. We theoretically prove that this design explicitly serves as a crucial boundary condition, thereby improving learning accuracy and enhancing policy expressiveness. Empirically, our MVP achieves state-of-the-art success rates across several challenging robotic manipulation tasks from Robomimic and OGBench. It also delivers substantial improvements in training and inference speed over existing flow-based policy baselines.
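The abstract gives no equations, so the following is only a toy sketch of the idea it describes, not the paper's method. It assumes the usual notion of a mean velocity: the displacement along a flow trajectory over an interval [r, t], divided by the interval length. The scalar field v(z, t) = z, the convention that t = 1 is noise and t = 0 is the action, and every function name are hypothetical choices made so that the mean velocity has a closed form. No network is trained here; the instantaneous velocity constraint appears only as the property that the mean velocity equals the instantaneous velocity when the interval shrinks to a point.

```python
import math

def instantaneous_velocity(z, t):
    """Toy instantaneous velocity field v(z, t) = z (hypothetical, not from the paper)."""
    return z

def mean_velocity(z_t, r, t):
    """Closed-form mean velocity of the toy field over [r, t], given the state z_t at time t.

    Trajectories satisfy z(s) = z(t) * exp(s - t), so
    u = (z(t) - z(r)) / (t - r) = z_t * (1 - exp(-(t - r))) / (t - r).
    At r == t the mean velocity reduces to the instantaneous velocity.
    """
    if t == r:
        return instantaneous_velocity(z_t, t)
    return z_t * (1.0 - math.exp(-(t - r))) / (t - r)

def euler_sample(z_1, num_steps):
    """Multi-step sampling: Euler-integrate dz/dt = v(z, t) from t = 1 back to t = 0."""
    z, dt = z_1, 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        z = z - dt * instantaneous_velocity(z, t)
    return z

def one_step_sample(z_1):
    """One-step sampling: a single evaluation of the mean velocity over [0, 1]."""
    return z_1 - (1.0 - 0.0) * mean_velocity(z_1, 0.0, 1.0)

z_1 = 2.0                      # stand-in for a noise sample
exact = z_1 * math.exp(-1.0)   # exact endpoint of the toy ODE at t = 0

# Euler error shrinks only as the number of flow steps grows.
for n in (1, 4, 16, 64):
    print(n, abs(euler_sample(z_1, n) - exact))

# A single mean-velocity evaluation reaches the endpoint up to rounding.
print("one-step", abs(one_step_sample(z_1) - exact))
```

In this toy setting the Euler sampler needs many velocity evaluations to approach the exact endpoint, while the mean velocity reaches it in one evaluation. That is the trade-off between flow steps and accuracy that the abstract refers to; how the paper parameterizes and trains the mean velocity field is not specified in the abstract.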

6 Citations

2 Influential

4 Altmetric

30.0 Score
