2602.07322v1 Feb 07, 2026 cs.RO

Action-to-Action Flow Matching

Jindou Jia

Citations: 22

h-index: 2

Gen Li

Citations: 11

h-index: 1

Xiangyu Chen

Citations: 9

h-index: 1

Tuo An

Citations: 67

h-index: 4

Yuxuan Hu

Citations: 11

h-index: 2

Jingliang Li

Citations: 9

h-index: 1

Xinying Guo

Citations: 1,583

h-index: 7

Jianfei Yang

Citations: 27

h-index: 2

Original Abstract

Diffusion-based policies have recently achieved remarkable success in robotics by formulating action prediction as a conditional denoising process. However, the standard practice of sampling from random Gaussian noise often requires multiple iterative steps to produce clean actions, leading to high inference latency that incurs a major bottleneck for real-time control. In this paper, we challenge the necessity of uninformed noise sampling and propose Action-to-Action flow matching (A2A), a novel policy paradigm that shifts from random sampling to initialization informed by the previous action. Unlike existing methods that treat proprioceptive action feedback as static conditions, A2A leverages historical proprioceptive sequences, embedding them into a high-dimensional latent space as the starting point for action generation. This design bypasses costly iterative denoising while effectively capturing the robot's physical dynamics and temporal continuity. Extensive experiments demonstrate that A2A exhibits high training efficiency, fast inference speed, and improved generalization. Notably, A2A enables high-quality action generation in as few as a single inference step (0.56 ms latency), and exhibits superior robustness to visual perturbations and enhanced generalization to unseen configurations. Lastly, we also extend A2A to video generation, demonstrating its broader versatility in temporal modeling. Project site: https://lorenzo-0-0.github.io/A2A_Flow_Matching.
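The core idea in the abstract, starting the flow from the previous action rather than from Gaussian noise, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a straight-line (linear interpolation) flow-matching path, uses the raw previous action chunk as the source point instead of the learned latent embedding the abstract describes, and replaces the trained network with a hypothetical `oracle_velocity` stand-in. All function names and the numeric values are illustrative.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path from source x0 to target x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity regression target for the straight-line path (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps=1):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy 3-DoF action chunks (made-up values): consecutive actions are close together.
prev_action = [0.10, 0.20, 0.30]
next_action = [0.12, 0.25, 0.28]


def oracle_velocity(x, t):
    # Hypothetical stand-in for a trained velocity network: it simply returns
    # the exact straight-line velocity between the two toy action chunks.
    return target_velocity(prev_action, next_action)


# Single Euler step starting from the previous action (the informed source).
one_step = euler_sample(oracle_velocity, prev_action, num_steps=1)

# Compare how far each candidate starting point is from the target action.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in prev_action]
noise_gap = max(abs(n - a) for n, a in zip(noise, next_action))
action_gap = max(abs(p - a) for p, a in zip(prev_action, next_action))

print("one-step result:", one_step)
print("gap from noise start:", noise_gap)
print("gap from previous-action start:", action_gap)
```

With an exact velocity field a straight path is recovered in one step from any source, so the sketch only shows the mechanics. The practical argument in the abstract is that a source close to the target, with temporal continuity built in, leaves the learned field less to correct, which is what makes few-step inference plausible.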

8 Citations

1 Influential

3.5 Altmetric

27.5 Score
