2605.25829v1 May 25, 2026 cs.RO

OASIS: Observation-Action Space Alignment via SE(3) Trajectory Prediction for Robotic Manipulation

Mingyang Li

Citations: 2

h-index: 1

Xinzhe Chen

Citations: 2

h-index: 1

Liqiu Huang

Citations: 195

h-index: 7

Xuguang Lan

Citations: 90

h-index: 6

Sihua Ren

Citations: 30

h-index: 2

Zeyang Liu

Citations: 44

h-index: 4

Xingyu Chen

Citations: 217

h-index: 4

Haowen Sun

Citations: 13

h-index: 2

Recent vision-language-action (VLA) models and world action models (WAMs) advance robotic manipulation by enriching intermediate representations with auxiliary spatial features or future visual-state prediction. However, these representations largely remain within the observation space and do not share the rigid-body geometry of the action space, forcing the action decoder to implicitly recover this geometry. We propose OASIS, a visuomotor policy that aligns the intermediate representation with the action space via $SE(3)$ end-effector trajectory prediction. OASIS couples a 3D-aware feature encoder that fuses vision-language and metric-depth features with an $SE(3)$ trajectory predictor that produces a camera-frame end-effector trajectory. Conditioned on the predictor's pose-supervised hidden states, the action decoder generates action chunks consistent with rigid-body motion. Across simulation and real-world experiments, OASIS outperforms VLA and WAM baselines in success rate and out-of-distribution generalization. Our project page is available at https://npuhandsome.github.io/OASIS_web.
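The abstract's "camera-frame end-effector trajectory" can be made concrete with a minimal sketch of the underlying rigid-body algebra. This is not the authors' implementation: the frame names, the example extrinsics, and the helpers `rot_z`, `make_pose`, `compose`, `invert`, and `to_camera_frame` are all hypothetical. It only assumes that each pose is a 4x4 homogeneous $SE(3)$ matrix and that a camera pose in the robot base frame is known.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(R, t):
    """Build a 4x4 homogeneous SE(3) matrix from a 3x3 rotation R and a translation t."""
    return [R[i][:] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def compose(A, B):
    """Matrix product A @ B of two 4x4 poses."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert(T):
    """Closed-form SE(3) inverse: (R, t)^-1 = (R^T, -R^T t)."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return make_pose(Rt, t)

def to_camera_frame(T_base_cam, traj_base_ee):
    """Re-express a base-frame end-effector trajectory in the camera frame."""
    T_cam_base = invert(T_base_cam)
    return [compose(T_cam_base, T) for T in traj_base_ee]

# Hypothetical extrinsics: camera 1 m along base x, yawed 90 degrees.
T_base_cam = make_pose(rot_z(math.pi / 2), [1.0, 0.0, 0.0])

# Hypothetical 3-step trajectory: end-effector slides along base y.
identity_R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
traj_base_ee = [make_pose(identity_R, [1.0, 0.1 * k, 0.0]) for k in range(3)]

traj_cam_ee = to_camera_frame(T_base_cam, traj_base_ee)
positions_cam = [[T[i][3] for i in range(3)] for T in traj_cam_ee]
```

In this toy setup, motion along the base y-axis appears as motion along the camera x-axis, which illustrates why a trajectory expressed in the camera frame shares the rigid-body geometry of the action space while still being tied to the viewpoint of the observation.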

