2605.25829v1 May 25, 2026 cs.RO

OASIS: Observation-Action Space Alignment via SE(3) Trajectory Prediction for Robotic Manipulation

Mingyang Li
Xinzhe Chen
Liqiu Huang
Xuguang Lan
Sihua Ren
Zeyang Liu
Xingyu Chen
Haowen Sun

Recent vision-language-action (VLA) models and world action models (WAMs) advance robotic manipulation by enriching intermediate representations with auxiliary spatial features or future visual-state prediction. However, these representations largely remain within the observation space and do not share the rigid-body geometry of the action space, forcing the action decoder to implicitly recover this geometry. We propose OASIS, a visuomotor policy that aligns the intermediate representation with the action space via $SE(3)$ end-effector trajectory prediction. OASIS couples a 3D-aware feature encoder that fuses vision-language and metric-depth features with an $SE(3)$ trajectory predictor that produces a camera-frame end-effector trajectory. Conditioned on the predictor's pose-supervised hidden states, the action decoder generates action chunks consistent with rigid-body motion. Across simulation and real-world experiments, OASIS outperforms VLA and WAM baselines in success rate and out-of-distribution generalization. Our project page is available at https://npuhandsome.github.io/OASIS_web.
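To make the phrase "camera-frame end-effector trajectory" concrete, the following is a minimal sketch of how a sequence of $SE(3)$ poses might be re-expressed in a camera frame. It is not the OASIS implementation: the function names (`make_se3`, `invert_se3`, `to_camera_frame`), the camera extrinsics, and the three-step trajectory are all hypothetical and chosen only for illustration.

```python
import numpy as np

def make_se3(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def invert_se3(T):
    """Invert a rigid transform in closed form: (R, t) -> (R^T, -R^T t)."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

def to_camera_frame(T_base_cam, traj_base_ee):
    """Express an (N, 4, 4) base-frame end-effector trajectory in the camera frame."""
    T_cam_base = invert_se3(T_base_cam)
    return T_cam_base @ traj_base_ee  # matmul broadcasts over the N poses

def rot_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical extrinsics: camera pose expressed in the robot base frame.
T_base_cam = make_se3(rot_z(np.pi / 2), np.array([0.5, 0.0, 0.3]))

# Hypothetical 3-step trajectory: the end effector moves 0.1 m per step along base x.
traj_base_ee = np.stack([
    make_se3(np.eye(3), np.array([0.5 + 0.1 * k, 0.0, 0.3])) for k in range(3)
])

traj_cam_ee = to_camera_frame(T_base_cam, traj_base_ee)
print(traj_cam_ee[:, :3, 3])  # per-step positions in the camera frame
```

Because each pose is composed with a single rigid transform, the rotation blocks stay orthonormal and relative motion between steps is preserved, which is the rigid-body structure the abstract says observation-space representations lack.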
