2606.13578v1 Jun 11, 2026 cs.CL

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

Yujia Liu
Lei Bai
Dongzhan Zhou
Shuofei Qiao
Baochang Ren
Rui Li
Daqiang Gao
Huajun Chen
Wangmeng Zuo
Xinjie Liu
Xi Chen
Zeqin Su
Jintao Xing
Minting Pan
Ningyu Zhang
Xiangyu Zhao
Yanshu Liu
Z. Xue

Scientific laboratories increasingly rely on AI systems to reason about experiments, but the physical act of doing science remains largely outside their reach. AI can help read the literature, generate hypotheses, and plan protocols, yet executing those protocols at the bench still requires a human operator. Vision-Language-Action (VLA) models offer one possible interface between written protocols and robot execution, but existing policies are trained mostly on household and tabletop demonstrations and rarely encounter the instruments, transparent liquids, or fixed protocol workflows found in scientific laboratories. Closing this gap requires both laboratory-specific supervision and a unified learning framework that can accommodate the diverse robot embodiments used to execute experimental protocols. We therefore identify data and embodiment, alongside model design, as central bottlenecks. On the data side, we build RoboGenesis, a simulation-based workflow and data engine that composes configured laboratory workflows from atomic skills, validates and filters rollouts, and exports structured demonstrations across the supported robot profiles. On the policy side, we present LabVLA, trained with a two-stage recipe: FAST action-token pretraining first makes the Qwen3-VL-4B-Instruct backbone action-aware before any continuous control is learned, and flow-matching post-training then attaches a DiT action expert under knowledge insulation. On the LabUtopia benchmark, LabVLA achieves the highest average success rate among all evaluated methods in both in-distribution and out-of-distribution settings.
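FAST tokenization, as introduced in the work that proposed it, compresses a chunk of continuous actions by applying a discrete cosine transform (DCT) along time to each action dimension, scaling and rounding the coefficients, and then running byte-pair encoding (BPE) over the flattened integers. The sketch below illustrates only the DCT-and-quantize stage in NumPy, under stated assumptions: actions are already normalized to roughly [-1, 1], the chunk shape and `scale` factor are hypothetical, the helper names (`encode_chunk`, `decode_chunk`, `flatten_low_freq_first`) are illustrative, the BPE stage is omitted, and nothing here is taken from the LabVLA code.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis (n x n); row k holds frequency k."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # time index
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis

def encode_chunk(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """(T, D) normalized action chunk -> (T, D) integer DCT coefficients.

    The DCT runs along the time axis, independently for each action dimension.
    `scale` (hypothetical default) trades token count against reconstruction error.
    """
    coeffs = dct_matrix(actions.shape[0]) @ actions
    return np.round(coeffs * scale).astype(np.int64)

def decode_chunk(tokens: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Approximate inverse of encode_chunk (orthonormal basis, so inverse = transpose)."""
    return dct_matrix(tokens.shape[0]).T @ (tokens.astype(np.float64) / scale)

def flatten_low_freq_first(tokens: np.ndarray) -> np.ndarray:
    """Row-major flatten: frequency 0 of every dimension, then frequency 1, and so on."""
    return tokens.reshape(-1)

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 16)[:, None]
    chunk = np.sin(2 * np.pi * t * np.arange(1, 8) / 4)  # hypothetical smooth 16 x 7 chunk
    tokens = encode_chunk(chunk)
    print("nonzero tokens:", np.count_nonzero(tokens), "of", tokens.size)
    print("max abs error:", np.max(np.abs(decode_chunk(tokens) - chunk)))
```

Because the basis is orthonormal, rounding each coefficient to within 0.5 / `scale` bounds the per-dimension reconstruction error by sqrt(T) * 0.5 / `scale`. Smooth trajectories concentrate energy in the low frequencies, so many high-frequency integers round to zero, which is what makes a subsequent BPE stage effective; a constant chunk, for instance, yields nonzero tokens only in its first (DC) row.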

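The flow-matching stage can likewise be illustrated with a minimal sketch. It assumes the common rectified-flow convention: interpolate linearly from Gaussian noise at t = 0 to the action chunk at t = 1 and regress the constant velocity `actions - noise`. The abstract does not specify LabVLA's time schedule, sampler, or step count, and neither the DiT action expert nor knowledge insulation is modeled here; the function names are hypothetical, and `velocity_fn` stands in for a learned action expert.

```python
import numpy as np

def flow_matching_pair(actions: np.ndarray, rng: np.random.Generator):
    """Build one training example for a (hypothetical) velocity network.

    Assumed convention: x_t = (1 - t) * noise + t * actions, so the
    target velocity dx_t/dt is the constant (actions - noise).
    """
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform()  # t ~ U[0, 1)
    x_t = (1.0 - t) * noise + t * actions
    target = actions - noise
    return x_t, t, noise, target

def flow_matching_loss(velocity_fn, actions: np.ndarray, rng: np.random.Generator) -> float:
    """Mean squared error between predicted and target velocity at a random t."""
    x_t, t, _, target = flow_matching_pair(actions, rng)
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target) ** 2))

def euler_sample(velocity_fn, noise: np.ndarray, steps: int = 10) -> np.ndarray:
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 (noise) to t = 1 (actions)."""
    x = noise.copy()
    h = 1.0 / steps
    for i in range(steps):
        t = i * h  # left endpoint in [0, 1); t = 1 itself is never evaluated
        x = x + h * velocity_fn(x, t)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((16, 7))  # hypothetical 16-step, 7-DoF action chunk
    eps = rng.standard_normal(a.shape)
    oracle = lambda x, t: (a - x) / (1.0 - t)  # exact conditional velocity toward a
    print(np.max(np.abs(euler_sample(oracle, eps, steps=10) - a)))
```

Plugging in the exact conditional velocity `(a - x) / (1 - t)` for a single known target `a` makes the Euler loop land on `a` up to floating-point error, and gives a near-zero training loss, which is a quick check that the interpolation and the sampler share one time convention. A trained action expert would replace this oracle with a network conditioned on the backbone's observation and instruction features.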