2605.07306v1 May 08, 2026 cs.RO

BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation

Zhe Liu

Zhaohui Du

Zhe Wang

Hongmei Fei

Xiwen Cao

Ting Xiao

Qi Wang

Huan Jin

Jiaming Gu

Quan Lu

Original Abstract

Biological laboratory automation can reduce repetitive manual work and improve reproducibility, but reliable embodied execution in wet-lab environments remains challenging. Protocols are often unstructured, labware is frequently transparent or reflective, and multi-step procedures require state-aware execution beyond one-shot instruction following. Existing robotic systems often rely on costly hardware, fixed workflows, dedicated instruments, or robotics-oriented interfaces. Here, we introduce BioProVLA-Agent, an affordable, protocol-driven, vision-enhanced embodied multi-agent system enabled by Vision-Language-Action (VLA) models for biological manipulation. The system uses protocols as the task interface and integrates protocol parsing, visual state verification, and embodied execution in a closed-loop workflow. A Tailored LLM Protocol Agent converts protocols into verifiable subtasks; a VLM-RAG Verification Agent assesses readiness and completion using observations, robot states, retrieved knowledge, and success/failure examples; and a VLA Embodied Agent executes verified subtasks through a lightweight policy. To improve robustness under wet-lab visual perturbations, we develop AugSmolVLA, an online augmentation strategy targeting transparent labware, reflections, illumination shifts, and overexposure. We evaluate the system on a hierarchical benchmark covering 15 atomic tasks, 6 composite workflows, and 3 bimanual tasks, including tube loading, sorting, waste disposal, cap twisting, and liquid pouring. Across normal and high-exposure settings, AugSmolVLA improves execution stability over ACT, X-VLA, and the original SmolVLA, especially for precise placement, transparent-object manipulation, composite workflows, and visually degraded scenes. These results suggest a practical route toward accessible, protocol-centered, and verification-capable embodied AI for biological manipulation.
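
The closed loop summarized in the abstract (parse the protocol into verifiable subtasks, check readiness, execute, check completion) can be illustrated as a small control routine. This is a minimal, hypothetical sketch: `Subtask`, `run_protocol`, and the `toy_*` stubs are illustrative names not taken from the paper. The stubs stand in for the Tailored LLM Protocol Agent, the VLM-RAG Verification Agent, and the VLA Embodied Agent, and the bounded-retry policy (`max_attempts`) is an assumption, since the abstract does not say how failed subtasks are handled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    # One verifiable step, as a (hypothetical) protocol parser might emit it.
    name: str
    precondition: str   # fact the verifier must confirm before acting
    postcondition: str  # fact the verifier must confirm after acting


# Hypothetical agent interfaces. In the paper these roles are filled by an LLM,
# a VLM with retrieval, and a VLA policy; here they are plain callables.
ParseFn = Callable[[str], List[Subtask]]
VerifyFn = Callable[[Dict[str, bool], str], bool]
ExecuteFn = Callable[[Dict[str, bool], Subtask], None]


def run_protocol(protocol: str, parse: ParseFn, verify: VerifyFn,
                 execute: ExecuteFn, state: Dict[str, bool],
                 max_attempts: int = 3) -> List[str]:
    """Check readiness, act, check completion, and retry a bounded number of times."""
    log: List[str] = []
    for sub in parse(protocol):
        if not verify(state, sub.precondition):
            log.append(f"{sub.name}: not ready, halting")
            return log
        for attempt in range(1, max_attempts + 1):
            execute(state, sub)
            if verify(state, sub.postcondition):
                log.append(f"{sub.name}: done (attempt {attempt})")
                break
        else:  # no attempt passed verification
            log.append(f"{sub.name}: failed after {max_attempts} attempts")
            return log
    return log


# --- Toy stand-ins (assumptions, for illustration only) ---
def toy_parse(protocol: str) -> List[Subtask]:
    # Split "a; b; c" into subtasks; each step requires the previous one's result.
    steps = [s.strip() for s in protocol.split(";") if s.strip()]
    subtasks, prev = [], "start"
    for step in steps:
        subtasks.append(Subtask(step, prev, f"{step} done"))
        prev = f"{step} done"
    return subtasks


calls: Dict[str, int] = {}


def toy_execute(state: Dict[str, bool], sub: Subtask) -> None:
    # Simulated policy: "twist cap" fails on its first try, everything else succeeds.
    calls[sub.name] = calls.get(sub.name, 0) + 1
    if sub.name == "twist cap" and calls[sub.name] == 1:
        return
    state[sub.postcondition] = True


def toy_verify(state: Dict[str, bool], fact: str) -> bool:
    # A real verifier would judge camera frames and robot state; here we read a dict.
    return state.get(fact, False)


log = run_protocol("load tube; twist cap; pour liquid",
                   toy_parse, toy_verify, toy_execute, {"start": True})
print(log)
```

Keeping verification as a separate callable mirrors the abstract's split between the verifier and the executor: the policy never decides on its own whether a step succeeded, and a step whose readiness cannot be confirmed is never attempted.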
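
The abstract names the perturbations AugSmolVLA targets (transparent labware, reflections, illumination shifts, overexposure) but not how they are generated. As a rough illustration only, the sketch below applies randomly sampled illumination shifts, overexposure, and a synthetic specular highlight to a camera frame. The function names, parameter ranges, and the per-augmentation probability `p` are all assumptions, and the Gaussian glare spot is a crude proxy for reflections on transparent or glossy tubes, not the paper's recipe.

```python
import numpy as np


def illumination_shift(img, rng, max_delta=0.3):
    # Global gain/bias change, as if the lab lighting were different.
    gain = 1.0 + rng.uniform(-max_delta, max_delta)
    bias = rng.uniform(-max_delta, max_delta) * 0.5
    return np.clip(img * gain + bias, 0.0, 1.0)


def overexposure(img, rng, max_gain=2.5):
    # Large gain followed by clipping: bright regions saturate to white.
    gain = rng.uniform(1.2, max_gain)
    return np.clip(img * gain, 0.0, 1.0)


def specular_highlight(img, rng, max_radius_frac=0.15):
    # Add a soft Gaussian bright spot (hypothetical stand-in for glare on labware).
    h, w = img.shape[:2]
    cy, cx = rng.uniform(0, h), rng.uniform(0, w)
    sigma = rng.uniform(0.03, max_radius_frac) * min(h, w)
    yy, xx = np.mgrid[0:h, 0:w]
    blob = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    strength = rng.uniform(0.4, 1.0)
    return np.clip(img + strength * blob[..., None], 0.0, 1.0)


AUGMENTATIONS = (illumination_shift, overexposure, specular_highlight)


def augment_online(img, rng, p=0.5):
    """Apply each perturbation independently with probability p.

    `img` is a float H x W x 3 array with values in [0, 1].
    """
    out = img.astype(np.float32, copy=True)
    for aug in AUGMENTATIONS:
        if rng.random() < p:
            out = aug(out, rng)
    return out


rng = np.random.default_rng(0)
frame = rng.random((96, 128, 3), dtype=np.float32)
aug = augment_online(frame, rng)
print(aug.shape, float(aug.min()) >= 0.0, float(aug.max()) <= 1.0)
```

Calling `augment_online` each time a frame is sampled during training, rather than augmenting the dataset once, is what makes the augmentation online: the policy sees a different perturbation of the same demonstration on every pass.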
