2605.02037v1 May 03, 2026 cs.RO

VILAS: A VLA (Vision-Language-Action) Integrated Low-Cost Robotic Manipulation System - An Approach Using a Soft Gripper

VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

Shijie Geng

Citations: 66

h-index: 3

Zijian An

Citations: 63

h-index: 3

Hadi Khezam

Citations: 0

h-index: 0

Bill Cai

Citations: 4

h-index: 1

Ran Yang

Citations: 10

h-index: 2

Yiming Feng

Citations: 10

h-index: 2

Lifeng Zhou

Citations: 52

h-index: 2

Yue Zheng

Citations: 4

h-index: 1

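This paper introduces VILAS (VLA-Integrated Low-cost Architecture), a fully low-cost, modular robotic manipulation platform. VILAS is designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative robot arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, and uses a ZMQ-based communication architecture to coordinate teleoperation, data collection, and policy deployment within a single framework. To manipulate fragile objects safely without relying on explicit force sensing, the authors design a kirigami-based soft compliant gripper extension that deforms predictably under compressive loading, which allows gentle, repeatable contact with delicate objects. Three state-of-the-art VLA models, pi_0, pi_0.5, and GR00T N1.6, are deployed and evaluated on the VILAS platform. All models are fine-tuned from publicly released pretrained checkpoints using the same demonstration dataset. Experiments on a grape grasping task validate the proposed system, confirming that effective manipulation policies can be trained and deployed on low-cost modular hardware. The results also offer practical insight into how current VLA models behave when deployed in real-world settings.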

Original Abstract

We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without relying on explicit force sensing, we design a kirigami-based soft compliant gripper extension that induces predictable deformation under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: pi_0, pi_0.5, and GR00T N1.6. All models are fine-tuned from publicly released pretrained checkpoints using an identical demonstration dataset collected via our teleoperation pipeline. Experiments on a grape grasping task validate the effectiveness of the proposed system, confirming that capable manipulation policies can be successfully trained and deployed on low-cost modular hardware. Our results further provide practical insights into the deployment characteristics of current VLA models in real-world settings.
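The abstract states that one communication layer coordinates teleoperation, data collection, and policy deployment. A minimal sketch of that idea follows, under loud assumptions: the `Hub` class, the `encode` helper, the mode names, and the JSON message fields are all hypothetical and are not taken from the paper. The sketch uses only the Python standard library and passes bytes directly to the handler; in a real system those bytes would presumably travel over a ZMQ socket instead.

```python
import json
from dataclasses import dataclass, field

# Hypothetical mode names; the paper does not specify a message schema.
MODES = ("teleoperation", "data_collection", "policy_deployment")


def encode(payload: dict) -> bytes:
    """Serialize a message to UTF-8 JSON bytes (what a socket would carry)."""
    return json.dumps(payload).encode("utf-8")


@dataclass
class Hub:
    """Hypothetical single entry point that routes action commands by mode."""
    log: list = field(default_factory=list)       # recorded demonstration steps
    executed: list = field(default_factory=list)  # actions forwarded to the arm

    def handle(self, raw: bytes) -> bytes:
        msg = json.loads(raw.decode("utf-8"))
        mode = msg.get("mode")
        if mode not in MODES:
            return encode({"ok": False, "error": f"unknown mode: {mode}"})
        action = msg["action"]
        # Every valid mode ends with an action being sent to the robot.
        self.executed.append(action)
        # Only data collection also stores the observation/action pair.
        if mode == "data_collection":
            self.log.append({"observation": msg.get("observation"),
                             "action": action})
        return encode({"ok": True, "mode": mode, "recorded": len(self.log)})
```

Usage: `Hub().handle(encode({"mode": "teleoperation", "action": [0.1, 0.0]}))` returns a JSON reply with `"ok": true`. Sending the same message with `"mode": "data_collection"` additionally appends the step to `log`. The point of the design is that the operator's controller and a fine-tuned policy can share one message format, so switching from teleoperation to deployment changes only the sender.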

0 Citations

0 Influential

1.5 Altmetric

7.5 Score
