2603.09121v1 Mar 10, 2026 cs.RO

DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation

Yifan Han

Zhongxia Chen

Yuxuan Zhao

Congsheng Xu

Yanming Shao

Yichuan Peng

Yao Mu

Wenzhao Lian

Original Abstract

While Vision-Language-Action (VLA) models have demonstrated promising generalization capabilities in robotic manipulation, deploying them on specific and complex downstream tasks still demands effective post-training. In parallel, Human-in-the-Loop (HiL) learning has proven to be a powerful mechanism for refining robot policies. However, extending this paradigm to dexterous manipulation remains challenging: multi-finger control is high-dimensional, contact-intensive, and exhibits execution distributions that differ markedly from standard arm motions, leaving existing dexterous VLA systems limited in reliability and adaptability. We present DexHiL, the first integrated arm-hand human-in-the-loop framework for dexterous VLA models, enabling coordinated interventions over the arm and the dexterous hand within a single system. DexHiL introduces an intervention-aware data sampling strategy that prioritizes corrective segments for post-training, alongside a lightweight teleoperation interface that supports instantaneous human corrections during execution. Real-robot experiments demonstrate that DexHiL serves as an effective post-training framework, yielding a substantial performance leap, outperforming standard offline-only fine-tuning baselines by an average of 25% in success rates across distinct tasks. Project page: https://chenzhongxi-sjtu.github.io/dexhil/
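The abstract does not give the exact rule behind the intervention-aware data sampling strategy. The following is only a minimal sketch of the general idea of upweighting human-corrected data. It assumes that each logged timestep carries a boolean flag recording whether the teleoperator was intervening, and it uses a single fixed upweighting factor. The `Step` schema, `corrective_weight`, and the helper functions are hypothetical and are not taken from DexHiL.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Step:
    """One logged timestep from a rollout (hypothetical schema)."""
    episode_id: int
    t: int
    intervened: bool  # True if a human was correcting the arm/hand at this step


def sampling_weights(steps: Sequence[Step], corrective_weight: float = 3.0) -> List[float]:
    """Give human-corrected steps a larger sampling weight than autonomous steps.

    corrective_weight is an assumed hyperparameter, not a value from the paper.
    """
    if corrective_weight < 1.0:
        raise ValueError("corrective_weight should upweight corrections (>= 1.0)")
    return [corrective_weight if s.intervened else 1.0 for s in steps]


def corrective_fraction(steps: Sequence[Step], corrective_weight: float = 3.0) -> float:
    """Share of the total sampling mass that falls on corrected steps."""
    weights = sampling_weights(steps, corrective_weight)
    corrective = sum(w for w, s in zip(weights, steps) if s.intervened)
    return corrective / sum(weights)


def sample_batch(
    steps: Sequence[Step],
    batch_size: int,
    corrective_weight: float = 3.0,
    seed: Optional[int] = None,
) -> List[Step]:
    """Draw a post-training batch with replacement, biased toward corrections."""
    rng = random.Random(seed)
    weights = sampling_weights(steps, corrective_weight)
    return rng.choices(list(steps), weights=weights, k=batch_size)


# Usage: 100 steps in one episode, with a human correction during steps 10-19.
steps = [Step(episode_id=0, t=t, intervened=10 <= t < 20) for t in range(100)]
batch = sample_batch(steps, batch_size=32, seed=0)
```

In this toy episode, corrections make up 10% of the logged steps. With a weight of 3.0, they account for 30 / (30 + 90) = 25% of the sampling mass. Sampling with replacement keeps the sketch simple. A real pipeline might instead weight whole corrective segments, or mix online and offline data at a fixed ratio.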
