2601.00969v1 Jan 02, 2026 cs.RO


Value Vision-Language-Action Planning & Search

Ali Salamatian

Kejia Ren

Kieran Pattison

Cyrus Neary

Abstract

Vision-Language-Action (VLA) models have emerged as powerful generalist policies for robotic manipulation, yet they remain fundamentally limited by their reliance on behavior cloning, leading to brittleness under distribution shift. While augmenting pretrained models with test-time search algorithms like Monte Carlo Tree Search (MCTS) can mitigate these failures, existing formulations rely solely on the VLA prior for guidance, lacking a grounded estimate of expected future return. Consequently, when the prior is inaccurate, the planner can only correct action selection via the exploration term, which requires extensive simulation to become effective. To address this limitation, we introduce Value Vision-Language-Action Planning and Search (V-VLAPS), a framework that augments MCTS with a lightweight, learnable value function. By training a simple multilayer perceptron (MLP) on the latent representations of a fixed VLA backbone (Octo), we provide the search with an explicit success signal that biases action selection toward high-value regions. We evaluate V-VLAPS on the LIBERO robotic manipulation suite, demonstrating that our value-guided search improves success rates by over 5 percentage points while reducing the average number of MCTS simulations by 5-15 percent compared to baselines that rely only on the VLA prior.
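The abstract does not state the exact selection rule, so what follows is only a minimal sketch of how a learned value can steer MCTS away from a misleading prior. It assumes an AlphaZero-style PUCT rule in which the value estimate replaces rollouts as the leaf evaluation. The 1-D toy task, `misleading_prior`, `learned_value` (a hand-written step function standing in for the MLP trained on Octo latents), `prior_only`, and every other name here are hypothetical. Lookahead depth is one step.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A search-tree node; `prior` is the policy probability of the action leading here."""
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self) -> float:
        # Mean backed-up value; an unvisited node defaults to 0.0.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c: float) -> float:
    # AlphaZero-style PUCT: exploitation term Q plus a prior-weighted exploration term.
    return child.q() + c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)


def simulate(node, state, depth, prior_fn, step_fn, value_fn, c):
    """Run one simulation below `node` and return the value that is backed up."""
    if node.children and depth > 0:
        # Selection: follow the child with the highest PUCT score.
        action, child = max(node.children.items(),
                            key=lambda kv: puct_score(node, kv[1], c))
        value = simulate(child, step_fn(state, action), depth - 1,
                         prior_fn, step_fn, value_fn, c)
    else:
        if depth > 0:
            # Expansion: one child per action, weighted by the policy prior.
            for action, p in prior_fn(state).items():
                node.children[action] = Node(prior=p)
        # Evaluation: the value estimate stands in for a rollout.
        value = value_fn(state)
    # Backup: update visit count and accumulated value on the way up.
    node.visits += 1
    node.value_sum += value
    return value


def search(root_state, num_sims, depth, prior_fn, step_fn, value_fn, c=1.0):
    root = Node(prior=1.0)
    for _ in range(num_sims):
        simulate(root, root_state, depth, prior_fn, step_fn, value_fn, c)
    # Act with the most-visited root action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


# Hypothetical 1-D toy task: success lies at positive positions,
# but the prior wrongly prefers moving left.
def misleading_prior(s):
    return {-1: 0.7, +1: 0.3}


def step(s, a):
    return s + a


def learned_value(s):
    # Stand-in for the MLP value head on backbone latents (hypothetical).
    return 1.0 if s > 0 else 0.0


def prior_only(s):
    # No value signal: every leaf evaluates to 0, so selection is
    # driven by the exploration term alone.
    return 0.0


print(search(0, 30, 1, misleading_prior, step, learned_value))  # → 1
print(search(0, 30, 1, misleading_prior, step, prior_only))     # → -1
```

Both searches use the same misleading prior and 30 simulations. In the value-guided search, the +1 branch needs only one visit to earn Q = 1, and from then on it takes almost all remaining visits. The prior-only search has no way to tell the branches apart except through the exploration term, so it distributes visits roughly in proportion to the prior and chooses the wrong action. This matches the abstract's argument that without a grounded value estimate, correction has to come from exploration, and exploration needs many more simulations.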