2602.11978v1 Feb 12, 2026 cs.RO

Accelerating Robotic Reinforcement Learning with Agent Guidance

Haojun Chen (Citations: 82, h-index: 2)

Z. Zou (Citations: 76, h-index: 4)

Chengdong Ma (Citations: 535, h-index: 8)

Yaoxiang Pu (Citations: 0, h-index: 0)

Haotong Zhang (Citations: 10, h-index: 2)

Yuanpei Chen (Citations: 681, h-index: 13)

Yaodong Yang (Citations: 24, h-index: 1)

Abstract

Reinforcement Learning (RL) offers a powerful paradigm for autonomous robots to master generalist manipulation skills through trial-and-error. However, its real-world application is stifled by severe sample inefficiency. Recent Human-in-the-Loop (HIL) methods accelerate training by using human corrections, yet this approach faces a scalability barrier. Reliance on human supervisors imposes a 1:1 supervision ratio that limits fleet expansion, suffers from operator fatigue over extended sessions, and introduces high variance due to inconsistent human proficiency. We present Agent-guided Policy Search (AGPS), a framework that automates the training pipeline by replacing human supervisors with a multimodal agent. Our key insight is that the agent can be viewed as a semantic world model, injecting intrinsic value priors to structure physical exploration. By using executable tools, the agent provides precise guidance via corrective waypoints and spatial constraints for exploration pruning. We validate our approach on two tasks, ranging from precision insertion to deformable object manipulation. Results demonstrate that AGPS outperforms HIL methods in sample efficiency. This automates the supervision pipeline, unlocking the path to labor-free and scalable robot learning. Project website: https://agps-rl.github.io/agps.
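
The abstract names two forms of guidance: corrective waypoints and spatial constraints for exploration pruning. The sketch below is a toy illustration of how such guidance could shape a policy's proposed motion. It is not the AGPS implementation. The `Box` class, the `guided_target` function, the `max_step` parameter, and the choice of an axis-aligned box as the spatial constraint are all assumptions made for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical spatial constraint: an axis-aligned region worth exploring."""
    low: Vec3
    high: Vec3

    def clip(self, p: Vec3) -> Vec3:
        # Project a point onto the box, coordinate by coordinate.
        return tuple(min(max(x, lo), hi) for lo, x, hi in zip(self.low, p, self.high))


def guided_target(
    position: Vec3,
    policy_delta: Vec3,
    box: Box,
    waypoint: Optional[Vec3] = None,
    max_step: float = 0.01,
) -> Vec3:
    """Return the next end-effector target under (assumed) agent guidance.

    If a corrective waypoint is given, move toward it by at most max_step.
    Otherwise apply the policy's proposed delta and clip the result into the
    box, which prunes exploration outside the region.
    """
    if waypoint is not None:
        diff = tuple(w - p for w, p in zip(waypoint, position))
        dist = sum(d * d for d in diff) ** 0.5
        if dist <= max_step:
            return waypoint
        scale = max_step / dist
        return tuple(p + scale * d for p, d in zip(position, diff))

    proposed = tuple(p + d for p, d in zip(position, policy_delta))
    return box.clip(proposed)


if __name__ == "__main__":
    box = Box(low=(0.0, 0.0, 0.0), high=(1.0, 1.0, 1.0))
    # Policy proposes leaving the box; the target is clipped back inside.
    print(guided_target((0.5, 0.5, 0.5), (1.0, 0.0, -1.0), box))
    # A corrective waypoint overrides the policy's proposal.
    print(guided_target((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), box, waypoint=(0.03, 0.0, 0.04)))
```

In this sketch the waypoint takes priority over the policy's own proposal, and the box only filters what the policy explores on its own. Whether AGPS combines the two signals this way is not stated in the abstract.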
