2601.07304v1 Jan 12, 2026 cs.RO

Heterogeneous Multi-Expert Reinforcement Learning for Long-Horizon Multi-Goal Tasks in Autonomous Forklifts

Fan Guo (Citations: 23, h-index: 3)

Kang Song (Citations: 0, h-index: 0)

Yun Chen (Citations: 5, h-index: 2)

Bowei Huang (Citations: 10, h-index: 1)

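Autonomous mobile manipulation in unstructured warehouse environments requires balancing efficient large-scale navigation with precise object interaction. Existing end-to-end learning approaches often struggle with the conflicting requirements of these distinct phases: navigation depends on robust decision-making over large spaces, whereas manipulation requires high sensitivity to fine local detail. Forcing a single network to learn these different objectives simultaneously causes optimization interference, in which improving performance on one task degrades performance on the other. To address these limitations, we propose a Heterogeneous Multi-Expert Reinforcement Learning (HMER) framework specialized for autonomous forklifts. HMER decomposes long-horizon tasks into specialized sub-policies that are controlled by a semantic task planner. This structure separates macro-level navigation from micro-level manipulation, so each expert can focus on its own action space and operate without mutual interference. To bridge the gap between task planning and continuous control, the planner coordinates the sequential execution of these experts. To address the sparse exploration problem, we also introduce a hybrid imitation-reinforcement training strategy, which initializes the policy from expert demonstrations and then fine-tunes it with reinforcement learning. In Gazebo simulation experiments, HMER clearly outperformed existing sequential and end-to-end approaches. The proposed method achieved a task success rate of 94.2% (versus 62.5% for the baselines), reduced operation time by 21.4%, and kept placement error within 1.5 cm, demonstrating its efficacy for precise material handling.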

Original Abstract

Autonomous mobile manipulation in unstructured warehouses requires a balance between efficient large-scale navigation and high-precision object interaction. Traditional end-to-end learning approaches often struggle to handle the conflicting demands of these distinct phases. Navigation relies on robust decision-making over large spaces, while manipulation needs high sensitivity to fine local details. Forcing a single network to learn these different objectives simultaneously often causes optimization interference, where improving one task degrades the other. To address these limitations, we propose a Heterogeneous Multi-Expert Reinforcement Learning (HMER) framework tailored for autonomous forklifts. HMER decomposes long-horizon tasks into specialized sub-policies controlled by a Semantic Task Planner. This structure separates macro-level navigation from micro-level manipulation, allowing each expert to focus on its specific action space without interference. The planner coordinates the sequential execution of these experts, bridging the gap between task planning and continuous control. Furthermore, to solve the problem of sparse exploration, we introduce a Hybrid Imitation-Reinforcement Training Strategy. This method uses expert demonstrations to initialize the policy and Reinforcement Learning for fine-tuning. Experiments in Gazebo simulations show that HMER significantly outperforms sequential and end-to-end baselines. Our method achieves a task success rate of 94.2% (compared to 62.5% for baselines), reduces operation time by 21.4%, and maintains placement error within 1.5 cm, validating its efficacy for precise material handling.
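The abstract describes a planner that runs specialized experts one after another. A minimal sketch of that sequencing idea follows. It is not the authors' implementation: the `Expert` class, the placeholder policies, the toy `step_world` dynamics, and the termination thresholds are all hypothetical, chosen only so the example runs end to end. In the paper each policy would be a learned network and the planner would be semantic rather than a fixed list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types: the abstract does not specify observation or action formats.
Observation = Dict[str, float]
Action = Dict[str, float]


@dataclass
class Expert:
    """One specialized sub-policy together with its own termination test."""
    name: str
    policy: Callable[[Observation], Action]
    done: Callable[[Observation], bool]


def navigation_policy(obs: Observation) -> Action:
    # Placeholder for a learned navigation policy: drive toward the pallet.
    return {"drive": 1.0, "fork_lift": 0.0}


def manipulation_policy(obs: Observation) -> Action:
    # Placeholder for a learned manipulation policy: raise the forks.
    return {"drive": 0.0, "fork_lift": 1.0}


def step_world(obs: Observation, action: Action) -> Observation:
    # Toy dynamics (an assumption), used only so the sketch is runnable.
    return {
        "distance": max(0.0, obs["distance"] - action["drive"]),
        "fork_height": obs["fork_height"] + 0.5 * action["fork_lift"],
    }


def run_plan(plan: List[Expert], obs: Observation, max_steps: int = 100) -> List[str]:
    """Run experts in order; return the name of the active expert at each step."""
    trace: List[str] = []
    for expert in plan:
        # Only the active expert acts, so experts never compete for control.
        while not expert.done(obs) and len(trace) < max_steps:
            obs = step_world(obs, expert.policy(obs))
            trace.append(expert.name)
    return trace


plan = [
    Expert("navigate", navigation_policy, lambda o: o["distance"] <= 0.0),
    Expert("manipulate", manipulation_policy, lambda o: o["fork_height"] >= 1.0),
]
trace = run_plan(plan, {"distance": 3.0, "fork_height": 0.0})
print(trace)
```

Because each expert only sees control while it is active, its policy can be trained and tuned separately, which is the kind of decoupling between navigation and manipulation that the abstract argues for.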

