2603.23838v1 Mar 25, 2026 cs.AI

Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation

Yining Ma (Citations: 16, h-index: 3)

Cathy Wu (Citations: 1, h-index: 1)

Han Zheng (Citations: 2, h-index: 1)

Brandon Araki (Citations: 902, h-index: 13)

Jingkai Chen (Citations: 430, h-index: 7)

Abstract

Lifelong Multi-Agent Path Finding (MAPF) is critical for modern warehouse automation, which requires multiple robots to continuously navigate conflict-free paths to optimize the overall system throughput. However, the complexity of warehouse environments and the long-term dynamics of lifelong MAPF often demand costly adaptations to classical search-based solvers. While machine learning methods have been explored, their superiority over search-based methods remains inconclusive. In this paper, we introduce Reinforcement Learning (RL) guided Rolling Horizon Prioritized Planning (RL-RH-PP), the first framework integrating RL with search-based planning for lifelong MAPF. Specifically, we leverage classical Prioritized Planning (PP) as a backbone for its simplicity and flexibility in integrating with a learning-based priority assignment policy. By formulating dynamic priority assignment as a Partially Observable Markov Decision Process (POMDP), RL-RH-PP exploits the sequential decision-making nature of lifelong planning while delegating complex spatial-temporal interactions among agents to reinforcement learning. An attention-based neural network autoregressively decodes priority orders on-the-fly, enabling efficient sequential single-agent planning by the PP planner. Evaluations in realistic warehouse simulations show that RL-RH-PP achieves the highest total throughput among baselines and generalizes effectively across agent densities, planning horizons, and warehouse layouts. Our interpretive analyses reveal that RL-RH-PP proactively prioritizes congested agents and strategically redirects agents from congestion, easing traffic flow and boosting throughput. These findings highlight the potential of learning-guided approaches to augment traditional heuristics in modern warehouse automation.
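To make the role of the priority order concrete, here is a minimal sketch of classical Prioritized Planning over a fixed window. It is an illustration under stated assumptions, not the authors' implementation: the single-agent search is a plain breadth-first search in space-time, the learned attention policy is not modeled (the order is passed in by hand), and all names (`plan_single_agent`, `prioritized_planning`, the grid encoding with 1 for obstacles) are hypothetical. Agents are planned one at a time, and each planned path is reserved so that lower-priority agents must avoid it.

```python
from collections import deque

# Wait in place plus the four grid moves.
MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def plan_single_agent(grid, start, goal, vertex_res, edge_res, horizon):
    """Space-time BFS that avoids cells and edge swaps reserved by earlier agents.

    Returns a list of horizon + 1 cells, or None if no conflict-free path exists.
    """
    rows, cols = len(grid), len(grid[0])
    if (start, 0) in vertex_res:
        return None
    parent = {(start, 0): None}
    queue = deque([(start, 0)])
    while queue:
        pos, t = queue.popleft()
        # Accept the goal only if the agent can stay there until the window ends.
        if pos == goal and all((goal, k) not in vertex_res for k in range(t, horizon + 1)):
            path, node = [], (pos, t)
            while node is not None:
                path.append(node[0])
                node = parent[node]
            path.reverse()
            return path + [goal] * (horizon - t)
        if t == horizon:
            continue
        for dr, dc in MOVES:
            nxt = (pos[0] + dr, pos[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:  # obstacle
                continue
            if (nxt, t + 1) in vertex_res:  # vertex conflict
                continue
            if (nxt, pos, t) in edge_res:  # swap conflict
                continue
            if (nxt, t + 1) not in parent:
                parent[(nxt, t + 1)] = (pos, t)
                queue.append((nxt, t + 1))
    return None


def prioritized_planning(grid, starts, goals, order, horizon):
    """Plan agents sequentially in the given priority order.

    Returns {agent: path}, or None if some agent cannot be planned.
    """
    vertex_res, edge_res = set(), set()
    paths = {}
    for agent in order:
        path = plan_single_agent(
            grid, starts[agent], goals[agent], vertex_res, edge_res, horizon
        )
        if path is None:
            return None
        paths[agent] = path
        # Reserve the path so lower-priority agents plan around it.
        for t, cell in enumerate(path):
            vertex_res.add((cell, t))
            if t + 1 < len(path):
                edge_res.add((cell, path[t + 1], t))
    return paths


if __name__ == "__main__":
    # A three-cell corridor with one side niche below the middle cell.
    grid = [[0, 0, 0],
            [1, 0, 1]]
    starts = {0: (0, 1), 1: (0, 0)}
    goals = {0: (0, 1), 1: (0, 2)}
    # Agent 0 first: it parks on its goal and blocks the corridor.
    print(prioritized_planning(grid, starts, goals, [0, 1], horizon=4))
    # Agent 1 first: agent 0 steps into the niche, then returns.
    print(prioritized_planning(grid, starts, goals, [1, 0], horizon=4))
```

In this toy corridor the same planner fails or succeeds depending only on the order, which is the decision the abstract assigns to the learned policy. In a rolling-horizon lifelong setting one would presumably re-run such a planner every few steps with updated starts and goals; that loop is omitted here.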

1 Citation

0 Influential

6.5 Altmetric

33.5 Score
