2604.13891v1 Apr 15, 2026 cs.RO

Beyond Conservative Automated Driving in Multi-Agent Scenarios via Coupled Model Predictive Control and Deep Reinforcement Learning

Simeon C. Calvert (citations: 28; h-index: 3)

B. Arem (citations: 10,069; h-index: 44)

Saeed Rahmani (citations: 20; h-index: 3)

B. Brito (citations: 802; h-index: 15)

Z. Xu (citations: 26; h-index: 3)

Gözde Körpe (citations: 0; h-index: 0)

Original Abstract

Automated driving at unsignalized intersections is challenging because of complex multi-vehicle interactions and the need to balance safety and efficiency. Model Predictive Control (MPC) offers structured constraint handling through optimization but relies on hand-crafted rules that often produce overly conservative behavior. Deep Reinforcement Learning (RL) learns adaptive behaviors from experience but often struggles with safety assurance and generalization to unseen environments. In this study, we present an integrated MPC-RL framework to improve navigation performance in multi-agent scenarios. Experiments show that MPC-RL outperforms standalone MPC and end-to-end RL across three traffic-density levels. Overall, MPC-RL reduces the collision rate by 21% and improves the success rate by 6.5% compared to pure MPC. We further evaluate zero-shot transfer to a highway merging scenario without retraining. Both MPC-based methods transfer substantially better than end-to-end Proximal Policy Optimization (PPO), which highlights the role of the MPC backbone in cross-scenario robustness. The framework also shows faster loss stabilization than end-to-end RL during training, which suggests a reduced learning burden. These results suggest that the integrated approach can improve the balance between safety and efficiency in multi-agent intersection scenarios, while the MPC component provides a strong foundation for generalization across driving environments. The implementation code is available as open source.
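The abstract does not say how the MPC and RL components are coupled, so the following is only an illustrative sketch of one common pattern and not the authors' implementation. It assumes that a learned policy proposes a reference speed and that a small sampling-based MPC then picks the acceleration that tracks that speed while never occupying a conflict zone at the same time as another vehicle. Every name here (`EgoState`, `rollout`, `mpc_step`, `coupled_step`, the conflict zone and time window, the cost weights) is hypothetical, and the policy is a stub standing in for a trained network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class EgoState:
    s: float  # position along the ego path (m)
    v: float  # speed (m/s)


def rollout(state: EgoState, accel: float, dt: float,
            horizon: int) -> List[Tuple[float, float, float]]:
    """Simulate a constant acceleration; return (time, position, speed) samples."""
    s, v = state.s, state.v
    traj = []
    for k in range(1, horizon + 1):
        v = max(0.0, v + accel * dt)  # the vehicle does not reverse
        s += v * dt
        traj.append((k * dt, s, v))
    return traj


def violates(traj, zone: Tuple[float, float], window: Tuple[float, float]) -> bool:
    """True if the ego is inside `zone` while another vehicle occupies it (`window`)."""
    (z0, z1), (t0, t1) = zone, window
    return any(z0 <= s <= z1 and t0 <= t <= t1 for t, s, _ in traj)


def mpc_step(state: EgoState, v_ref: float, zone, window,
             accels: Sequence[float] = (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0),
             dt: float = 0.2, horizon: int = 25) -> float:
    """Pick the cheapest candidate acceleration that satisfies the hard constraint."""
    best, best_cost = None, float("inf")
    for a in accels:
        traj = rollout(state, a, dt, horizon)
        if violates(traj, zone, window):
            continue  # hard constraint: discard unsafe candidates
        # Track the reference speed and lightly penalize control effort.
        cost = sum((v - v_ref) ** 2 for _, _, v in traj) + 0.1 * a * a * horizon
        if cost < best_cost:
            best, best_cost = a, cost
    # Assumed fallback when no candidate is feasible: brake as hard as allowed.
    return best if best is not None else min(accels)


def coupled_step(state: EgoState, policy: Callable[[EgoState], float],
                 zone, window) -> float:
    """The policy sets the goal (reference speed); the MPC enforces the constraint."""
    v_ref = policy(state)
    return mpc_step(state, v_ref, zone, window)


if __name__ == "__main__":
    policy_stub = lambda obs: 8.0  # placeholder for a trained RL policy
    ego = EgoState(s=0.0, v=8.0)
    # Another vehicle occupies the zone 20-26 m ahead from t = 2 s to t = 4 s.
    print(coupled_step(ego, policy_stub, (20.0, 26.0), (2.0, 4.0)))
```

In this arrangement the learned component only shapes the objective, so constraint handling stays inside the optimizer regardless of what the policy outputs. That is one plausible reading of why an MPC backbone could help with transfer to unseen scenarios, but the abstract does not confirm this design.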
