2602.06525v2 Feb 06, 2026 cs.AI

Progress Constraints for Reinforcement Learning in Behavior Trees

Finn Rietz (Citations: 65, h-index: 3)

J. A. Stork (Citations: 1,797, h-index: 23)

Mart Kartašev (Citations: 27, h-index: 3)

Petter Ögren (Citations: 87, h-index: 2)

Abstract

Behavior Trees (BTs) provide a structured and reactive framework for decision-making, commonly used to switch between sub-controllers based on environmental conditions. Reinforcement Learning (RL), on the other hand, can learn near-optimal controllers but sometimes struggles with sparse rewards, safe exploration, and long-horizon credit assignment. Combining BTs with RL has the potential for mutual benefit: a BT design encodes structured domain knowledge that can simplify RL training, while RL enables automatic learning of the controllers within BTs. However, naive integration of BTs and RL can lead to some controllers counteracting other controllers, possibly undoing previously achieved subgoals, thereby degrading the overall performance. To address this, we propose progress constraints, a novel mechanism where feasibility estimators constrain the allowed action set based on theoretical BT convergence results. Empirical evaluations in a 2D proof-of-concept and a high-fidelity warehouse environment demonstrate improved performance, sample efficiency, and constraint satisfaction, compared to prior methods of BT-RL integration.
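
The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea of restricting an action set with a feasibility estimator. It is not the authors' method. The grid world, the `feasibility` function (a hand-coded stand-in for what the paper describes as a learned estimator), the `threshold` parameter, and the fallback rule are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, int]   # hypothetical 2D grid position (x, y)
Action = Tuple[int, int]  # hypothetical move (dx, dy)

ACTIONS: List[Action] = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def step(state: State, action: Action) -> State:
    """Toy deterministic transition: add the move to the position."""
    return (state[0] + action[0], state[1] + action[1])


def feasibility(state: State, action: Action) -> float:
    """Hand-coded stand-in for a feasibility estimator (assumption).

    Returns 1.0 if the next state keeps an already achieved subgoal
    satisfied (here: x >= 3), and 0.0 otherwise.
    """
    next_x, _ = step(state, action)
    return 1.0 if next_x >= 3 else 0.0


def allowed_actions(
    state: State,
    actions: Sequence[Action],
    estimator: Callable[[State, Action], float],
    threshold: float = 0.5,
) -> List[Action]:
    """Keep only actions whose estimated feasibility meets the threshold.

    If no action passes, fall back to the highest-scoring actions so the
    agent always has something to choose from (an illustrative choice).
    """
    scores = [estimator(state, a) for a in actions]
    allowed = [a for a, s in zip(actions, scores) if s >= threshold]
    if allowed:
        return allowed
    best = max(scores)
    return [a for a, s in zip(actions, scores) if s == best]
```

In this toy setting, a controller acting at `(3, 0)` would have the move `(-1, 0)` removed from its choices, because that move would undo the subgoal `x >= 3`. An RL policy would then sample or maximize only over the returned list.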
