2603.01488v1 Mar 02, 2026 cs.AI

LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning

Chang Yao (citations: 11, h-index: 2)

Kebing Jin (citations: 129, h-index: 5)

H. Zhuo (citations: 416, h-index: 10)

Jing Qin (citations: 14, h-index: 2)

Abstract

Despite remarkable success in complex tasks, Deep Reinforcement Learning (DRL) still suffers from critical issues in practical applications, such as low data efficiency, lack of interpretability, and limited cross-environment transferability. Moreover, learned policies that generate actions based on states are sensitive to environmental changes and struggle to guarantee behavioral safety and compliance. Recent research shows that integrating Large Language Models (LLMs) with symbolic planning is a promising way to address these challenges. Inspired by this, we introduce a novel LLM-driven closed-loop framework that enables semantics-driven skill reuse and real-time constraint monitoring by mapping natural language instructions into executable rules and semantically annotating automatically created options. The proposed approach uses the general knowledge of LLMs to improve exploration efficiency, provides options that can be adapted to similar environments, and offers inherent interpretability through semantic annotations. To validate the effectiveness of this framework, we conduct experiments on two domains, Office World and Montezuma's Revenge. The results demonstrate superior performance in data efficiency, constraint compliance, and cross-task transferability.
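To make the two ideas named in the abstract more concrete (semantically annotated options that can be reused, and rules that monitor constraints at run time), here is a minimal Python sketch. It is not the authors' implementation: the `Option` and `Rule` classes, the keyword-overlap matching, the state fields, and the example rule are all hypothetical, and the LLM step that would turn an instruction into a rule is replaced by a hand-written lambda.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

# Hypothetical symbolic state, e.g. {"room": "hall", "has_coffee": False}
State = Dict[str, object]


@dataclass
class Option:
    """A temporally extended action with an assumed semantic annotation."""
    name: str
    annotation: Set[str]                # keywords describing what the option achieves
    can_start: Callable[[State], bool]  # initiation condition
    is_done: Callable[[State], bool]    # termination condition


@dataclass
class Rule:
    """An executable constraint; stands in for the output of an LLM translation step."""
    description: str
    violated: Callable[[State], bool]


def find_reusable(options: List[Option], goal: Set[str], state: State) -> Optional[Option]:
    """Return the startable option whose annotation shares the most keywords with the goal."""
    best, best_score = None, 0
    for opt in options:
        score = len(opt.annotation & goal)
        if score > best_score and opt.can_start(state):
            best, best_score = opt, score
    return best


def check_constraints(rules: List[Rule], state: State) -> List[str]:
    """Return the descriptions of all rules violated in the given state."""
    return [r.description for r in rules if r.violated(state)]


# Illustrative data loosely styled after an Office World task (assumed, not from the paper).
options = [
    Option("get_coffee", {"coffee", "fetch"},
           lambda s: not s["has_coffee"], lambda s: bool(s["has_coffee"])),
    Option("go_to_office", {"office", "deliver"},
           lambda s: True, lambda s: s["room"] == "office"),
]
rules = [Rule("do not step on decorations", lambda s: bool(s["on_decoration"]))]

state = {"room": "hall", "has_coffee": False, "on_decoration": False}
chosen = find_reusable(options, {"coffee"}, state)
violations = check_constraints(rules, state)
```

In this sketch the annotation is what makes an option both findable for a new goal and readable by a human, while the rule check runs against every state independently of the learned policy. A real system would presumably use richer semantic matching than keyword overlap.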

