2604.16871v1 Apr 18, 2026 cs.AI

GRAIL: Autonomous Concept Grounding for Neuro-Symbolic Reinforcement Learning

Kristian Kersting (Citations: 95, h-index: 5)
Quentin Delfosse (Citations: 359, h-index: 11)
Hikaru Shindo (Citations: 299, h-index: 9)
Henri Rößler (Citations: 0, h-index: 0)

Original Abstract

Neuro-symbolic Reinforcement Learning (NeSy-RL) combines symbolic reasoning with gradient-based optimization to achieve interpretable and generalizable policies. Relational concepts, such as "left of" or "close by", serve as foundational building blocks that structure how agents perceive and act. However, conventional approaches require human experts to manually define these concepts, limiting adaptability since concept semantics vary across environments. We propose GRAIL (Grounding Relational Agents through Interactive Learning), a framework that autonomously grounds relational concepts through environmental interaction. GRAIL leverages large language models (LLMs) to provide generic concept representations as weak supervision, then refines them to capture environment-specific semantics. This approach addresses both sparse reward signals and concept misalignment prevalent in underdetermined environments. Experiments on the Atari games Kangaroo, Seaquest, and Skiing demonstrate that GRAIL matches or outperforms agents with manually crafted concepts in simplified settings, and reveal informative trade-offs between reward maximization and high-level goal completion in the full environment.
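
To make the idea of refining a generic concept into an environment-specific one more concrete, here is a minimal sketch. It is not GRAIL's algorithm, which the abstract does not specify. It assumes a relational concept such as "close by" can be modeled as a soft threshold on distance, that a generic starting threshold (here a hard-coded stand-in for an LLM-suggested value) is available, and that interaction yields labeled (distance, was_close) pairs. The refinement is a plain supervised logistic fit, swapped in for whatever reinforcement signal the paper actually uses. All names and numbers (`close_by`, `refine_threshold`, `GENERIC_THRESHOLD`, `SAMPLES`) are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def close_by(distance: float, threshold: float, sharpness: float = 1.0) -> float:
    """Soft truth value in (0, 1); above 0.5 when distance < threshold."""
    return sigmoid(sharpness * (threshold - distance))


def refine_threshold(samples, threshold, lr=0.5, steps=2000, sharpness=1.0):
    """Fit the threshold by gradient descent on mean binary cross-entropy."""
    for _ in range(steps):
        grad = 0.0
        for distance, label in samples:
            p = close_by(distance, threshold, sharpness)
            # d(BCE)/d(threshold) = sharpness * (p - label)
            grad += sharpness * (p - label)
        threshold -= lr * grad / len(samples)
    return threshold


# Hypothetical generic prior (stand-in for an LLM-suggested value).
GENERIC_THRESHOLD = 5.0

# Hypothetical (distance, was_close) pairs; made up for illustration only.
SAMPLES = [(2, 1), (8, 1), (15, 1), (18, 1), (25, 0), (30, 0), (40, 0)]

refined = refine_threshold(SAMPLES, GENERIC_THRESHOLD)
print(f"generic: {GENERIC_THRESHOLD:.1f}, refined: {refined:.1f}")
```

Under these assumptions, the generic threshold treats a distance of 18 as "not close", while the refined threshold settles between the largest positive and smallest negative example, so the same predicate changes meaning to match the (made-up) environment data.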
