2604.04808v1 Apr 06, 2026 cs.LG

Selecting Decision-Relevant Concepts in Reinforcement Learning

Stephanie Milani

Citations: 810

h-index: 14

Naveen Raman

Citations: 39

h-index: 3

Fei Fang

Citations: 112

h-index: 4

Original Abstract

Training interpretable concept-based policies requires practitioners to manually select which human-understandable concepts an agent should reason with when making sequential decisions. This selection demands domain expertise, is time-consuming and costly, scales poorly with the number of candidates, and provides no performance guarantees. To overcome this limitation, we propose the first algorithms for principled automatic concept selection in sequential decision-making. Our key insight is that concept selection can be viewed through the lens of state abstraction: intuitively, a concept is decision-relevant if removing it would cause the agent to confuse states that require different actions. As a result, agents should rely on decision-relevant concepts; states with the same concept representation should share the same optimal action, which preserves the optimal decision structure of the original state space. This perspective leads to the Decision-Relevant Selection (DRS) algorithm, which selects a subset of concepts from a candidate set, along with performance bounds relating the selected concepts to the performance of the resulting policy. Empirically, DRS automatically recovers manually curated concept sets while matching or exceeding their performance, and improves the effectiveness of test-time concept interventions across reinforcement learning benchmarks and real-world healthcare environments.
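The state-abstraction intuition in the abstract can be illustrated with a toy sketch. This is not the paper's DRS algorithm, which the abstract does not specify. It is a brute-force illustration that assumes a small enumerable state set with known optimal actions. The names `count_conflicts` and `smallest_consistent_subset` and the toy data are hypothetical. A subset of concepts counts as "consistent" here when no two states share the same selected-concept values yet require different optimal actions.

```python
from itertools import combinations, product


def count_conflicts(states, optimal_actions, selected):
    """Count concept representations that map to more than one optimal action.

    states: list of tuples of concept values, one tuple per state.
    optimal_actions: optimal action for each state (assumed known here).
    selected: indices of the concepts the agent is allowed to see.
    """
    actions_by_repr = {}
    for concepts, action in zip(states, optimal_actions):
        key = tuple(concepts[i] for i in selected)
        actions_by_repr.setdefault(key, set()).add(action)
    return sum(1 for acts in actions_by_repr.values() if len(acts) > 1)


def smallest_consistent_subset(states, optimal_actions):
    """Brute-force search for a smallest subset with zero conflicts.

    Exponential in the number of candidate concepts; illustration only.
    """
    n_concepts = len(states[0])
    for size in range(n_concepts + 1):
        for subset in combinations(range(n_concepts), size):
            if count_conflicts(states, optimal_actions, subset) == 0:
                return subset
    return None


# Toy data (assumed): three binary concepts; the optimal action depends
# only on concepts 0 and 1, so concept 2 is not decision-relevant.
states = list(product([0, 1], repeat=3))
optimal_actions = [c0 & c1 for (c0, c1, _c2) in states]

print(smallest_consistent_subset(states, optimal_actions))  # (0, 1)
print(count_conflicts(states, optimal_actions, (0,)))       # 1
```

Dropping concept 1 leaves one representation (concept 0 equal to 1) that covers states with different optimal actions. That is the confusion the abstract describes when a decision-relevant concept is removed.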

0 Citations

0 Influential

7 Altmetric

35.0 Score
