2602.14252v1 Feb 15, 2026 cs.AI

GRAIL: Goal Recognition Alignment through Imitation Learning

Osher Elhadad

Citations: 9

h-index: 2

Felipe Meneguzzi

Citations: 194

h-index: 6

Reuth Mirsky

Citations: 654

h-index: 16

Abstract

Understanding an agent's goals from its behavior is fundamental to aligning AI systems with human intentions. Existing goal recognition methods typically rely on an optimal goal-oriented policy representation, which may differ from the actor's true behavior and hinder the accurate recognition of their goal. To address this gap, this paper introduces Goal Recognition Alignment through Imitation Learning (GRAIL), which leverages imitation learning and inverse reinforcement learning to learn one goal-directed policy for each candidate goal directly from (potentially suboptimal) demonstration trajectories. By scoring an observed partial trajectory with each learned goal-directed policy in a single forward pass, GRAIL retains the one-shot inference capability of classical goal recognition while leveraging learned policies that can capture suboptimal and systematically biased behavior. Across the evaluated domains, GRAIL increases the F1-score by more than 0.5 under systematically biased optimal behavior, achieves gains of approximately 0.1-0.3 under suboptimal behavior, and yields improvements of up to 0.4 under noisy optimal trajectories, while remaining competitive in fully optimal settings. This work contributes toward scalable and robust models for interpreting agent goals in uncertain environments.
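
The inference step described in the abstract (scoring an observed partial trajectory under one learned policy per candidate goal, then picking the best-scoring goal) can be illustrated with a minimal sketch. The abstract does not state the scoring function, so the summed log-likelihood used here is an assumption, and the names `score_trajectory`, `recognize_goal`, and the two toy policies are hypothetical stand-ins for policies that GRAIL would learn from demonstrations.

```python
import math
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable
# A policy maps a state to a probability distribution over actions.
Policy = Callable[[State], Dict[Action, float]]


def score_trajectory(policy: Policy,
                     trajectory: Sequence[Tuple[State, Action]],
                     eps: float = 1e-8) -> float:
    """Sum of log pi(a | s) over the observed (state, action) pairs.

    Assumption: log-likelihood is used as the score; eps avoids log(0)
    for actions the policy assigns zero probability.
    """
    return sum(math.log(max(policy(s).get(a, 0.0), eps))
               for s, a in trajectory)


def recognize_goal(policies: Dict[str, Policy],
                   trajectory: Sequence[Tuple[State, Action]]):
    """Score the trajectory under every goal's policy and return the best goal."""
    scores = {goal: score_trajectory(pi, trajectory)
              for goal, pi in policies.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy policies for a 1-D corridor. In GRAIL these would be
# learned per goal from (possibly suboptimal) demonstrations.
def toward_right(_state: State) -> Dict[Action, float]:
    return {"right": 0.8, "left": 0.2}


def toward_left(_state: State) -> Dict[Action, float]:
    return {"right": 0.2, "left": 0.8}


policies = {"goal_right": toward_right, "goal_left": toward_left}

# A partial, slightly noisy observation: mostly moving right, one step left.
observed = [(0, "right"), (1, "right"), (2, "left"), (1, "right")]

best_goal, goal_scores = recognize_goal(policies, observed)
print(best_goal, goal_scores)
```

Because each policy is stochastic, the single step to the left lowers the score for `goal_right` without ruling it out. This is the kind of tolerance to suboptimal or noisy behavior that the abstract attributes to learned policies.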
