2603.20994v1 Mar 22, 2026 cs.AI

The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes

Original Abstract

In shared autonomy, a critical tension arises when an automated assistant must choose between obeying a human's instruction and deliberately overriding it to prevent harm. This safety-critical behavior is known as intelligent disobedience. To formalize this dynamic, this paper introduces the Intelligent Disobedience Game (IDG), a sequential game-theoretic framework based on Stackelberg games that models the interaction between a human leader and an assistive follower operating under asymmetric information. It characterizes optimal strategies for both agents across multi-step scenarios, identifying strategic phenomena such as "safety traps," where the system indefinitely avoids harm but fails to achieve the human's goal. The IDG provides a needed mathematical foundation that enables both the algorithmic development of agents that can learn safe non-compliance and the empirical study of how humans perceive and trust disobedient AI. The paper further translates the IDG into a shared control Multi-Agent Markov Decision Process representation, forming a compact computational testbed for training reinforcement learning agents.
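To make the leader-follower dynamic concrete, here is a minimal toy sketch in Python. It is not the paper's formal IDG: the one-dimensional track, the `GOAL` and `HAZARD` cells, the `jump` action, and both human policies are hypothetical choices made only for illustration. The human (leader) issues instructions without seeing the hazard, and the assistant (follower) overrides any instruction that would enter it.

```python
GOAL = 4     # hypothetical goal cell on a 1-D track
HAZARD = 2   # hypothetical hazard cell, visible only to the assistant


def step(position: int, action: str) -> int:
    """Apply an action to the shared system state."""
    if action == "forward":
        return position + 1
    if action == "jump":       # assumed detour action that skips one cell
        return position + 2
    return position            # "stay"


def assistant(position: int, instruction: str) -> str:
    """Follower: obey unless the instructed move would land on the hazard."""
    if step(position, instruction) == HAZARD:
        return "stay"          # disobey to prevent harm
    return instruction


def run(human_policy, max_steps: int = 10) -> dict:
    """Simulate the interaction; the human only observes whether it was refused."""
    position, refused_last, harmed = 0, False, False
    for _ in range(max_steps):
        if position >= GOAL:
            break
        instruction = human_policy(refused_last)
        action = assistant(position, instruction)
        refused_last = action != instruction
        position = step(position, action)
        harmed = harmed or position == HAZARD
    return {"reached_goal": position >= GOAL, "harmed": harmed, "position": position}


def stubborn_human(refused_last: bool) -> str:
    """Leader that repeats the same instruction regardless of refusals."""
    return "forward"


def adaptive_human(refused_last: bool) -> str:
    """Leader that anticipates the follower and changes instruction after a refusal."""
    return "jump" if refused_last else "forward"


if __name__ == "__main__":
    print("stubborn:", run(stubborn_human))
    print("adaptive:", run(adaptive_human))
```

Under these assumptions, the stubborn leader ends up stuck just before the hazard: no harm occurs, but the goal is never reached, which mirrors the "safety trap" the abstract describes. The adaptive leader treats a refusal as information about the hidden hazard and reaches the goal safely.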
