2604.09443v1 Apr 10, 2026 cs.CL

Many-Tier Instruction Hierarchy in LLM Agent Systems

Many-Tier Instruction Hierarchy in LLM Agents

William Jurayj, Johns Hopkins University (Citations: 46; h-index: 4)

Benjamin Van Durme (Citations: 1,013; h-index: 17)

Daniel Khashabi (Citations: 262; h-index: 8)

Tianjian Li (Citations: 125; h-index: 5)

Jingyu Zhang (Citations: 16; h-index: 2)

Hongyuan Zhan (Citations: 182; h-index: 5)

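Large language model (LLM) agents receive instructions from a variety of sources, such as system messages, user prompts, and tool outputs, and each instruction carries a different level of trust and authority. When these instructions conflict, the model must follow the highest-privilege instruction to operate safely and effectively. The currently dominant paradigm, instruction hierarchy (IH), typically assumes a small, fixed number of privilege levels (usually fewer than five) defined by rigid role labels (e.g., system > user). This is insufficient for resolving conflicts across the far wider range of sources and contexts that arise in real agentic settings. This work proposes Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving conflicts among instructions with an arbitrary number of privilege levels, and introduces ManyIH-Bench, the first benchmark for evaluating ManyIH. ManyIH-Bench requires models to handle up to 12 levels of conflicting instructions with varying privileges and consists of 853 agentic tasks (427 coding tasks and 426 instruction-following tasks). It uses constraints developed by LLMs and verified by humans to build realistic, difficult test cases spanning 46 real-world agents. Experiments show that even current frontier models perform poorly (about 40% accuracy) as instruction conflicts scale. The work underscores the urgent need for methods for precise, scalable instruction conflict resolution in agentic settings.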

Original Abstract

Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying different levels of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant paradigm, instruction hierarchy (IH), assumes a fixed, small set of privilege levels (typically fewer than five) defined by rigid role labels (e.g., system > user). This is inadequate for real-world agentic settings, where conflicts can arise across far more sources and contexts. In this work, we propose Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving instruction conflicts among instructions with arbitrarily many privilege levels. We introduce ManyIH-Bench, the first benchmark for ManyIH. ManyIH-Bench requires models to navigate up to 12 levels of conflicting instructions with varying privileges, comprising 853 agentic tasks (427 coding and 426 instruction-following). ManyIH-Bench composes constraints developed by LLMs and verified by humans to create realistic and difficult test cases spanning 46 real-world agents. Our experiments show that even the current frontier models perform poorly (~40% accuracy) when instruction conflict scales. This work underscores the urgent need for methods that explicitly target fine-grained, scalable instruction conflict resolution in agentic settings.
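
The rule stated in the abstract, that the highest-privilege instruction should win when instructions conflict, can be illustrated with a small sketch. This is not the paper's method or the ManyIH-Bench format; the abstract does not say how privileges are encoded. The `Instruction` fields, the integer `privilege` scale (larger means more authority), and the `topic` key used to detect conflicts are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    source: str     # hypothetical source label, e.g. "system" or "tool_output"
    privilege: int  # assumed convention: larger value = more authority
    topic: str      # assumed conflict key: what the instruction constrains
    directive: str  # the requested behavior


def resolve(instructions):
    """For each topic, keep the directive of the highest-privilege instruction.

    Ties keep the instruction seen first (an arbitrary choice for this sketch).
    """
    winners = {}
    for inst in instructions:
        current = winners.get(inst.topic)
        if current is None or inst.privilege > current.privilege:
            winners[inst.topic] = inst
    return {topic: inst.directive for topic, inst in winners.items()}


# Three sources disagree about language; two disagree about format.
example = [
    Instruction("system", 12, "language", "respond in English"),
    Instruction("user", 7, "language", "respond in French"),
    Instruction("tool_output", 1, "language", "respond in German"),
    Instruction("user", 7, "format", "use bullet points"),
    Instruction("tool_output", 1, "format", "use a table"),
]

resolved = resolve(example)
print(resolved)
```

With explicit privilege numbers the resolution is a simple maximum per topic. The difficulty the abstract points to is that a model must carry out this resolution itself, from instructions embedded in its context, as the number of privilege levels grows.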
