2602.02455v1 Feb 02, 2026 cs.AI


Drift-Bench: Diagnosing Cooperative Breakdowns in LLM Agents under Input Faults via Multi-Turn Interaction

Han Bao, Zheyuan Zhang, Kaiwen Shi, Pengcheng Jing, Zhengqing Yuan, Yanfang Ye

Abstract

As Large Language Models transition to autonomous agents, user inputs frequently violate cooperative assumptions (e.g., implicit intent, missing parameters, false presuppositions, or ambiguous expressions), creating execution risks that text-only evaluations do not capture. Existing benchmarks typically assume well-specified instructions or restrict evaluation to text-only, single-turn clarification, and thus do not measure multi-turn disambiguation under grounded execution risk. We introduce Drift-Bench, the first diagnostic benchmark that evaluates agentic pragmatics under input faults through multi-turn clarification across state-oriented and service-oriented execution environments. Grounded in classical theories of communication, Drift-Bench provides a unified taxonomy of cooperative breakdowns and employs a persona-driven user simulator with the Rise evaluation protocol. Experiments show substantial performance drops under these faults, with clarification effectiveness varying across user personas and fault types. Drift-Bench bridges clarification research and agent safety evaluation, enabling systematic diagnosis of failures that can lead to unsafe executions.
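The abstract gives no implementation details for the user simulator or for the Rise protocol, so the following is only a hypothetical toy sketch, not the authors' method. It encodes the four fault examples named in the abstract as an enum and runs a multi-turn clarification episode against a simulated user whose persona controls how readily it answers. All names (`Task`, `PersonaSimulator`, `run_episode`, the "cooperative" and "terse" personas, the slot-filling agent policy) are illustrative assumptions, and only the missing-parameters case is exercised.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class FaultType(Enum):
    # The four input-fault examples listed in the abstract.
    IMPLICIT_INTENT = "implicit intent"
    MISSING_PARAMETERS = "missing parameters"
    FALSE_PRESUPPOSITION = "false presupposition"
    AMBIGUOUS_EXPRESSION = "ambiguous expression"


@dataclass
class Task:
    # Hypothetical task record: the user's instruction, its fault type, the
    # parameters needed to execute, and the values the user has in mind.
    instruction: str
    fault: FaultType
    required: tuple[str, ...]
    hidden_values: dict[str, str]


@dataclass
class PersonaSimulator:
    # Hypothetical personas: "cooperative" answers at once; any other persona
    # (e.g. "terse") answers only when the same slot is asked a second time.
    persona: str
    asked: dict[str, int] = field(default_factory=dict)

    def answer(self, task: Task, slot: str) -> str | None:
        self.asked[slot] = self.asked.get(slot, 0) + 1
        if self.persona == "cooperative" or self.asked[slot] >= 2:
            return task.hidden_values.get(slot)
        return None


def run_episode(task: Task, user: PersonaSimulator, max_turns: int = 4) -> dict:
    # Toy agent policy: ask about the first unresolved slot each turn; execute
    # only when every required slot is known, otherwise abstain.
    known: dict[str, str] = {}
    turns = 0
    while turns < max_turns:
        missing = [s for s in task.required if s not in known]
        if not missing:
            break
        turns += 1
        reply = user.answer(task, missing[0])
        if reply is not None:
            known[missing[0]] = reply
    outcome = "execute" if all(s in known for s in task.required) else "abstain"
    return {"outcome": outcome, "turns": turns, "args": known}


if __name__ == "__main__":
    task = Task(
        "Book me a flight to Paris",
        FaultType.MISSING_PARAMETERS,
        ("date", "origin"),
        {"date": "2026-03-01", "origin": "Boston"},
    )
    for persona, budget in [("cooperative", 4), ("terse", 4), ("terse", 3)]:
        print(persona, budget, run_episode(task, PersonaSimulator(persona), budget))
```

With these assumptions, the cooperative persona resolves the task in two clarification turns, the terse persona needs four, and under a three-turn budget the terse episode ends with `origin` unresolved, so the toy agent abstains rather than executing with a guessed argument. This mirrors, in a very simplified form, the abstract's observation that clarification effectiveness depends on the user persona.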
