2602.10525v1 Feb 11, 2026 cs.CL

LHAW: Controllable Underspecification for Long-Horizon Tasks

Michael S. Lee (Citations: 132, h-index: 5)
Udari Madhushani Sehwag (Citations: 266, h-index: 6)
D. J. Lee (Citations: 1, h-index: 1)
Brian Zhu (Citations: 51, h-index: 4)
Yash Maurya (Citations: 234, h-index: 5)
M. Raghavendra (Citations: 19, h-index: 3)
Sam Denton (Citations: 27, h-index: 2)
George Pu (Citations: 376, h-index: 3)
Y. Xue (Citations: 95, h-index: 2)

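Workflow agents that operate effectively over long horizons are essential for building truly autonomous systems. Their reliability depends heavily on the ability to seek clarification in ambiguous situations so that tasks are carried out correctly. However, the lack of scalable, task-agnostic frameworks has made it difficult to systematically curate ambiguity and measure its impact. To close this gap, this work proposes LHAW (Long-Horizon Augmented Workflows), a modular, dataset-agnostic synthetic pipeline that converts any well-specified task into controllably underspecified variants by systematically removing information along four dimensions (goals, constraints, inputs, and context) at configurable severity levels. Unlike approaches that rely on LLM predictions of ambiguity, LHAW validates each variant through actual agent trials and classifies it as outcome-critical, divergent, or benign according to the observed divergence in terminal states. The authors release 285 task variants generated from TheAgentCompany, SWE-Bench Pro, and MCP-Atlas, together with a formal analysis of how current agents detect, reason about, and resolve underspecification in ambiguous settings. LHAW provides the first systematic framework for cost-sensitive evaluation of agent clarification behavior in long-horizon settings, supporting the development of reliable autonomous systems.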

Original Abstract

Long-horizon workflow agents that operate effectively over extended periods are essential for truly autonomous systems. Their reliable execution critically depends on the ability to reason through ambiguous situations in which clarification seeking is necessary to ensure correct task execution. However, progress is limited by the lack of scalable, task-agnostic frameworks for systematically curating and measuring the impact of ambiguity across custom workflows. We address this gap by introducing LHAW (Long-Horizon Augmented Workflows), a modular, dataset-agnostic synthetic pipeline that transforms any well-specified task into controllable underspecified variants by systematically removing information across four dimensions - Goals, Constraints, Inputs, and Context - at configurable severity levels. Unlike approaches that rely on LLM predictions of ambiguity, LHAW validates variants through empirical agent trials, classifying them as outcome-critical, divergent, or benign based on observed terminal state divergence. We release 285 task variants from TheAgentCompany, SWE-Bench Pro and MCP-Atlas according to our taxonomy alongside formal analysis measuring how current agents detect, reason about, and resolve underspecification across ambiguous settings. LHAW provides the first systematic framework for cost-sensitive evaluation of agent clarification behavior in long-horizon settings, enabling development of reliable autonomous systems.
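The abstract describes two mechanical steps: removing information from one of four dimensions at a chosen severity, and labeling the resulting variant from the terminal states that agent trials actually reach. The sketch below illustrates that shape only. Every name in it (`Task`, `underspecify`, `classify`), the "drop the last N facts" removal rule, and the all/none/some matching rule for the three labels are assumptions made for illustration; the abstract does not give the paper's actual definitions or thresholds.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Sequence


class Dimension(Enum):
    # The four dimensions named in the abstract.
    GOALS = "goals"
    CONSTRAINTS = "constraints"
    INPUTS = "inputs"
    CONTEXT = "context"


class Label(Enum):
    # The three variant classes named in the abstract.
    OUTCOME_CRITICAL = "outcome-critical"
    DIVERGENT = "divergent"
    BENIGN = "benign"


@dataclass(frozen=True)
class Task:
    # Hypothetical representation: the facts a well-specified prompt
    # states, grouped by dimension.
    facts: Dict[Dimension, List[str]]


def underspecify(task: Task, dimension: Dimension, severity: int) -> Task:
    """Return a copy of `task` with the last `severity` facts of one
    dimension removed (an assumed removal rule, not the paper's)."""
    if severity < 0:
        raise ValueError("severity must be non-negative")
    facts = {d: list(items) for d, items in task.facts.items()}
    items = facts.get(dimension, [])
    facts[dimension] = items[: max(0, len(items) - severity)]
    return Task(facts)


def classify(reference_state: str, trial_states: Sequence[str]) -> Label:
    """Label a variant by comparing each trial's terminal state with the
    terminal state reached on the fully specified task.

    Assumed rule: all trials match -> benign, none match ->
    outcome-critical, a mix -> divergent.
    """
    if not trial_states:
        raise ValueError("at least one agent trial is required")
    matches = sum(state == reference_state for state in trial_states)
    if matches == len(trial_states):
        return Label.BENIGN
    if matches == 0:
        return Label.OUTCOME_CRITICAL
    return Label.DIVERGENT
```

Terminal states are plain strings here to keep the example small. In a real harness they would be whatever the benchmark's checker compares, such as file contents, test results, or tool-call outcomes.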

Citations: 1 · Influential citations: 0 · Altmetric: 3 · Score: 16.0
