2601.07577v1 Jan 12, 2026 cs.AI

Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents

Bingbing Xu (Citations: 1,223; h-index: 12)
Xiucheng Xu (Citations: 7; h-index: 2)
Huawei Shen (Citations: 51; h-index: 4)
Yunfan Li (Citations: 15; h-index: 3)
Xueyun Tian (Citations: 39; h-index: 3)

Original Abstract

Recent advances in large language models (LLMs) have enabled agents to autonomously execute complex, long-horizon tasks, yet planning remains a primary bottleneck for reliable task execution. Existing methods typically fall into two paradigms: step-wise planning, which is reactive but often short-sighted; and one-shot planning, which generates a complete plan upfront yet is brittle to execution errors. Crucially, both paradigms suffer from entangled contexts, where the agent must reason over a monolithic history spanning multiple sub-tasks. This entanglement increases cognitive load and lets local errors propagate across otherwise independent decisions, making recovery computationally expensive. To address this, we propose Task-Decoupled Planning (TDP), a training-free framework that replaces entangled reasoning with task decoupling. TDP decomposes tasks into a directed acyclic graph (DAG) of sub-goals via a Supervisor. Using a Planner and Executor with scoped contexts, TDP confines reasoning and replanning to the active sub-task. This isolation prevents error propagation and corrects deviations locally without disrupting the workflow. Results on TravelPlanner, ScienceWorld, and HotpotQA show that TDP outperforms strong baselines while reducing token consumption by up to 82%, demonstrating that sub-task decoupling improves both robustness and efficiency for long-horizon agents.
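
The decoupling idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `SubGoal`, `run_decoupled`, and the `execute` callback are hypothetical, and the retry loop is only a stand-in for the local replanning the abstract mentions. The sketch assumes sub-goals form a DAG, visits them in dependency order, and hands each sub-goal a scoped context containing only the outputs of its direct dependencies, so a failure is retried inside that sub-goal without touching the rest of the workflow.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable


@dataclass(frozen=True)
class SubGoal:
    """Hypothetical sub-goal node: a name plus the names it depends on."""
    name: str
    deps: tuple[str, ...] = ()


def run_decoupled(
    subgoals: list[SubGoal],
    execute: Callable[[str, dict[str, Any], int], Any],
    max_retries: int = 2,
) -> dict[str, Any]:
    """Run sub-goals in dependency order, each with a scoped context.

    `execute(name, context, attempt)` is an assumed callback standing in for
    a planner/executor pair; it should raise RuntimeError on failure.
    """
    graph = {g.name: set(g.deps) for g in subgoals}
    results: dict[str, Any] = {}

    # static_order() yields every node after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        # Scoped context: only outputs of direct dependencies are visible,
        # not the full history of every earlier sub-goal.
        context = {dep: results[dep] for dep in graph[name]}

        for attempt in range(max_retries + 1):
            try:
                results[name] = execute(name, context, attempt)
                break  # this sub-goal succeeded; move on
            except RuntimeError:
                # Local recovery: retry only this sub-goal.
                if attempt == max_retries:
                    raise
    return results
```

Called with a list such as `[SubGoal("flight"), SubGoal("hotel", ("flight",))]`, the `hotel` sub-goal would see only the `flight` result in its context, and a transient failure in `hotel` would be retried without re-running `flight`.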
