2602.19633v1 Feb 23, 2026 cs.AI

TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents

Jongwon Jeong

Kangwook Lee

Jungtaek Kim

University of Wisconsin–Madison

Abstract

Language Model (LM) agents have demonstrated remarkable capabilities in solving tasks that require multiple interactions with the environment. However, they remain vulnerable in environments where a single error often leads to irrecoverable failure, particularly under strict feasibility constraints. We systematically analyze existing agent frameworks and identify imperfect planning and stochastic execution as the primary causes of this vulnerability. To address these challenges, we propose Tool-guided Adaptive Planning with constrained Execution (TAPE). TAPE enhances planning capability by aggregating multiple plans into a graph and employing an external solver to identify a feasible path. During execution, TAPE employs constrained decoding to reduce sampling noise and adaptively re-plans whenever environmental feedback deviates from the intended state. Experiments across Sokoban, ALFWorld, MuSiQue, and GSM8K-Hard demonstrate that TAPE consistently outperforms existing frameworks, with particularly large gains in two cases: success rates improve by 21.0 percentage points on average on hard settings and by 20.0 percentage points on average for weaker base models. Code and data are available here.
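The plan-aggregation and re-planning idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the state names, the `merge_plans`, `find_path`, and `execute` helpers, and the `is_feasible` predicate are all hypothetical, and a plain breadth-first search stands in for the external solver. Constrained decoding is only approximated here by the fact that every executed action is an edge taken from the graph.

```python
from collections import deque

def merge_plans(plans):
    """Union candidate plans, each a list of (state, action, next_state), into one graph."""
    graph = {}
    for plan in plans:
        for state, action, next_state in plan:
            graph.setdefault(state, {})[action] = next_state
    return graph

def find_path(graph, start, goal, is_feasible):
    """BFS stand-in for an external solver: action sequence that avoids infeasible states."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, nxt in graph.get(state, {}).items():
            if nxt not in seen and is_feasible(nxt):
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None  # no feasible path in the merged graph

def execute(graph, start, goal, is_feasible, step, max_steps=20):
    """Follow the solved path; re-plan when the observed state differs from the expected one."""
    state, trace = start, []
    path = find_path(graph, start, goal, is_feasible)
    while path and len(trace) < max_steps:
        action = path[0]
        expected = graph[state][action]
        state = step(state, action)  # hypothetical environment transition
        trace.append(action)
        if state == expected:
            path = path[1:]
        else:
            path = find_path(graph, state, goal, is_feasible)
    return trace if state == goal else None

# Two hypothetical plans; each one passes through the irrecoverable state "dead".
plan_a = [("s0", "move_up", "s1"), ("s1", "push_left", "dead")]
plan_b = [("s0", "push_right", "dead"), ("dead", "move_up", "s1"), ("s1", "push_up", "goal")]
graph = merge_plans([plan_a, plan_b])
is_feasible = lambda s: s != "dead"

print(find_path(graph, "s0", "goal", is_feasible))  # ['move_up', 'push_up']
```

In this toy case neither plan is feasible on its own, but the merged graph contains a feasible path that combines the first step of one plan with the last step of the other.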
