2601.01569v1 Jan 04, 2026 cs.AI

CaveAgent: Transforming LLMs into Stateful Runtime Operators

Maohao Ran (Citations: 101, h-index: 3)
Zhenglin Wan (Citations: 25, h-index: 3)
Cooper Lin (Citations: 3, h-index: 1)
Yanting Zhang (Citations: 6, h-index: 2)
Hongwei Fan (Citations: 53, h-index: 4)
Yibo Xu (Citations: 63, h-index: 4)
Lang Feng (Citations: 729, h-index: 11)
Fuchao Yang (Citations: 23, h-index: 3)
Jingxuan Wu (Citations: 10, h-index: 3)
Yiqiao Huang (Citations: 7, h-index: 2)
Chendong Ma (Citations: 85, h-index: 4)
Dailing Jiang (Citations: 3, h-index: 1)
Sihui Han (Citations: 8, h-index: 2)
Bo An (Citations: 11, h-index: 3)
Yike Guo (Citations: 13, h-index: 3)
Jun Song (Citations: 5, h-index: 2)
Hongyu Xin (Citations: 32, h-index: 2)
Beier Luo (Citations: 17, h-index: 2)
Yaxin Zhou (Citations: 84, h-index: 4)
Wangbo Zhao (Citations: 341, h-index: 9)
Lijie Yang, Princeton University (Citations: 456, h-index: 6)
Jianbo Deng (Citations: 407, h-index: 9)

Abstract

LLM-based agents are increasingly capable of complex task execution, yet current agentic systems remain constrained by text-centric paradigms. Traditional approaches rely on procedural JSON-based function calling, which often struggles with long-horizon tasks due to fragile multi-turn dependencies and context drift. In this paper, we present CaveAgent, a framework that shifts the paradigm from "LLM-as-Text-Generator" to "LLM-as-Runtime-Operator." We introduce a Dual-stream Context Architecture that decouples state management into a lightweight semantic stream for reasoning and a persistent, deterministic Python Runtime stream for execution. In addition to leveraging code generation to efficiently resolve interdependent sub-tasks (e.g., loops, conditionals) in a single step, we introduce Stateful Runtime Management in CaveAgent. Unlike existing code-based approaches, which remain text-bound and lack support for external object injection and retrieval, CaveAgent injects, manipulates, and retrieves complex Python objects (e.g., DataFrames, database connections) that persist across turns. This persistence mechanism acts as a high-fidelity external memory, eliminating context drift and avoiding catastrophic forgetting while ensuring that processed data flows losslessly to downstream applications. Comprehensive evaluations on Tau²-bench, BFCL, and various case studies across representative SOTA LLMs demonstrate CaveAgent's superiority. Specifically, our framework achieves a 10.5% success rate improvement on retail tasks and reduces total token consumption by 28.4% in multi-turn scenarios. On data-intensive tasks, direct variable storage and retrieval reduces token consumption by 59%, allowing CaveAgent to handle large-scale data that causes context overflow failures in both JSON-based and code-based agents.
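The idea of a persistent runtime stream beside a lightweight semantic stream can be illustrated with a minimal Python sketch. This is not CaveAgent's actual API: the class `StatefulRuntime`, its methods `inject`, `execute`, `retrieve`, and `describe`, and the sample `orders` data are all hypothetical names chosen for illustration. The sketch assumes the runtime is simply a namespace that survives between turns, and that the model only sees a short text summary of it.

```python
from typing import Any


class StatefulRuntime:
    """Hypothetical persistent namespace shared across agent turns."""

    def __init__(self) -> None:
        self._namespace: dict[str, Any] = {}

    def inject(self, name: str, obj: Any) -> None:
        """Place a live Python object into the runtime under `name`."""
        self._namespace[name] = obj

    def execute(self, code: str) -> None:
        """Run generated code against the namespace; variables persist."""
        exec(code, self._namespace)

    def retrieve(self, name: str) -> Any:
        """Return the object itself, not a text serialization of it."""
        return self._namespace[name]

    def describe(self) -> str:
        """Names and types only: a stand-in for the semantic stream."""
        items = sorted(
            (key, type(value).__name__)
            for key, value in self._namespace.items()
            if not key.startswith("__")  # skip entries such as __builtins__
        )
        return ", ".join(f"{key}: {kind}" for key, kind in items)


runtime = StatefulRuntime()
runtime.inject(
    "orders",
    [{"id": 1, "total": 40.0}, {"id": 2, "total": 75.5}, {"id": 3, "total": 12.0}],
)

# Turn 1: code (assumed to be model-generated) filters the injected object.
runtime.execute("large = [o for o in orders if o['total'] > 30]")

# Turn 2: a later turn reuses `large` without re-sending it as text.
runtime.execute("revenue = sum(o['total'] for o in large)")

result = runtime.retrieve("revenue")
summary = runtime.describe()
```

In this sketch only `summary` would need to enter the prompt, while the data itself stays in the runtime, which is one plausible reading of how direct variable storage could reduce token consumption. Calling `exec` on untrusted code is unsafe, so a real system would need sandboxing that this example omits.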

3 Citations

0 Influential

5.5 Altmetric

30.5 Score
