2601.09382v1 Jan 14, 2026 cs.AI

Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments

Jiguo Li

Citations: 3

h-index: 1

Jun Xu

Citations: 12

h-index: 2

Jiuchong Gao

Citations: 14

h-index: 2

Jinghua Hao

Citations: 50

h-index: 2

Renqing He

Citations: 338

h-index: 9

Donghai Wang

Citations: 4

h-index: 1

Qi Shi

Citations: 2

h-index: 1

Hantao Zhou

Citations: 186

h-index: 5

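Current large language model agents mostly operate under a reactive paradigm, responding only to immediate user queries within short sessions. This limits their ability to maintain a user's long-term intent and to adapt dynamically to a changing external environment. In this paper, we propose a new interaction paradigm for proactive task-oriented agents that bridges the gap between relatively static user needs and a dynamic environment. We formalize proactivity through two core capabilities: (i) Intent-Conditioned Monitoring, in which the agent autonomously formulates trigger conditions from the dialog history; and (ii) Event-Triggered Follow-up, in which the agent proactively engages the user when it detects a useful environmental update. We introduce a high-quality data synthesis pipeline for constructing complex multi-turn dialog data in a dynamic environment. We also propose a new benchmark, ChronosBench, to address the lack of evaluation criteria for task-oriented interaction in dynamic environments. We evaluate several leading closed-source and open-source models and expose their shortcomings in long-term task-oriented interaction. Finally, our fine-tuned model, trained with supervised learning on the synthetic data, achieves a task completion rate of 85.19% on complex tasks that include shifts in user intent, outperforming the other models tested. This result supports the effectiveness of our data-driven strategy.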

Original Abstract

Current large language model agents predominantly operate under a reactive paradigm, responding only to immediate user queries within short-term sessions. This limitation hinders their ability to maintain users' long-term intents and to adapt dynamically to evolving external environments. In this paper, we propose a novel interaction paradigm for proactive task-oriented agents capable of bridging the gap between relatively static user needs and a dynamic environment. We formalize proactivity through two key capabilities: (i) Intent-Conditioned Monitoring: the agent autonomously formulates trigger conditions based on dialog history; (ii) Event-Triggered Follow-up: the agent actively engages the user upon detecting useful environmental updates. We introduce a high-quality data synthesis pipeline to construct complex, multi-turn dialog data in a dynamic environment. Furthermore, we address the lack of evaluation criteria for task-oriented interaction in a dynamic environment by proposing a new benchmark, ChronosBench. We evaluate several leading closed-source and open-source models and reveal their flaws in long-term task-oriented interaction. Our fine-tuned model, trained with supervised learning on synthetic data, achieves a task completion rate of 85.19% on complex tasks that include shifts in user intent, outperforming the other models under test. This result validates the effectiveness of our data-driven strategy.
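
The two capabilities named in the abstract can be illustrated with a toy sketch. This is not the paper's method: the abstract does not say how trigger conditions are represented or how events are delivered, so the `Trigger` class, the `formulate_trigger` and `follow_up` functions, the price-threshold rule, and the event dictionary format are all hypothetical. A keyword rule stands in for the LLM that would derive the trigger in a real agent.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Trigger:
    """Hypothetical trigger: the user intent plus a predicate over events."""
    intent: str
    condition: Callable[[Dict[str, float]], bool]


def formulate_trigger(dialog_history: List[str]) -> Optional[Trigger]:
    """Intent-conditioned monitoring (toy version).

    Scans the dialog from newest to oldest for a phrase such as
    "under $300" and turns it into a price-threshold trigger.
    A real agent would use a language model here (assumption).
    """
    for utterance in reversed(dialog_history):
        match = re.search(r"under \$(\d+(?:\.\d+)?)", utterance.lower())
        if match:
            limit = float(match.group(1))
            return Trigger(
                intent=utterance,
                # Bind limit as a default so the lambda keeps this value.
                condition=lambda event, limit=limit: (
                    event.get("price", float("inf")) < limit
                ),
            )
    return None


def follow_up(trigger: Trigger, event: Dict[str, float]) -> Optional[str]:
    """Event-triggered follow-up (toy version).

    Returns a proactive message only when the environment update
    satisfies the trigger; otherwise the agent stays silent.
    """
    if trigger.condition(event):
        return (
            f"Update on your request ({trigger.intent!r}): "
            f"the price is now ${event['price']:.2f}."
        )
    return None


history = [
    "I want to fly to Tokyo next month.",
    "Tell me when the fare drops under $300.",
]
trigger = formulate_trigger(history)
if trigger is not None:
    for event in ({"price": 350.0}, {"price": 280.0}):
        message = follow_up(trigger, event)
        if message is not None:
            print(message)
```

In this sketch the agent contacts the user only for the second event, since only that update satisfies the stored condition. Handling a shift in user intent, which the abstract highlights, would amount to calling `formulate_trigger` again on the extended dialog history.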

1 Citations

1 Influential

4.5 Altmetric

25.5 Score

