2601.06860v2 Jan 11, 2026 cs.AI

ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration

Guanting Dong

Zhicheng Dou

Yifei Chen

Abstract

Large Language Models (LLMs) can extend the limits of their parametric knowledge by adopting the Tool-Integrated Reasoning (TIR) paradigm. However, existing LLM-based agent training frameworks often focus only on answer accuracy and overlook specific alignment of behavior patterns. Consequently, agents often exhibit ineffective actions during TIR tasks, such as redundant or insufficient tool calls. How to calibrate erroneous behavioral patterns when executing TIR tasks, and thereby explore effective trajectories, remains an open problem. In this paper, we propose ET-Agent, a training framework that calibrates an agent's tool-use behavior from two synergistic perspectives: a Self-evolving Data Flywheel and Behavior Calibration Training. Specifically, we introduce a self-evolving data flywheel to generate enhanced data, which is used to fine-tune the LLM and improve its exploration ability. Building on this, we implement a two-phase behavior calibration training framework designed to progressively calibrate erroneous behavioral patterns toward optimal behaviors. Further in-depth experiments confirm the superiority of ET-Agent across multiple dimensions, including correctness, efficiency, reasoning conciseness, and tool execution accuracy. Our ET-Agent framework provides practical insights for research in the TIR field. Code is available at https://github.com/asilverlight/ET-Agent
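
The abstract names two ineffective behaviors, redundant and insufficient tool calls, without defining them here. The following minimal Python sketch shows one way such behaviors could be flagged in a recorded trajectory. It is an illustration under stated assumptions, not the authors' method: the `ToolCall` schema, the rule that a call is "redundant" when it exactly repeats an earlier tool and argument, and the rule that a trajectory is "insufficient" when it gives a wrong answer without any tool call are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation in a trajectory (hypothetical schema)."""
    tool: str
    argument: str


def count_redundant_calls(trajectory: Sequence[ToolCall]) -> int:
    """Count calls that exactly repeat an earlier (tool, argument) pair.

    Assumption: exact repetition is used as a stand-in for redundancy.
    """
    seen = set()
    redundant = 0
    for call in trajectory:
        if call in seen:
            redundant += 1
        else:
            seen.add(call)
    return redundant


def is_insufficient(trajectory: Sequence[ToolCall], answer_correct: bool) -> bool:
    """Flag a wrong answer that was produced without calling any tool.

    Assumption: a deliberately crude proxy for insufficient tool use.
    """
    return (not answer_correct) and len(trajectory) == 0


if __name__ == "__main__":
    example = [
        ToolCall("search", "ET-Agent"),
        ToolCall("search", "ET-Agent"),  # exact repeat of the first call
        ToolCall("python", "1 + 1"),
    ]
    print(count_redundant_calls(example))
    print(is_insufficient([], answer_correct=False))
```

Signals like these could serve as diagnostics when inspecting trajectories. How ET-Agent actually measures or penalizes such behavior is described in the paper itself, not in this sketch.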
