2602.02050v2 Feb 02, 2026 cs.AI

Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents

Yixia Li, Southern University of Science and Technology (Citations: 141, h-index: 6)

Guanhua Chen (Citations: 49, h-index: 1)

Yiwen Zhao (Citations: 0, h-index: 0)

Guangnan Ye (Citations: 88, h-index: 5)

Hongfeng Chai (Citations: 98, h-index: 5)

Zeping Li (Citations: 15, h-index: 2)

Keyang Chen (Citations: 35, h-index: 3)

Yixin Cao (Citations: 5, h-index: 1)

Zhenfei Yin (Citations: 142, h-index: 3)

Hongru Wang (Citations: 379, h-index: 6)

Abstract

Tool-using agents based on Large Language Models (LLMs) excel at tasks such as mathematical reasoning and multi-hop question answering. Over long trajectories, however, agents often trigger excessive, low-quality tool calls that increase latency and degrade inference performance, which makes tool-use behavior hard to manage. In this work, we conduct entropy-based pilot experiments and observe a strong positive correlation between entropy reduction and high-quality tool calls. Building on this finding, we propose using entropy reduction as a supervisory signal and design two reward strategies that address the differing needs of optimizing tool-use behavior. Sparse outcome rewards provide coarse, trajectory-level guidance to improve efficiency, while dense process rewards offer fine-grained supervision to enhance performance. Experiments across diverse domains show that both reward designs improve tool-use behavior: the former reduces tool calls by 72.07% compared to the average of the baselines, while the latter improves performance by 22.27%. These results position entropy reduction as a key mechanism for enhancing tool-use behavior, enabling agents to be more adaptive in real-world applications.
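The abstract does not state the reward formulas, so the following is only a minimal sketch of how entropy reduction around a tool call could be turned into dense and sparse signals. Everything here is an assumption made for illustration: the `ToolCall` representation (the model's probability distribution just before and just after a tool result), the function names, and the choice of the mean as the trajectory-level aggregate are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation of one tool call: the model's probability
# distribution just before the call and just after the tool result.
ToolCall = Tuple[Sequence[float], Sequence[float]]


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reduction(call: ToolCall) -> float:
    """Entropy before the call minus entropy after it (positive = less uncertain)."""
    before, after = call
    return shannon_entropy(before) - shannon_entropy(after)


def dense_process_rewards(trajectory: List[ToolCall]) -> List[float]:
    """Assumed dense variant: one reward per tool call."""
    return [entropy_reduction(call) for call in trajectory]


def sparse_outcome_reward(trajectory: List[ToolCall]) -> float:
    """Assumed sparse variant: one trajectory-level score (mean reduction)."""
    if not trajectory:
        return 0.0
    return sum(dense_process_rewards(trajectory)) / len(trajectory)


trajectory = [
    ([0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]),  # informative call
    ([0.5, 0.5], [0.5, 0.5]),                          # uninformative call
]
print(dense_process_rewards(trajectory))
print(sparse_outcome_reward(trajectory))
```

In this toy trajectory the first call removes all uncertainty from a uniform four-way distribution and earns about ln 4 ≈ 1.386 nats, while the second call leaves the distribution unchanged and earns zero. The dense variant credits each call separately, whereas the sparse variant collapses the trajectory to a single number, mirroring the fine-grained versus coarse distinction drawn in the abstract.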

0 Citations

0 Influential

3 Altmetric

15.0 Score
