2603.07915v1 Mar 09, 2026 cs.AI

Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents

Yujia Bao

Citations: 426

h-index: 8

Jingbo Yang

Citations: 47

h-index: 2

Bairu Hou

Citations: 1,044

h-index: 13

Wei Wei

Citations: 354

h-index: 7

Shiyu Chang

Citations: 47

h-index: 2

Original Abstract

Modern agents powered by thinking LLMs achieve high accuracy through long chain-of-thought reasoning but incur substantial inference costs. While many LLMs now support configurable reasoning levels (e.g., high/medium/low), static strategies are often ineffective: using low-effort modes at every step leads to significant performance degradation, while random selection fails to preserve accuracy or provide meaningful cost reduction. However, agents should reserve high reasoning effort for difficult steps like navigating complex website structures, while using lower-effort modes for simpler steps like opening a target URL. In this paper, we propose Ares, a framework for per-step dynamic reasoning effort selection tailored for multi-step agent tasks. Ares employs a lightweight router to predict the lowest appropriate reasoning level for each step based on the interaction history. To train this router, we develop a data generation pipeline that identifies the minimum reasoning effort required for successful step completion. We then fine-tune the router to predict these levels, enabling plug-and-play integration with any LLM agent. We evaluate Ares on a diverse set of agent tasks, including TAU-Bench for tool-use agents, BrowseComp-Plus for deep-research agents, and WebArena for web agents. Experimental results show that Ares reduces reasoning token usage by up to 52.7% compared to fixed high-effort reasoning, while introducing minimal degradation in task success rates.
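The two ideas in the abstract, labeling each step with the minimum effort that still succeeds and then routing each step to a predicted level, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `min_sufficient_effort`, `run_episode`, `toy_router`, and `toy_act` are hypothetical, the router is a hand-written rule rather than a fine-tuned model, and the paper's actual pipeline may decide step success and handle failures differently.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Effort levels ordered from cheapest to most expensive (as in the abstract's example).
EFFORT_LEVELS = ("low", "medium", "high")


def min_sufficient_effort(
    step_succeeds: Callable[[str], bool],
    levels: Sequence[str] = EFFORT_LEVELS,
) -> Optional[str]:
    """Return the cheapest level at which the step succeeds, or None if none does.

    `step_succeeds` is a hypothetical callback that replays one step at the
    given effort level and reports whether the step was completed.
    """
    for level in levels:
        if step_succeeds(level):
            return level
    return None


def run_episode(
    router: Callable[[List[Tuple[str, str]]], str],
    act: Callable[[List[Tuple[str, str]], str], str],
    max_steps: int,
) -> List[Tuple[str, str]]:
    """Agent loop in which the router picks an effort level before every step."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        effort = router(history)       # predict effort from the interaction history
        action = act(history, effort)  # the agent acts at that effort level
        history.append((effort, action))
        if action == "done":
            break
    return history


# Toy stand-ins (assumptions): a rule-based router and a scripted agent.
def toy_router(history: List[Tuple[str, str]]) -> str:
    # First step (e.g. opening a URL) is treated as easy; later steps as hard.
    return "low" if not history else "high"


def toy_act(history: List[Tuple[str, str]], effort: str) -> str:
    script = ["open_url", "navigate", "done"]
    return script[len(history)]


if __name__ == "__main__":
    print(min_sufficient_effort(lambda level: level != "low"))
    print(run_episode(toy_router, toy_act, max_steps=5))
```

In this sketch the labels produced by `min_sufficient_effort` would serve as training targets for the router, and `run_episode` shows where a trained router would sit at inference time: in front of the agent's call, leaving the agent itself unchanged.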

1 Citations

0 Influential

6.5 Altmetric

33.5 Score
