2602.04284v1 Feb 04, 2026 cs.AI

Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning

Yansong NING

The Hong Kong University of Science and Technology (Guangzhou)

Citations: 100

h-index: 6

Hao Liu

Citations: 42

h-index: 4

Naiqiang Tan

Citations: 141

h-index: 5

Jun Fang

Citations: 84

h-index: 4

Abstract

Managing agent thought and observation during multi-turn agent-environment interactions is an emerging strategy for improving agent efficiency. However, existing studies treat entire interaction trajectories uniformly, overlooking that the necessity of thought and the utility of observation vary across turns. To address this, we first conduct quantitative investigations into how thought and observation affect agent effectiveness and efficiency. Based on our findings, we propose Agent-Omit, a unified training framework that empowers LLM agents to adaptively omit redundant thoughts and observations. Specifically, we first synthesize a small amount of cold-start data, covering both single-turn and multi-turn omission scenarios, to fine-tune the agent for omission behaviors. Furthermore, we introduce an omit-aware agentic reinforcement learning approach that incorporates a dual sampling mechanism and a tailored omission reward to incentivize the agent's adaptive omission capability. Theoretically, we prove that the deviation of our omission policy is upper-bounded by the KL-divergence. Experimental results on five agent benchmarks show that our Agent-Omit-8B achieves performance comparable to seven frontier LLM agents and the best effectiveness-efficiency trade-off compared with seven efficient LLM agent methods. Our code and data are available at https://github.com/usail-hkust/Agent-Omit.

0 Citations

0 Influential

33.986122886681 Altmetric

169.9 Score
