2602.11351v1 Feb 11, 2026 cs.AI

Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization

Haohong Lin

Yi-Fan Yao

Zhepeng Cen

Shiqi Liu

Zuxin Liu

Carnegie Mellon University

Jiacheng Zhu

Zhang-Wei Hong

Laixi Shi

Ding Zhao

Abstract

Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns, enabling efficient task completion beyond passive instruction following and making them essential for real-world, user-centric applications. Agentic reinforcement learning (RL) has recently emerged as a promising solution for training such agents in multi-turn settings, allowing interaction strategies to be learned from feedback. However, existing pipelines face a critical challenge in balancing task performance with user engagement: passive agents cannot efficiently adapt to users' intentions, while overusing human feedback reduces user satisfaction. To address this trade-off, we propose BAO, an agentic RL framework that combines behavior enhancement, which enriches proactive reasoning and information-gathering capabilities, with behavior regularization, which suppresses inefficient or redundant interactions and aligns agent behavior with user expectations. We evaluate BAO on multiple tasks from the UserRL benchmark suite and demonstrate that it substantially outperforms proactive agentic RL baselines while achieving performance comparable or even superior to commercial LLM agents, highlighting its effectiveness for training proactive, user-aligned LLM agents in complex multi-turn scenarios. Our website: https://proactive-agentic-rl.github.io/.
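To make the title's notion of a Pareto frontier concrete: an agent lies on the frontier of task performance versus user burden if no other agent is at least as good on both axes and strictly better on one. "Pushing the frontier forward" then means producing agents that dominate parts of the old frontier. The following is a generic illustration, not BAO's method. It assumes the two axes are a task score (higher is better) and a count of user-facing turns (lower is better); these axes, the function names `dominates` and `pareto_frontier`, and the toy numbers are all hypothetical, and the abstract does not specify how BAO measures either quantity.

```python
from typing import List, Tuple

# Hypothetical axes: (task_score, user_turns).
# Higher task_score is better; fewer user_turns (less user burden) is better.
Point = Tuple[float, int]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates, sorted by user_turns."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: (p[1], -p[0]))


# Hypothetical toy numbers for illustration only, not results from the paper.
candidates = [(0.40, 1), (0.55, 3), (0.50, 4), (0.70, 6), (0.65, 8)]
print(pareto_frontier(candidates))  # prints [(0.4, 1), (0.55, 3), (0.7, 6)]
```

Here (0.50, 4) and (0.65, 8) drop out because (0.55, 3) and (0.70, 6), respectively, dominate them: each achieves a higher score with fewer user turns.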
