2602.06485v1 Feb 06, 2026 cs.AI

AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents

Ya-Ting Lu, Haotian Chen, Yankai Lin, Yishan Li, Maosong Sun, Zhong Zhang, Yukun Yan, X. Cong, Zhiyuan Liu, Shengda Fan, Ziqi Gong, Bo Niu, Zijun Song, Huadong Wang, Yesai Wu, Yue Wu, Zihao Xie, Yu Fu, Chengjun Pan

Abstract

While Large Language Model (LLM)-based agents have shown remarkable potential for solving complex tasks, existing systems remain heavily reliant on large-scale models, leaving the capabilities of edge-scale models largely underexplored. In this paper, we present the first systematic study on training agentic models at the 4B-parameter scale. We identify three primary bottlenecks hindering the performance of edge-scale models: catastrophic forgetting during Supervised Fine-Tuning (SFT), sensitivity to reward signal noise during Reinforcement Learning (RL), and reasoning degradation caused by redundant information in long-context scenarios. To address these issues, we propose AgentCPM-Explore, a compact 4B agent model with high knowledge density and strong exploration capability. We introduce a holistic training framework featuring parameter-space model fusion, reward signal denoising, and contextual information refinement. Through deep exploration, AgentCPM-Explore achieves state-of-the-art (SOTA) performance among 4B-class models, matches or surpasses 8B-class SOTA models on four benchmarks, and even outperforms larger-scale models such as Claude-4.5-Sonnet or DeepSeek-v3.2 on five benchmarks. Notably, AgentCPM-Explore achieves 97.09% accuracy on GAIA text-based tasks under pass@64. These results provide compelling evidence that the bottleneck for edge-scale models is not their inherent capability ceiling, but rather their inference stability. Based on our well-established training framework, AgentCPM-Explore effectively unlocks the significant, yet previously underestimated, potential of edge-scale models.
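The abstract reports GAIA accuracy "under pass@64" without defining the metric. As an illustration only, here is a minimal sketch assuming the common convention: a task counts as solved if at least one of k sampled attempts is correct, estimated from n >= k samples with the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), where c is the number of correct samples. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task (assumed convention).

    n: total sampled attempts, c: number of correct attempts, k: attempt budget.
    Returns the probability that a random size-k subset of the n attempts
    contains at least one correct attempt.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Too few incorrect attempts to fill a size-k subset, so every
        # subset contains at least one correct attempt.
        return 1.0
    # 1 - P(all k drawn attempts are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-task pass@k over (n, c) pairs, one pair per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With exactly n = k = 64 samples per task, pass@64 reduces to the fraction of tasks solved at least once: `benchmark_pass_at_k([(64, 0), (64, 1), (64, 30)], 64)` gives 2/3, since only the first task never succeeds. This is why pass@k at large k reflects what a model can reach at all, while single-attempt accuracy reflects how reliably it does so, which is the distinction the abstract draws between capability ceiling and inference stability.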

3 Citations | 1 Influential | 8 Altmetric | 45.0 Score
