2602.03219v1 Feb 03, 2026 cs.AI

Beyond Quantity: Trajectory Diversity Scaling for Code Agents

Bing Zhao (Citations: 61; h-index: 4)
Guhong Chen (Citations: 74; h-index: 3)
Feiteng Fang (Citations: 171; h-index: 6)
A. Argha (Citations: 1,026; h-index: 18)
Xander Xu (Citations: 15; h-index: 2)
H. Alinejad-Rokny (Citations: 185; h-index: 8)
Qiang Qu (Citations: 96; h-index: 6)
Binhua Li (Citations: 2,254; h-index: 20)
Shiwen Ni (Citations: 642; h-index: 11)
Min Yang (Citations: 53; h-index: 4)
Yongbin Li (Citations: 2,230; h-index: 20)
Qi Han (Citations: 554; h-index: 6)
Hu Wei (Citations: 26; h-index: 3)
Qiyao Wang (Citations: 207; h-index: 3)
Chen Sun (Citations: 189; h-index: 6)
Cheng Fu (Citations: 172; h-index: 4)
Zhihong Huang (Citations: 6; h-index: 2)
Guangxu Chen (Citations: 19; h-index: 3)
Chaopeng Wei (Citations: 1; h-index: 1)

Abstract

As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of quantity scaling. Moreover, quantity-centric scaling exhibits an early bottleneck that underutilizes trajectory data. We propose TDScaling, a Trajectory Diversity Scaling-based data synthesis framework for code agents that scales performance through diversity rather than raw volume. Under a fixed training budget, increasing trajectory diversity yields larger gains than adding more trajectories, improving the performance-cost trade-off for agent training. TDScaling integrates four innovations: (1) a Business Cluster mechanism that captures real-service logical dependencies; (2) a blueprint-driven multi-agent paradigm that enforces trajectory coherence; (3) an adaptive evolution mechanism that steers synthesis toward long-tail scenarios using Domain Entropy, Reasoning Mode Entropy, and Cumulative Action Complexity to prevent mode collapse; and (4) a sandboxed code tool that mitigates catastrophic forgetting of intrinsic coding capabilities. Experiments on general tool-use benchmarks (BFCL, τ²-Bench) and code agent tasks (RebenchT, CodeCI, BIRD) demonstrate a win-win outcome: TDScaling improves both tool-use generalization and inherent coding proficiency. We plan to release the full codebase and the synthesized dataset (including 30,000+ tool clusters) upon publication.
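
The abstract names Domain Entropy and Reasoning Mode Entropy as signals for steering synthesis toward long-tail scenarios, but it does not give their formulas. As a rough illustration only, the sketch below assumes both are ordinary Shannon entropies over categorical labels attached to each trajectory, and that "steering" means preferring the label seen least often so far. The function names and the example labels are hypothetical and are not taken from the paper or its codebase.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def shannon_entropy(labels: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of labels.

    Assumption: this stands in for the paper's entropy metrics, whose
    exact definitions are not given in the abstract.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pick_underrepresented(
    labels: Iterable[Hashable], candidates: Sequence[Hashable]
) -> Hashable:
    """Return the candidate label seen least often so far.

    Ties are broken by order in `candidates`. This is a hypothetical
    stand-in for steering synthesis toward long-tail scenarios.
    """
    counts = Counter(labels)
    return min(candidates, key=lambda c: counts[c])


if __name__ == "__main__":
    # Hypothetical domain labels for already-synthesized trajectories.
    domains = ["finance", "finance", "finance", "travel"]
    print(f"domain entropy: {shannon_entropy(domains):.3f} bits")
    print("next domain:", pick_underrepresented(domains, ["finance", "travel", "health"]))
```

Under this reading, a skewed pool scores lower entropy than a balanced one, so a synthesis loop could target whichever domain or reasoning mode is currently rarest. The paper's actual mechanism may differ.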

