2602.03219v2 Feb 03, 2026 cs.AI

Beyond Quantity: Trajectory Diversity Scaling for Code Agents

Bing Zhao (Citations: 61, h-index: 4)
Guhong Chen (Citations: 74, h-index: 3)
Feiteng Fang (Citations: 171, h-index: 6)
A. Argha (Citations: 1,026, h-index: 18)
Xander Xu (Citations: 15, h-index: 2)
H. Alinejad-Rokny (Citations: 185, h-index: 8)
Qiang Qu (Citations: 96, h-index: 6)
Binhua Li (Citations: 2,254, h-index: 20)
Shiwen Ni (Citations: 642, h-index: 11)
Min Yang (Citations: 53, h-index: 4)
Yongbin Li (Citations: 2,230, h-index: 20)
Qi Han (Citations: 554, h-index: 6)
Hu Wei (Citations: 26, h-index: 3)
Qiyao Wang (Citations: 207, h-index: 3)
Chen Sun (Citations: 189, h-index: 6)
Cheng Fu (Citations: 172, h-index: 4)
Zhihong Huang (Citations: 6, h-index: 2)
Guangxu Chen (Citations: 19, h-index: 3)
Chaopeng Wei (Citations: 1, h-index: 1)

Abstract

As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of quantity scaling. Moreover, quantity-centric scaling exhibits an early bottleneck that underutilizes trajectory data. We propose TDScaling, a Trajectory Diversity Scaling-based data synthesis framework for code agents that scales performance through diversity rather than raw volume. Under a fixed training budget, increasing trajectory diversity yields larger gains than adding more trajectories, improving the performance-cost trade-off for agent training. TDScaling integrates four innovations: (1) a Business Cluster mechanism that captures real-service logical dependencies; (2) a blueprint-driven multi-agent paradigm that enforces trajectory coherence; (3) an adaptive evolution mechanism that steers synthesis toward long-tail scenarios using Domain Entropy, Reasoning Mode Entropy, and Cumulative Action Complexity to prevent mode collapse; and (4) a sandboxed code tool that mitigates catastrophic forgetting of intrinsic coding capabilities. Experiments on general tool-use benchmarks (BFCL, τ²-Bench) and code agent tasks (RebenchT, CodeCI, BIRD) demonstrate a win-win outcome: TDScaling improves both tool-use generalization and inherent coding proficiency. We plan to release the full codebase and the synthesized dataset (including 30,000+ tool clusters) upon publication.
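
The abstract names Domain Entropy as one of the signals that steer synthesis toward long-tail scenarios, but it does not give a formula. As a rough illustration only, the sketch below assumes the signal resembles Shannon entropy over the domain labels of already-synthesized trajectories, and that steering means preferring the least-represented domain next. The function names `shannon_entropy` and `pick_underrepresented`, and the example domain labels, are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels.

    Assumption: this stands in for a diversity signal such as the
    abstract's Domain Entropy; the paper's actual definition may differ.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


def pick_underrepresented(labels, candidates):
    """Return the candidate label that occurs least often in labels.

    Hypothetical steering rule: synthesize the next trajectory in the
    domain with the fewest existing trajectories.
    """
    counts = Counter(labels)  # missing labels count as 0
    return min(candidates, key=lambda c: counts[c])


if __name__ == "__main__":
    # Hypothetical domain labels for four synthesized trajectories.
    domains = ["finance", "finance", "finance", "travel"]
    print("entropy (bits):", shannon_entropy(domains))
    print("next domain:", pick_underrepresented(domains, ["finance", "travel", "health"]))
```

Under these assumptions, a corpus concentrated in one domain has entropy near zero, and adding trajectories from rarely seen domains raises it, which matches the abstract's stated goal of avoiding mode collapse.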
