2602.06820v1 Feb 06, 2026 cs.AI

ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training

Xunliang Cai (Citations: 74, h-index: 5)
Dunwei Tu (Citations: 23, h-index: 2)
Hansi Yang (Citations: 24, h-index: 2)
Hongyan Hao (Citations: 115, h-index: 5)
Yihao Chen (Citations: 30, h-index: 3)
Yi-Kai Zhang (Citations: 35, h-index: 3)
Shen Furao (Citations: 186, h-index: 8)
Hui Su (Citations: 126, h-index: 6)
Yueqing Sun (Citations: 32, h-index: 4)
Zhi-Wei Xia (Citations: 91, h-index: 4)
Yu Yang (Citations: 73, h-index: 3)
Xingchen Liu (Citations: 15, h-index: 2)
Qi Gu (Citations: 146, h-index: 4)

Original Abstract

Training generalist agents capable of adapting to diverse scenarios requires interactive environments for self-exploration. However, interactive environments remain critically scarce, and existing synthesis methods are significantly limited in environmental diversity and scalability. To address these challenges, we introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch. Specifically, ScaleEnv ensures environment reliability through procedural testing, and guarantees task completeness and solvability via tool dependency graph expansion and executable action verification. By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen, multi-turn tool-use benchmarks such as $\tau^2$-Bench and VitaBench, highlighting strong generalization capabilities. Furthermore, we investigate the relationship between the number of domains and model generalization performance, providing empirical evidence that scaling environmental diversity is critical for robust agent learning.
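
The abstract names a tool dependency graph and executable action verification but gives no implementation detail. The following is a minimal illustrative sketch of what those two ideas could look like, not the authors' method: the `Tool` class, the `requires`/`produces` state-key model, the function names, and the flight-booking tools are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool spec: state keys it needs and state keys it produces."""
    name: str
    requires: frozenset
    produces: frozenset


def dependency_edges(tools):
    """Return (provider, consumer) pairs where provider produces a key consumer requires."""
    return {
        (a.name, b.name)
        for a in tools
        for b in tools
        if a.name != b.name and a.produces & b.requires
    }


def is_executable(plan, tools, initial_state, goal):
    """Replay a plan step by step; fail on an unmet precondition or an unreached goal."""
    by_name = {t.name: t for t in tools}
    state = set(initial_state)
    for step in plan:
        tool = by_name.get(step)
        if tool is None or not tool.requires <= state:
            return False
        state |= tool.produces
    return set(goal) <= state


# Hypothetical toy domain (an assumption, not taken from the paper).
TOOLS = [
    Tool("search_flights", frozenset({"user_id"}), frozenset({"flight_id"})),
    Tool("book_flight", frozenset({"user_id", "flight_id"}), frozenset({"booking_id"})),
    Tool("send_receipt", frozenset({"booking_id"}), frozenset({"receipt_sent"})),
]
```

In this toy model, a task would count as solvable only if some plan passes `is_executable` from the initial state to the goal; a plan that calls `book_flight` before `search_flights` is rejected because `flight_id` is not yet available.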

4 Citations

1 Influential

4 Altmetric

26.0 Score
