2601.05808v1 Jan 09, 2026 cs.CL


EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Guanting Dong

Citations: 972

h-index: 11

Yutao Zhu

University of Montreal

Citations: 4,486

h-index: 29

Xiaoshuai Song

Renmin University of China

Citations: 442

h-index: 9

Zhicheng Dou

Renmin University of China

Citations: 6,905

h-index: 41

Ji-Rong Wen

Citations: 1,867

h-index: 15

Hao Chang

Citations: 18

h-index: 2


Original Abstract

Large language models (LLMs) are expected to be trained to act as agents in various real-world environments, but this process relies on rich and varied tool-interaction sandboxes. However, access to real systems is often restricted; LLM-simulated environments are prone to hallucinations and inconsistencies; and manually built sandboxes are hard to scale. In this paper, we propose EnvScaler, an automated framework for scalable tool-interaction environments via programmatic synthesis. EnvScaler comprises two components. First, SkelBuilder constructs diverse environment skeletons through topic mining, logic modeling, and quality evaluation. Then, ScenGenerator generates multiple task scenarios and rule-based trajectory validation functions for each environment. With EnvScaler, we synthesize 191 environments and about 7K scenarios, and apply them to Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for Qwen3 series models. Results on three benchmarks show that EnvScaler significantly improves LLMs' ability to solve tasks in complex environments involving multi-turn, multi-tool interactions. We release our code and data at https://github.com/RUC-NLPIR/EnvScaler.
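The abstract describes environments that are built programmatically and paired with rule-based functions that validate an agent's trajectory. The sketch below illustrates that general idea under our own assumptions. It uses a toy library environment whose state and tool logic are plain Python, so tool results come from code rather than from an LLM simulator. It also includes one scenario checker that inspects both the final state and the order of tool calls. `LibraryEnv`, the `tool_*` methods, and `validate_borrow_b1` are hypothetical names chosen for illustration. They are not EnvScaler's actual code, API, or data format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class LibraryEnv:
    """Hypothetical environment skeleton: explicit state plus tools that read or mutate it."""
    books: dict = field(default_factory=lambda: {"B1": 1, "B2": 0})  # book_id -> copies in stock
    loans: dict = field(default_factory=dict)                          # user_id -> borrowed book_ids
    trajectory: list = field(default_factory=list)                     # logged (tool, kwargs) calls

    def call(self, tool: str, **kwargs: Any) -> dict:
        """Log a tool call to the trajectory, then dispatch it to the matching tool_* method."""
        self.trajectory.append((tool, kwargs))
        handler: Optional[Callable[..., dict]] = getattr(self, f"tool_{tool}", None)
        if handler is None:
            return {"ok": False, "error": f"unknown tool: {tool}"}
        return handler(**kwargs)

    def tool_check_availability(self, book_id: str) -> dict:
        if book_id not in self.books:
            return {"ok": False, "error": "unknown book"}
        return {"ok": True, "available": self.books[book_id] > 0}

    def tool_borrow_book(self, user_id: str, book_id: str) -> dict:
        # The business rule is enforced deterministically in code: no borrowing when stock is 0.
        if self.books.get(book_id, 0) <= 0:
            return {"ok": False, "error": "not available"}
        self.books[book_id] -= 1
        self.loans.setdefault(user_id, []).append(book_id)
        return {"ok": True}


def validate_borrow_b1(env: LibraryEnv) -> bool:
    """Hypothetical rule-based check for the task 'user u42 borrows B1 after checking availability'."""
    borrowed = "B1" in env.loans.get("u42", [])
    stock_updated = env.books.get("B1") == 0
    checked_first = bool(env.trajectory) and env.trajectory[0][0] == "check_availability"
    return borrowed and stock_updated and checked_first


# Usage: replay an agent's tool calls, then score the episode with the rule-based checker.
env = LibraryEnv()
env.call("check_availability", book_id="B1")
env.call("borrow_book", user_id="u42", book_id="B1")
print(validate_borrow_b1(env))  # True
```

Because the checker reads the environment's real state and logged calls, a reward derived from it cannot be swayed by what the agent claims in its final message. That property is what makes code-defined environments attractive for RL compared with LLM-simulated ones.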

11 Citations

2 Influential

63.475599250673 Altmetric

332.4 Score
