2601.16206v2 Jan 22, 2026 cs.CL

LLM-in-Sandbox: Using a Coding Environment to Achieve General Agentic Intelligence

LLM-in-Sandbox Elicits General Agentic Intelligence

Ji-Rong Wen (Citations: 1,514; h-index: 19)

Furu Wei (Citations: 249; h-index: 8)

Li Dong (Citations: 645; h-index: 11)

Daixuan Cheng (Citations: 900; h-index: 11)

Shaohan Huang (Citations: 482; h-index: 7)

Yu Gu (Citations: 41; h-index: 3)

Huatong Song (Citations: 666; h-index: 7)

Guoxin Chen (Citations: 196; h-index: 8)

W. Zhao (Citations: 300; h-index: 4)

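This paper presents LLM-in-Sandbox, a new method that lets LLMs explore within a code sandbox (a virtual computer) so that they can exhibit general intelligence in a range of non-coding domains. It first shows that strong LLMs, without additional training, generalize to using the code sandbox for non-coding tasks. For example, the LLMs spontaneously access external resources to acquire new knowledge, use the file system to handle long contexts, and execute scripts to satisfy formatting requirements. These agentic capabilities can be further improved through LLM-in-Sandbox Reinforcement Learning (LLM-in-Sandbox-RL), which trains models for sandbox exploration using only non-agentic data. Experiments show that LLM-in-Sandbox, in both training-free and post-trained settings, generalizes well across mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, the paper analyzes the efficiency of LLM-in-Sandbox from computational and system perspectives and releases it as a Python package to facilitate real-world deployment.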

Original Abstract

We introduce LLM-in-Sandbox, enabling LLMs to explore within a code sandbox (i.e., a virtual computer), to elicit general intelligence in non-code domains. We first demonstrate that strong LLMs, without additional training, exhibit generalization capabilities to leverage the code sandbox for non-code tasks. For example, LLMs spontaneously access external resources to acquire new knowledge, leverage the file system to handle long contexts, and execute scripts to satisfy formatting requirements. We further show that these agentic capabilities can be enhanced through LLM-in-Sandbox Reinforcement Learning (LLM-in-Sandbox-RL), which uses only non-agentic data to train models for sandbox exploration. Experiments demonstrate that LLM-in-Sandbox, in both training-free and post-trained settings, achieves robust generalization spanning mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, we analyze LLM-in-Sandbox's efficiency from computational and system perspectives, and open-source it as a Python package to facilitate real-world deployment.

3 Citations

0 Influential

9.5 Altmetric

50.5 Score
