2602.07883v1 Feb 08, 2026 cs.AI

ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation

Yujia Liu, Sheng Wang, Junwei Su, Qintong Li, Jiahui Gao, Lingpeng Kong, Jingqi Zhou, Junwen Lu, Jiyu Jiang, Dunhong Jin, Chuan Wu, Dezhao Deng

Abstract

Agentic systems powered by Large Language Models (LLMs) have demonstrated remarkable potential in tackling complex, long-horizon tasks. However, their efficacy is fundamentally constrained by static configurations governing agent behaviors, which are fixed prior to execution and fail to adapt to evolving task dynamics. Existing approaches, relying on manual orchestration or heuristic-based patches, often struggle with poor generalization and fragmented optimization. To transcend these limitations, we propose ToolSelf, a novel paradigm enabling tool-driven runtime self-reconfiguration. By abstracting configuration updates as a callable tool, ToolSelf unifies task execution and self-adjustment into a single action space, achieving a phase transition from external rules to intrinsic parameters. Agents can thereby autonomously update their sub-goals and context based on task progression, and correspondingly adapt their strategy and toolbox, transforming from passive executors into dual managers of both task and self. We further devise Configuration-Aware Two-stage Training (CAT), combining rejection sampling fine-tuning with trajectory-level reinforcement learning to internalize this meta-capability. Extensive experiments across diverse benchmarks demonstrate that ToolSelf rivals specialized workflows while generalizing to novel tasks, achieving a 24.1% average performance gain and illuminating a path toward truly self-adaptive agents.
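
The central idea in the abstract, exposing configuration updates as one more callable tool so that task actions and self-adjustment share a single action space, can be illustrated with a small sketch. This is not the authors' implementation: the `AgentConfig` fields mirror the four items the abstract names (sub-goals, context, strategy, toolbox), but the class names, the `update_config` tool name, and the toy `search` and `summarize` tools are all assumptions made for illustration, and no LLM or training step is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentConfig:
    """Runtime configuration; field names are illustrative assumptions."""
    sub_goal: str = ""
    context: List[str] = field(default_factory=list)
    strategy: str = "default"
    toolbox: List[str] = field(default_factory=list)


class Agent:
    """Toy agent whose reconfiguration is just another callable action."""

    def __init__(self, task_tools: Dict[str, Callable[..., str]], config: AgentConfig):
        self.config = config
        # Task tools and the self-reconfiguration tool live in one action space.
        self.actions: Dict[str, Callable[..., str]] = dict(task_tools)
        self.actions["update_config"] = self.update_config

    def update_config(self, **changes) -> str:
        # Validate every field before mutating, so a bad call changes nothing.
        for key in changes:
            if not hasattr(self.config, key):
                raise KeyError(f"unknown config field: {key}")
        for key, value in changes.items():
            setattr(self.config, key, value)
        return "config updated: " + ", ".join(sorted(changes))

    def act(self, name: str, **kwargs) -> str:
        # Only tools enabled in the current toolbox may run;
        # the reconfiguration tool itself is always available.
        if name != "update_config" and name not in self.config.toolbox:
            raise PermissionError(f"tool not enabled: {name}")
        return self.actions[name](**kwargs)


def search(query: str) -> str:
    """Hypothetical task tool standing in for a real retrieval call."""
    return f"results for {query!r}"


def summarize(text: str) -> str:
    """Hypothetical task tool standing in for a real summarizer."""
    return f"summary of {len(text)} chars"


if __name__ == "__main__":
    agent = Agent(
        {"search": search, "summarize": summarize},
        AgentConfig(sub_goal="find sources", toolbox=["search"]),
    )
    print(agent.act("search", query="self-adaptive agents"))
    # The agent reconfigures itself through the same interface it uses for tasks.
    print(agent.act("update_config", sub_goal="write summary", toolbox=["summarize"]))
    print(agent.act("summarize", text="some retrieved text"))
```

In this sketch the policy choosing actions would see `update_config` alongside the task tools, so deciding when to change the sub-goal or toolbox becomes part of ordinary action selection rather than an external rule.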

