2604.10788v1 Apr 12, 2026 cs.CL

TInR: Exploring Tool-Internalized Reasoning in Large Language Models

Wenjie Li (Citations: 708; h-index: 8)
Fangcheng Liu (Citations: 149; h-index: 7)
Qiancheng Xu (Citations: 37; h-index: 4)
Yongqing Li (Citations: 235; h-index: 6)
Hongru Wang, The Chinese University of Hong Kong, University of Edinburgh (Citations: 2,219; h-index: 24)
Min Yang (Citations: 65; h-index: 2)

Original Abstract

Tool-Integrated Reasoning (TIR) has emerged as a promising direction that extends the capabilities of Large Language Models (LLMs) with external tools during reasoning. Existing TIR methods typically rely on external tool documentation during reasoning. However, this reliance leads to difficulty in mastering tools, tool size constraints, and inference inefficiency. To mitigate these issues, we explore Tool-Internalized Reasoning (TInR), which aims to facilitate reasoning with tool knowledge internalized into LLMs. Achieving this goal presents notable requirements, including tool internalization and tool-reasoning coordination. To address them, we propose TInR-U, a tool-internalized reasoning framework for unified reasoning and tool usage. TInR-U is trained through a three-phase pipeline: 1) tool internalization with a bidirectional knowledge alignment strategy; 2) a supervised fine-tuning warm-up using high-quality reasoning annotations; and 3) reinforcement learning with TInR-specific rewards. We comprehensively evaluate our method in in-domain and out-of-domain settings. Experimental results show that TInR-U achieves superior performance in both settings, highlighting its effectiveness and efficiency.
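
The contrast the abstract draws between TIR and TInR can be made concrete with a small sketch. It is an illustration only, not the authors' implementation: the `Tool` record, the prompt templates, and the helper names `build_tir_prompt`, `build_tinr_prompt`, and `make_tools` are all hypothetical. It shows why placing tool documentation in the context makes prompt length grow with the number of tools, whereas a model with internalized tool knowledge would, under this assumption, need only the query.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record: a name plus free-text documentation."""
    name: str
    doc: str


def build_tir_prompt(query: str, tools: List[Tool]) -> str:
    """TIR-style prompt (assumed template): tool documentation is placed
    in the context, so the prompt grows with the number of tools."""
    docs = "\n".join(f"{t.name}: {t.doc}" for t in tools)
    return f"Available tools:\n{docs}\n\nQuestion: {query}"


def build_tinr_prompt(query: str) -> str:
    """TInR-style prompt (assumed template): tool knowledge is taken to
    live in the model weights, so no documentation is attached."""
    return f"Question: {query}"


def make_tools(n: int) -> List[Tool]:
    """Generate n placeholder tools for the size comparison."""
    return [Tool(f"tool_{i}", f"Hypothetical description of tool {i}.")
            for i in range(n)]


if __name__ == "__main__":
    query = "What is 17 * 23?"
    tinr_len = len(build_tinr_prompt(query))
    for n in (1, 10, 100):
        tir_len = len(build_tir_prompt(query, make_tools(n)))
        # Character counts stand in for token counts in this sketch.
        print(f"tools={n:3d}  TIR chars={tir_len:5d}  TInR chars={tinr_len}")
```

Character counts are used here as a rough proxy for tokens; the actual cost difference would depend on the tokenizer and on how the documentation is formatted.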

Citations: 0 | Influential: 0 | Altmetric: 12 | Score: 60.0
