2601.22139v1 Jan 29, 2026 cs.CL

Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers

Xin Chen (Citations: 64, h-index: 5)
Feng Jiang (Citations: 306, h-index: 4)
Yiqian Zhang (Citations: 10, h-index: 2)
Hardy Chen (Citations: 52, h-index: 4)
Shuo Yan (Citations: 73, h-index: 4)
Wenya Xie (Citations: 3, h-index: 1)
Min Yang (Citations: 3, h-index: 1)
Shujian Huang (Citations: 48, h-index: 2)

Abstract

Reasoning-oriented Large Language Models (LLMs) have achieved remarkable progress with Chain-of-Thought (CoT) prompting, yet they remain fundamentally limited by a "blind self-thinking" paradigm: they perform extensive internal reasoning even when critical information is missing or ambiguous. We propose Proactive Interactive Reasoning (PIR), a new reasoning paradigm that transforms LLMs from passive solvers into proactive inquirers that interleave reasoning with clarification. Unlike existing search- or tool-based frameworks, which primarily address knowledge uncertainty by querying external environments, PIR targets premise- and intent-level uncertainty through direct interaction with the user. PIR is implemented via two core components: (1) an uncertainty-aware supervised fine-tuning procedure that equips models with interactive reasoning capability, and (2) a user-simulator-based policy optimization framework driven by a composite reward that aligns model behavior with user intent. Extensive experiments on mathematical reasoning, code generation, and document editing demonstrate that PIR consistently outperforms strong baselines, achieving up to 32.70% higher accuracy, 22.90% higher pass rate, and a 41.36 BLEU improvement, while cutting reasoning computation and unnecessary interaction turns nearly in half. Further reliability evaluations on factual knowledge, question answering, and missing-premise scenarios confirm the strong generalization and robustness of PIR. Model and code are publicly available at: https://github.com/SUAT-AIRI/Proactive-Interactive-R1
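The abstract names two mechanisms without giving their details: reasoning interleaved with clarification questions, and a composite reward computed against a user simulator. The following is a minimal toy sketch of how those two ideas could fit together, not the authors' implementation. Every name here (`run_episode`, `composite_reward`, `toy_policy`, `toy_simulator`, `turn_penalty`) is hypothetical, and the reward form (exact-match correctness minus a per-question penalty) is an assumption chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A policy maps the dialogue context to ("ask", question) or ("answer", text).
Policy = Callable[[List[str]], Tuple[str, str]]
# A user simulator maps a clarification question to a reply.
Simulator = Callable[[str], str]


@dataclass
class Episode:
    answer: str
    questions: List[str] = field(default_factory=list)


def run_episode(policy: Policy, simulator: Simulator, task: str,
                max_turns: int = 3) -> Episode:
    """Let the policy ask clarification questions until it answers."""
    context = [task]
    asked: List[str] = []
    for _ in range(max_turns + 1):
        kind, text = policy(context)
        if kind == "answer":
            return Episode(answer=text, questions=asked)
        if len(asked) == max_turns:
            break  # question budget exhausted
        asked.append(text)
        context.append(simulator(text))
    return Episode(answer="", questions=asked)  # no answer produced


def composite_reward(episode: Episode, reference: str,
                     turn_penalty: float = 0.1) -> float:
    """Assumed reward: exact-match correctness minus a cost per question."""
    correct = 1.0 if episode.answer.strip() == reference.strip() else 0.0
    return correct - turn_penalty * len(episode.questions)


def toy_policy(context: List[str]) -> Tuple[str, str]:
    """Toy policy: ask for x if it is missing, otherwise compute x + 3."""
    for line in context:
        if line.startswith("x = "):
            return ("answer", str(int(line[len("x = "):]) + 3))
    return ("ask", "What is x?")


def toy_simulator(question: str) -> str:
    """Toy user simulator that always supplies the missing premise."""
    return "x = 4"


if __name__ == "__main__":
    ep = run_episode(toy_policy, toy_simulator, "Compute x + 3.")
    print(ep.answer, ep.questions, composite_reward(ep, "7"))
```

In this toy setup the task omits the value of `x`, so a policy that answers immediately cannot be correct, while a policy that asks once pays a small penalty and then answers correctly. A real system would replace the toy policy with a language model and the toy simulator with a simulated user.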

