2601.11517v1 Jan 16, 2026 cs.CL

Do explanations generalize across large reasoning models?

Chandan Singh

Citations: 30

h-index: 4

Koyena Pal

Citations: 289

h-index: 6

David Bau

Citations: 427

h-index: 6

Abstract

Large reasoning models (LRMs) produce a textual chain of thought (CoT) in the process of solving a problem, which serves as a potentially powerful tool to understand the problem by surfacing a human-readable, natural-language explanation. However, it is unclear whether these explanations generalize, i.e. whether they capture general patterns about the underlying problem rather than patterns which are esoteric to the LRM. This is a crucial question in understanding or discovering new concepts, e.g. in AI for science. We study this generalization question by evaluating a specific notion of generalizability: whether explanations produced by one LRM induce the same behavior when given to other LRMs. We find that CoT explanations often exhibit this form of generalization (i.e. they increase consistency between LRMs) and that this increased generalization is correlated with human preference rankings and post-training with reinforcement learning. We further analyze the conditions under which explanations yield consistent answers and propose a straightforward, sentence-level ensembling strategy that improves consistency. Taken together, these results prescribe caution when using LRM explanations to yield new insights and outline a framework for characterizing LRM explanation generalization.
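
The notion of generalization described in the abstract, whether an explanation written by one LRM leads other LRMs to the same answer, can be illustrated with a simple agreement score. The sketch below is an assumption for illustration only: the abstract does not specify the paper's exact consistency metric, and the function name `pairwise_consistency`, the model names, and the answer lists are all hypothetical toy values, not results from the paper.

```python
from itertools import combinations


def pairwise_consistency(answers_by_model):
    """Fraction of (model pair, problem) combinations with identical answers.

    answers_by_model maps a model name to its list of final answers,
    one per problem, in the same problem order for every model.
    This is a hypothetical metric, not necessarily the paper's definition.
    """
    models = sorted(answers_by_model)
    agree = 0
    total = 0
    for first, second in combinations(models, 2):
        for a, b in zip(answers_by_model[first], answers_by_model[second]):
            agree += int(a == b)
            total += 1
    return agree / total if total else 0.0


# Toy data (made up): answers when each model solves the problems on its own.
baseline = {
    "model_a": ["4", "7", "9"],
    "model_b": ["4", "8", "1"],
    "model_c": ["5", "7", "9"],
}

# Toy data (made up): answers when model_b and model_c are also given
# the chain of thought that model_a produced for each problem.
with_shared_cot = {
    "model_a": ["4", "7", "9"],
    "model_b": ["4", "7", "9"],
    "model_c": ["4", "7", "1"],
}

baseline_score = pairwise_consistency(baseline)
shared_score = pairwise_consistency(with_shared_cot)
print(f"baseline consistency:   {baseline_score:.2f}")
print(f"with shared CoT:        {shared_score:.2f}")
```

Under this reading, an explanation generalizes to the extent that sharing it raises the agreement score relative to the baseline. Exact string matching of answers is a simplification; a real evaluation would likely need answer normalization.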

