2602.14451v1 Feb 16, 2026 cs.AI

Precedent-Informed Reasoning: Mitigating Overthinking in Large Reasoning Models via Test-Time Precedent Learning

Qianyue Wang (Citations: 29, h-index: 2)
Jinwu Hu (Citations: 45, h-index: 4)
Huanxiang Lin (Citations: 3, h-index: 1)
Bolin Chen (Citations: 0, h-index: 0)
Z. Wen (Citations: 627, h-index: 7)
Yaofo Chen (Citations: 1,208, h-index: 8)
Yu Rong (Citations: 241, h-index: 8)
Mingkui Tan (Citations: 48, h-index: 4)

Abstract

Reasoning in Large Language Models (LLMs) often suffers from inefficiently long chain-of-thought traces with redundant self-exploration and validation, which inflate computational cost and can even degrade performance. Inspired by how people solve new problems by drawing on related past cases to constrain the search space and reduce trial and error, we propose Precedent Informed Reasoning (PIR), which shifts the reasoning paradigm of Large Reasoning Models (LRMs) from exhaustive self-exploration to guided learning from precedents. PIR addresses two key challenges: which precedents to adopt and how to use them. First, Adaptive Precedent Selection (APS) constructs, for each question and LRM, a compact set of precedents that are both semantically related and informative for the model. It ranks examples by a joint score combining semantic similarity and model perplexity, then adapts the number of precedents to maximize perplexity reduction. Second, Test-time Experience Internalization (TEI) performs test-time learning on precedent-informed instructions, updating lightweight adapters to internalize solution patterns and use them as a prior during subsequent reasoning. Experiments across mathematical reasoning, scientific QA, and code generation show that PIR consistently shortens reasoning traces while maintaining or improving final accuracy across LLMs, yielding a favorable accuracy-efficiency trade-off.
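The abstract describes APS only in words, so the following is a minimal sketch of how the "rank, then adapt the count" idea could look, not the authors' implementation. Everything specific here is an assumption made for illustration: the score formula (weighted cosine similarity minus log-perplexity), the `alpha` weight, the `max_k` cap, and the `ppl_with` callable, which stands in for a real model that would report the perplexity of the question when a given list of precedents is placed in context. The toy perplexity table is invented data.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical interface: perplexity of the question given a list of precedent ids.
PplFn = Callable[[List[str]], float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_precedents(query_vec: Vector,
                    pool: List[Tuple[str, Vector]],
                    ppl_with: PplFn,
                    alpha: float = 0.5) -> List[str]:
    """Order candidates by an assumed joint score:
    alpha * similarity - (1 - alpha) * log(perplexity with that single precedent)."""
    def score(item: Tuple[str, Vector]) -> float:
        pid, vec = item
        return alpha * cosine(query_vec, vec) - (1 - alpha) * math.log(ppl_with([pid]))
    return [pid for pid, _ in sorted(pool, key=score, reverse=True)]


def select_precedents(ranked: List[str], ppl_with: PplFn, max_k: int = 4) -> List[str]:
    """Keep the top-k prefix that lowers perplexity the most relative to using
    no precedent; return an empty list if no prefix helps."""
    base = ppl_with([])
    best: List[str] = []
    best_gain = 0.0
    for k in range(1, min(max_k, len(ranked)) + 1):
        gain = base - ppl_with(ranked[:k])
        if gain > best_gain:
            best, best_gain = ranked[:k], gain
    return best


# Toy stand-in for a model: invented perplexities keyed by the precedent tuple.
TOY_PPL: Dict[Tuple[str, ...], float] = {
    (): 10.0, ("a",): 6.0, ("b",): 8.0, ("c",): 12.0,
    ("a", "b"): 5.0, ("a", "b", "c"): 7.0,
}


def toy_ppl(ids: List[str]) -> float:
    return TOY_PPL[tuple(ids)]


pool = [("a", [1.0, 0.0]), ("b", [0.8, 0.6]), ("c", [0.0, 1.0])]
query = [1.0, 0.0]

ranked = rank_precedents(query, pool, toy_ppl)
chosen = select_precedents(ranked, toy_ppl)
print(ranked, chosen)
```

In this toy setup, adding the third precedent raises perplexity again, so the selection stops at two. That is the behavior the abstract attributes to adapting the number of precedents, under the assumptions stated above.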

0 Citations

0 Influential

4 Altmetric

20.0 Score
