2603.01025v1 Mar 01, 2026 cs.LG

A One-Token Verification Method for Estimating Reasoning Correctness

One-Token Verification for Reasoning Correctness Estimation

Zebin Chen

Citations: 8

h-index: 2

Zhuang Zhan

Citations: 67

h-index: 3

Feiyang Ye

Citations: 49

h-index: 5

Ying Wei

Citations: 25

h-index: 2

Xiequn Wang

Citations: 8

h-index: 2

Kede Ma

Citations: 53

h-index: 2

Yu Zhang

Citations: 29

h-index: 3

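Recent advances in large language models (LLMs) have produced notable results on complex reasoning tasks, particularly mathematical problem solving. A common strategy for improving performance is parallel thinking, in which multiple reasoning traces are generated and the final prediction is made with an aggregation scheme such as majority voting or best-of-$N$ selection. Two key challenges remain, however. First, multi-sample decoding incurs substantial inference latency, especially for long-form outputs. Second, effective mechanisms for reliably assessing the correctness of individual reasoning traces are still limited. To address these challenges, we introduce One-Token Verification (OTV), a computational method that estimates reasoning correctness in a single forward pass during generation. OTV is activated by a learnable token and integrated into the LLM via low-rank adaptation; it probes internal reasoning signals through the key-value cache, which enables token-level correctness estimation at any stage of generation without disrupting the primary reasoning process. Experiments on mathematical reasoning benchmarks show that OTV consistently outperforms existing verification methods. In addition, OTV reduces token usage by up to 90% through correctness-guided early termination, prioritizing shorter, more reliable solutions.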

Original Abstract

Recent breakthroughs in large language models (LLMs) have led to notable successes in complex reasoning tasks, such as mathematical problem solving. A common strategy for improving performance is parallel thinking, in which multiple reasoning traces are generated and the final prediction is made using aggregation schemes like majority voting or best-of-$N$ decoding. However, two key challenges persist. First, multi-sample decoding incurs substantial inference latency, especially for long-form outputs. Second, effective mechanisms for reliably assessing the correctness of individual reasoning traces are still limited. To address these challenges, we introduce One-Token Verification (OTV), a computational method that estimates reasoning correctness in a single forward pass during generation. OTV is activated by a learnable token and integrated into the LLM via low-rank adaptation to probe internal reasoning signals through the key-value cache, supporting token-level correctness estimation at any stage of generation without disrupting primary reasoning. Experiments on mathematical reasoning benchmarks demonstrate that OTV consistently surpasses existing verifiers. Additionally, OTV reduces token usage by up to $90\%$ through correctness-guided early termination, prioritizing shorter, more reliable solutions.
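The aggregation schemes and the early-termination idea mentioned in the abstract can be illustrated with a small sketch. This is not the OTV implementation: the function names, the per-step score lists, and the `threshold` and `patience` parameters are hypothetical, and the stopping rule shown (stop once the score stays low for several consecutive steps) is only one plausible way a correctness estimate might guide termination. The scores are assumed to come from some verifier that outputs a value in [0, 1].

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent final answer.

    Ties are resolved in favor of the answer that was seen first.
    """
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers: Sequence[str], scores: Sequence[float]) -> str:
    """Return the answer from the trace with the highest verifier score."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


def stop_step(
    step_scores: Sequence[float],
    threshold: float = 0.2,  # hypothetical cutoff, not taken from the paper
    patience: int = 3,       # hypothetical number of consecutive low steps
) -> Optional[int]:
    """Return the first step at which a trace would be terminated.

    A trace is stopped once its estimated correctness has stayed below
    `threshold` for `patience` consecutive steps. Returns None if the
    trace is never stopped.
    """
    low_run = 0
    for step, score in enumerate(step_scores):
        low_run = low_run + 1 if score < threshold else 0
        if low_run >= patience:
            return step
    return None


if __name__ == "__main__":
    # Four hypothetical reasoning traces with final answers and verifier scores.
    answers = ["42", "41", "42", "7"]
    scores = [0.55, 0.90, 0.60, 0.10]

    print(majority_vote(answers))      # most common answer
    print(best_of_n(answers, scores))  # answer of the highest-scored trace
    print(stop_step([0.5, 0.1, 0.15, 0.05, 0.6]))
```

The two aggregation rules can disagree: majority voting ignores the verifier entirely, while best-of-$N$ trusts a single highest-scored trace. A rule like `stop_step` saves tokens only if low scores early in a trace are reliable indicators of an eventually wrong answer, which is the property a token-level verifier is meant to provide.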


