2602.06527v1 Feb 06, 2026 cs.AI

HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction

Shengxuan Qiu (Citations: 0, h-index: 0)
Haochen Huang (Citations: 4, h-index: 1)
Shuzhang Zhong (Citations: 93, h-index: 4)
Pengfei Zuo (Citations: 279, h-index: 6)
Meng Li (Citations: 40, h-index: 3)

Abstract

Scaling test-time compute with multi-path chain-of-thought improves reasoning accuracy, but its effectiveness depends critically on the exploration-exploitation trade-off. Existing approaches address this trade-off in rigid ways: tree-structured search hard-codes exploration through brittle expansion rules that interfere with post-trained reasoning, while parallel reasoning over-explores redundant hypothesis paths and relies on weak answer selection. Motivated by the observation that the optimal balance is phase-dependent and that correct and incorrect reasoning paths often diverge only at late stages, we reformulate test-time scaling as a dynamic expand-reduce control problem over a pool of hypotheses. We propose HyPER, a training-free online control policy for multi-path decoding in mixture-of-experts models that reallocates computation under a fixed budget using lightweight path statistics. HyPER consists of an online controller that transitions from exploration to exploitation as the hypothesis pool evolves, a token-level refinement mechanism that enables efficient generation-time exploitation without full-path resampling, and a length- and confidence-aware aggregation strategy for reliable answer-time exploitation. Experiments on four mixture-of-experts language models across diverse reasoning benchmarks show that HyPER consistently achieves a superior accuracy-compute trade-off, improving accuracy by 8 to 10 percent while reducing token usage by 25 to 40 percent.
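The abstract names a length- and confidence-aware aggregation strategy but does not give its formula. The following is a minimal sketch of what such a weighted vote could look like, not the authors' method. The `Path` fields, the `length_penalty` parameter, and the choice to down-weight longer paths are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Path:
    answer: str        # final answer extracted from one reasoning path
    confidence: float  # hypothetical per-path confidence in [0, 1]
    num_tokens: int    # path length in tokens (must be positive)


def aggregate(paths, length_penalty=0.5):
    """Return the answer with the largest total vote weight.

    Assumed weighting (not from the paper):
        weight = confidence * (shortest_len / path_len) ** length_penalty
    With length_penalty=0 this reduces to a confidence-weighted vote.
    """
    if not paths:
        raise ValueError("need at least one path")
    shortest = min(p.num_tokens for p in paths)
    scores = defaultdict(float)
    for p in paths:
        weight = p.confidence * (shortest / p.num_tokens) ** length_penalty
        scores[p.answer] += weight
    return max(scores, key=scores.get)


paths = [
    Path("42", 0.90, 100),
    Path("42", 0.80, 120),
    Path("41", 0.95, 400),
    Path("41", 0.50, 400),
    Path("41", 0.40, 400),
]
print(aggregate(paths))                      # length- and confidence-weighted
print(aggregate(paths, length_penalty=0.0))  # confidence-weighted only
```

In this toy pool a plain majority vote would pick "41" (three paths against two), while the assumed length-aware weighting favors the two shorter, more confident paths. Whether HyPER rewards or penalizes length, and how it measures confidence, is not stated in the abstract.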
