2605.02427v1 May 04, 2026 cs.AI

The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling

Rasul Tutunov (Citations: 781, h-index: 14)

Matthieu Zimmer (Citations: 70, h-index: 5)

Xiaotong Ji (Citations: 39, h-index: 4)

H. Ammar (Citations: 867, h-index: 10)

Tu Nguyen (Citations: 1,647, h-index: 15)

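A recurring pattern in "reasoning without training" is that base LLMs (large language models) already assign non-trivial probability mass to correct multi-step solutions. The difficulty lies in locating these modes efficiently at inference time. Power sampling offers a principled way to bias decoding toward such modes by targeting p_theta(x)^alpha (alpha > 1), but practical approximations must account for future-dependent correction factors that determine which prefixes are promising. This paper introduces Auxiliary Particle Power Sampling (APPS), a blockwise particle algorithm that approximates the sequence-level power target using a bounded set of partial solutions. APPS propagates hypotheses in parallel with proposal-corrected power reweighting and refines which of them survive through future-value-guided selection at resampling boundaries. Rather than committing to a single path, it redistributes limited compute across competing prefixes, with the particle count as a direct scaling knob and predictable peak memory. The future-value signal is implemented with short-horizon rollouts, and an amortized variant replaces the rollouts with a lightweight learned selection head. Across a range of reasoning benchmarks, APPS improves the accuracy-runtime trade-off of training-free decoding and suggests that part of the gap to post-trained systems can be closed through more faithful inference-time power approximation.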

Original Abstract

A recurring pattern in "reasoning without training" is that base LLMs already assign non-trivial probability mass to correct multi-step solutions; the bottleneck is locating these modes efficiently at inference time. Power sampling provides a principled way to bias decoding toward such modes by targeting p_theta(x)^alpha with alpha > 1, but practical approximations must account for future-dependent correction factors that determine which prefixes remain promising. We introduce Auxiliary Particle Power Sampling (APPS), a blockwise particle algorithm for approximating the sequence-level power target with a bounded population of partial solutions. APPS propagates hypotheses in parallel using proposal-corrected power reweighting and refines their survival through future-value-guided selection at resampling boundaries. This redistributes finite compute across competing prefixes rather than committing to a single unfolding path, while providing a direct scaling knob in the particle count and predictable peak memory. We instantiate the future-value signal with short-horizon rollouts and also study an amortized variant that replaces rollouts with a lightweight learned selection head. Across reasoning benchmarks, APPS improves the accuracy-runtime trade-off of training-free decoding and suggests that part of the gap to post-trained systems can be recovered through more faithful inference-time power approximation.
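
To make the idea of particle-based power sampling concrete, here is a minimal sketch under loud assumptions. It is not the authors' APPS algorithm: it omits the future-value-guided selection and the learned selection head, and it replaces the LLM with a hypothetical three-token Markov chain. All names (`TRANSITIONS`, `START`, `next_token_probs`, `sequence_logprob`, `particle_power_sample`) are invented for illustration. What it does show is the basic mechanism the abstract refers to: particles are proposed from the base model, reweighted by p^(alpha - 1) so that the weighted population targets p(x)^alpha, and resampled at block boundaries with a fixed particle count.

```python
import math
import random

# Hypothetical toy "base model": a first-order Markov chain over tokens 0, 1, 2.
START = [0.5, 0.3, 0.2]
TRANSITIONS = {
    0: [0.6, 0.3, 0.1],
    1: [0.2, 0.5, 0.3],
    2: [0.1, 0.2, 0.7],
}


def next_token_probs(prefix):
    """Base-model conditional p(x_t | x_<t) for the toy chain."""
    return START if not prefix else TRANSITIONS[prefix[-1]]


def sequence_logprob(seq):
    """log p(x) of a full sequence under the toy base model."""
    return sum(
        math.log(next_token_probs(seq[:t])[tok]) for t, tok in enumerate(seq)
    )


def particle_power_sample(num_particles, length, alpha, block_size, rng):
    """Approximately sample from p(x)^alpha with a bounded particle population.

    Plain sequential importance resampling; no future-value term is used.
    """
    particles = [[] for _ in range(num_particles)]
    log_weights = [0.0] * num_particles
    for t in range(length):
        for i in range(num_particles):
            probs = next_token_probs(particles[i])
            tok = rng.choices(range(len(probs)), weights=probs)[0]
            # Proposal = base model, so the incremental weight is
            # p^alpha / p = p^(alpha - 1), accumulated in log space.
            log_weights[i] += (alpha - 1.0) * math.log(probs[tok])
            particles[i] = particles[i] + [tok]
        if (t + 1) % block_size == 0 or t + 1 == length:
            # Block boundary: multinomial resampling, then reset the weights.
            top = max(log_weights)
            weights = [math.exp(lw - top) for lw in log_weights]
            picks = rng.choices(
                range(num_particles), weights=weights, k=num_particles
            )
            particles = [list(particles[j]) for j in picks]
            log_weights = [0.0] * num_particles
    return particles
```

A call such as `particle_power_sample(64, 8, alpha=4.0, block_size=2, rng=random.Random(0))` returns 64 sequences of length 8. Because the intermediate weights only see the prefix, this sketch approximates the sequence-level target p(x)^alpha reliably only as the particle count grows. The future-dependent correction described in the abstract is the piece it leaves out.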
