2604.06636v1 Apr 08, 2026 cs.LG

SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning

Pinyan Lu (citations: 19, h-index: 3)
Zhengyang Ai (citations: 18, h-index: 3)
Zikang Shan (citations: 214, h-index: 2)
Xiaodong Ai (citations: 13, h-index: 3)
Jingxian Tang (citations: 51, h-index: 3)
Hangkai Hu (citations: 11, h-index: 2)

Original Abstract

Process supervision has emerged as a promising approach for enhancing LLM reasoning, yet existing methods fail to distinguish meaningful progress from mere verbosity, leading to limited reasoning capabilities and unresolved token inefficiency. To address this, we propose Stage-aware Hierarchical Advantage via Potential Estimation (SHAPE), a framework that formalizes reasoning as a trajectory through a state space of empirical solvability. SHAPE introduces a hierarchical credit assignment mechanism: at the segment level, it employs a stage-aware advantage function to prioritize efficient breakthroughs in low-potential states; at the token level, it utilizes entropy-driven redistribution to sharpen execution signals. Extensive experiments in math reasoning across three base models and five benchmarks demonstrate that SHAPE achieves an average accuracy gain of 3% with 30% reduced token consumption.
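The abstract names three ingredients without giving formulas: a potential defined as empirical solvability, a stage-aware segment-level advantage, and an entropy-driven token-level redistribution. The sketch below is one way these pieces might fit together, not the paper's actual definitions. The function names, the `1 + (1 - p_before)` stage weight, and the proportional-to-entropy split are all assumptions made for illustration.

```python
from typing import List


def estimate_potential(rollout_outcomes: List[bool]) -> float:
    """Empirical solvability of a state: the fraction of sampled
    continuations from that state that end in a correct answer."""
    if not rollout_outcomes:
        raise ValueError("need at least one rollout outcome")
    return sum(rollout_outcomes) / len(rollout_outcomes)


def segment_advantage(p_before: float, p_after: float) -> float:
    """Hypothetical stage-aware advantage for one reasoning segment.

    The gain in potential is scaled up when the segment starts from a
    low-potential state, so breakthroughs on hard states count more.
    The weighting form is an assumption, not taken from the paper.
    """
    gain = p_after - p_before
    stage_weight = 1.0 + (1.0 - p_before)  # assumed weighting
    return gain * stage_weight


def redistribute_by_entropy(advantage: float, entropies: List[float]) -> List[float]:
    """Split a segment-level advantage over its tokens in proportion to
    each token's predictive entropy (assumed rule). The per-token values
    sum to the segment advantage."""
    n = len(entropies)
    if n == 0:
        return []
    total = sum(entropies)
    if total <= 0.0:
        # No entropy signal: fall back to a uniform split.
        return [advantage / n] * n
    return [advantage * h / total for h in entropies]


if __name__ == "__main__":
    p0 = estimate_potential([True, False, False, False])
    p1 = estimate_potential([True, True, True, False])
    adv = segment_advantage(p0, p1)
    print(redistribute_by_entropy(adv, [1.0, 3.0]))
```

Under these assumptions, a segment that lifts the success rate of sampled continuations from a low starting point receives a larger advantage, and high-entropy tokens inside that segment carry most of the credit.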

Citations: 3 (0 influential). Altmetric: 1.5. Score: 10.5.
