2601.07123v1 Jan 12, 2026 cs.AI


ENTRA: Entropy-Based Redundancy Avoidance in Large Language Model Reasoning

Ruichu Cai

Qingwen Lin

Yutong Chen

Zijian Li

Boyan Xu

Hao Du


Original Abstract

Large Reasoning Models (LRMs) often suffer from overthinking, generating unnecessarily long reasoning chains even for simple tasks. This leads to substantial computational overhead with limited performance gain, primarily due to redundant verification and repetitive generation. While prior work typically constrains output length or optimizes correctness, such coarse supervision fails to guide models toward concise yet accurate inference. In this paper, we propose ENTRA, an entropy-based training framework that suppresses redundant reasoning while preserving performance. ENTRA first estimates token-level importance using a lightweight Bidirectional Importance Estimation (BIE) method, which accounts for both prediction confidence and forward influence. It then computes a redundancy reward based on the entropy of low-importance tokens, normalized by its theoretical upper bound, and optimizes this reward via reinforcement learning. Experiments on mathematical reasoning benchmarks demonstrate that ENTRA reduces output length by 37% to 53% with no loss (and in some cases, gains) in accuracy. Our approach offers a principled and efficient solution to reduce overthinking in LRMs, and provides a generalizable path toward redundancy-aware reasoning optimization.
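The abstract names the ingredients of the redundancy reward but not its formulas. Below is a minimal Python sketch of one possible reading, not the authors' implementation. Every name here (`forward_influence`, `bie_importance`, `redundancy_reward`, `alpha`, `low_frac`) is hypothetical, and so are these choices: "prediction confidence" is taken as the probability of the sampled token; "forward influence" is approximated by the mean attention a token receives from later positions; the two are mixed by a weighted sum; "low-importance tokens" are the bottom `low_frac` fraction; and the entropy term is the mean next-token entropy over those positions divided by log|V|, the maximum entropy of a distribution over |V| tokens. Returning the normalized value as a negative reward (a penalty) is also an assumption. The reinforcement-learning update itself is not shown.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def forward_influence(attn: List[List[float]]) -> List[float]:
    """Hypothetical proxy for forward influence.

    attn[i][j] is the attention weight that query position i places on key
    position j (causal, so j <= i). A token's score is the mean attention it
    receives from all later positions; the last token gets 0.
    """
    n = len(attn)
    scores = []
    for j in range(n):
        later = [attn[i][j] for i in range(j + 1, n)]
        scores.append(sum(later) / len(later) if later else 0.0)
    return scores


def bie_importance(probs, token_ids, attn, alpha=0.5) -> List[float]:
    """Hypothetical bidirectional importance: low confidence + high influence = important."""
    confidence = [p[t] for p, t in zip(probs, token_ids)]
    influence = forward_influence(attn)
    return [alpha * (1.0 - c) + (1.0 - alpha) * f
            for c, f in zip(confidence, influence)]


def redundancy_reward(probs, token_ids, attn, low_frac=0.5, alpha=0.5) -> float:
    """Assumed reward: minus the normalized entropy of low-importance tokens.

    The result lies in [-1, 0] because each token entropy is at most log|V|.
    """
    importance = bie_importance(probs, token_ids, attn, alpha)
    n = len(importance)
    k = max(1, int(n * low_frac))                      # size of the low-importance set
    low = sorted(range(n), key=lambda t: importance[t])[:k]
    h_max = math.log(len(probs[0]))                    # entropy upper bound, needs |V| > 1
    normalized = sum(token_entropy(probs[t]) for t in low) / (k * h_max)
    return -normalized


# Toy 3-token sequence over a 4-token vocabulary.
probs = [[0.7, 0.1, 0.1, 0.1],
         [0.25, 0.25, 0.25, 0.25],
         [0.97, 0.01, 0.01, 0.01]]
token_ids = [0, 1, 0]
attn = [[1.0, 0.0, 0.0],
        [0.6, 0.4, 0.0],
        [0.5, 0.3, 0.2]]
reward = redundancy_reward(probs, token_ids, attn)
print(round(reward, 3))
```

In this toy example the last token is highly predictable and receives no later attention, so it is selected as the single low-importance token, and its low entropy yields a small penalty (about -0.121). Normalizing by log|V| keeps the term on a fixed [-1, 0] scale regardless of vocabulary size, which makes it easier to combine with a correctness reward.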
