2602.18671v1 Feb 21, 2026 cs.AI

Spilled Energy in Large Language Models

Adrian Robert Minut

Hazem Dewidar

Iacopo Masi

Original Abstract

We reinterpret the final Large Language Model (LLM) softmax classifier as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs at inference. This principled approach allows us to track "energy spills" during decoding, which we empirically show correlate with factual errors, biases, and failures. Similar to Orgad et al. (2025), our method localizes the exact answer token and subsequently tests for hallucinations. Crucially, however, we achieve this without requiring trained probe classifiers or activation ablations. Instead, we introduce two completely training-free metrics derived directly from output logits: spilled energy, which captures the discrepancy between energy values across consecutive generation steps that should theoretically match, and marginalized energy, which is measurable at a single step. Evaluated on nine benchmarks across state-of-the-art LLMs (including LLaMA, Mistral, and Gemma) and on synthetic algebraic operations (Qwen3), our approach demonstrates robust, competitive hallucination detection and cross-task generalization. Notably, these results hold for both pretrained and instruction-tuned variants without introducing any training overhead.
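
The abstract does not state the formulas, so the following is only a minimal sketch of one plausible reading, assuming the standard classifier-as-EBM convention: the energy of a chosen token is its negative logit, and the marginalized energy of a step is the negative log-sum-exp of that step's logits. Under that assumption, the energy assigned to the prefix when token i is chosen should match the marginalized energy at step i+1, and the gap between the two is treated here as the "spill". The function names, the signed difference, and the toy logits are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def logsumexp(z):
    """Numerically stable log(sum(exp(z))) over a 1-D array."""
    z = np.asarray(z, dtype=float)
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))


def marginalized_energy(logits):
    """Assumed definition: negative log-sum-exp of one step's logits.

    Needs only the logits of a single decoding step.
    """
    return -logsumexp(logits)


def token_energy(logits, token_id):
    """Assumed definition: negative logit of the token that was generated."""
    return -float(np.asarray(logits, dtype=float)[token_id])


def spilled_energy(step_logits, token_ids):
    """Signed gap between consecutive steps (illustrative, not the paper's code).

    step_logits[i] holds the vocabulary logits from which token_ids[i] was
    chosen. For each i, compare the energy of the chosen token at step i with
    the marginalized energy at step i + 1. Returns len(token_ids) - 1 values.
    """
    step_logits = np.asarray(step_logits, dtype=float)
    spills = []
    for i in range(len(token_ids) - 1):
        chosen = token_energy(step_logits[i], token_ids[i])
        next_marginal = marginalized_energy(step_logits[i + 1])
        spills.append(next_marginal - chosen)
    return np.array(spills)


# Toy example with a 3-token vocabulary and 3 decoding steps (made-up numbers).
toy_logits = np.array([
    [2.0, 0.5, -1.0],
    [0.1, 0.2, 0.3],
    [4.0, -2.0, 0.0],
])
toy_tokens = [0, 2, 0]
toy_spills = spilled_energy(toy_logits, toy_tokens)
```

Both quantities use only output logits, which is consistent with the abstract's claim that no probe classifier or training is required. How the per-step values are aggregated at the answer token and thresholded for hallucination detection is not specified in the abstract and is left out of the sketch.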
