2601.04728v1 Jan 08, 2026 cs.LG

Excess Description Length of Learning Generalizable Predictors

Jan Leike

Elizabeth Donoway

Hailey Joren

Fabien Roger

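Understanding whether fine-tuning elicits latent capabilities or teaches new ones is a fundamental question for language model evaluation and safety. This work develops a formal information-theoretic framework that quantifies how much predictive structure fine-tuning extracts from the training data and writes into the model's parameters. The central quantity, Excess Description Length (EDL), is defined via prequential (sequential) coding: it measures the gap between the number of bits needed to encode the training labels sequentially with a model trained online and the residual encoding cost under the final trained model. The authors prove that EDL is non-negative in expectation, converges to surplus description length in the infinite-data limit, and bounds the expected generalization gain. A series of toy models clarifies common confusions about information in learning: why random labels yield EDL near zero, how a single example can remove many bits of uncertainty about the underlying rule(s) describing the data distribution, why structure learned on rare inputs contributes proportionally little to expected generalization, and how format learning produces early transients distinct from capability acquisition. The framework gives a rigorous foundation for the empirical observation that capability elicitation and teaching exhibit qualitatively different scaling behavior.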

Original Abstract

Understanding whether fine-tuning elicits latent capabilities or teaches new ones is a fundamental question for language model evaluation and safety. We develop a formal information-theoretic framework for quantifying how much predictive structure fine-tuning extracts from the train dataset and writes into a model's parameters. Our central quantity, Excess Description Length (EDL), is defined via prequential coding and measures the gap between the bits required to encode training labels sequentially using an evolving model (trained online) and the residual encoding cost under the final trained model. We establish that EDL is non-negative in expectation, converges to surplus description length in the infinite-data limit, and provides bounds on expected generalization gain. Through a series of toy models, we clarify common confusions about information in learning: why random labels yield EDL near zero, how a single example can eliminate many bits of uncertainty about the underlying rule(s) that describe the data distribution, why structure learned on rare inputs contributes proportionally little to expected generalization, and how format learning creates early transients distinct from capability acquisition. This framework provides rigorous foundations for the empirical observation that capability elicitation and teaching exhibit qualitatively distinct scaling signatures.
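To make the definition of EDL concrete, here is a minimal Python sketch of the "prequential bits minus residual bits" gap described in the abstract. It is an illustration under stated assumptions, not the paper's implementation: the learner is a hypothetical Laplace (add-one) estimator for binary labels standing in for a fine-tuned model, the function names are invented for this example, and the residual cost is evaluated on the same training labels, which may differ from the paper's exact setup.

```python
import math


def laplace_prob_one(ones: int, seen: int) -> float:
    """Add-one (Laplace) estimate of P(y = 1) after `seen` labels, `ones` of them 1.

    Hypothetical stand-in for "the model after training on the first `seen` examples".
    """
    return (ones + 1) / (seen + 2)


def bits(p: float) -> float:
    """Code length in bits for an event of probability p."""
    return -math.log2(p)


def excess_description_length(labels):
    """Return (prequential_bits, residual_bits, edl) for a list of 0/1 labels.

    Assumption: residual cost is measured on the training labels themselves,
    using the estimator fitted on all of them.
    """
    # Prequential pass: encode each label with the model trained only on
    # the labels that came before it, then update the model.
    ones = 0
    prequential_bits = 0.0
    for seen, y in enumerate(labels):
        p_one = laplace_prob_one(ones, seen)
        prequential_bits += bits(p_one if y == 1 else 1.0 - p_one)
        ones += y

    # Residual pass: encode every label with the final model.
    p_final = laplace_prob_one(ones, len(labels))
    residual_bits = sum(bits(p_final if y == 1 else 1.0 - p_final) for y in labels)

    return prequential_bits, residual_bits, prequential_bits - residual_bits


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 1, 1, 0, 1]
    pre, res, edl = excess_description_length(labels)
    print(f"prequential: {pre:.3f} bits, residual: {res:.3f} bits, EDL: {edl:.3f} bits")
```

In this toy setting the gap is the cost of predicting early labels with a model that has not yet seen much data; the final model no longer pays that cost because the structure has been absorbed into its (single) parameter.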
