2602.00315v2 Jan 30, 2026 cs.LG

Beyond the Loss Curve: Scaling Laws, Active Learning, and the Limits of Learning from Exact Posteriors

Ravid Shwartz-Ziv (Citations: 4,627; h-index: 19)
Arian Khorasani (Citations: 267; h-index: 3)
N. Chen (Citations: 200; h-index: 2)
Yug D Oswal, Vellore Institute of Technology (Citations: 4; h-index: 1)
Akshat Santhana Gopalan (Citations: 0; h-index: 0)
E. Kolemen (Citations: 16; h-index: 2)

Abstract

How close are neural networks to the best they could possibly do? Standard benchmarks cannot answer this because they lack access to the true posterior p(y|x). We use class-conditional normalizing flows as oracles that make exact posteriors tractable on realistic images (AFHQ, ImageNet). This enables five lines of investigation. Scaling laws: Prediction error decomposes into irreducible aleatoric uncertainty and reducible epistemic error; the epistemic component follows a power law in dataset size, continuing to shrink even when total loss plateaus. Limits of learning: The aleatoric floor is exactly measurable, and architectures differ markedly in how they approach it: ResNets exhibit clean power-law scaling while Vision Transformers stall in low-data regimes. Soft labels: Oracle posteriors contain learnable structure beyond class labels: training with exact posteriors outperforms hard labels and yields near-perfect calibration. Distribution shift: The oracle computes exact KL divergence of controlled perturbations, revealing that shift type matters more than shift magnitude: class imbalance barely affects accuracy at divergence values where input noise causes catastrophic degradation. Active learning: Exact epistemic uncertainty distinguishes genuinely informative samples from inherently ambiguous ones, improving sample efficiency. Our framework reveals that standard metrics hide ongoing learning, mask architectural differences, and cannot diagnose the nature of distribution shift.
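
The decomposition mentioned in the abstract can be illustrated with a small sketch. It assumes the oracle posterior is obtained from class-conditional log-likelihoods via Bayes' rule, and that the split is the standard identity: cross-entropy equals entropy of the true posterior plus KL divergence from it to the model. The paper's exact definitions may differ. The function names (`oracle_posterior`, `decompose`) and all numeric values are hypothetical, chosen only for illustration.

```python
import math


def oracle_posterior(log_lik, log_prior):
    """Bayes' rule in log space: p(y|x) is proportional to p(x|y) * p(y).

    log_lik[k] stands in for log p(x | y=k), as a class-conditional
    density model would report it (values here are made up).
    """
    joint = [ll + lp for ll, lp in zip(log_lik, log_prior)]
    m = max(joint)  # subtract the max before exponentiating for stability
    weights = [math.exp(j - m) for j in joint]
    z = sum(weights)
    return [w / z for w in weights]


def decompose(p, q, eps=1e-12):
    """Split the expected cross-entropy of model q under oracle posterior p.

    Returns (total, aleatoric, epistemic) in nats, where
    total = H(p) + KL(p || q).
    """
    pairs = [(pi, max(qi, eps)) for pi, qi in zip(p, q) if pi > 0]
    aleatoric = -sum(pi * math.log(pi) for pi, _ in pairs)       # H(p)
    epistemic = sum(pi * math.log(pi / qi) for pi, qi in pairs)  # KL(p || q)
    total = -sum(pi * math.log(qi) for pi, qi in pairs)          # cross-entropy
    return total, aleatoric, epistemic


# Hypothetical three-class example with a uniform class prior.
log_lik = [-1000.0, -1001.0, -1005.0]
log_prior = [math.log(1.0 / 3.0)] * 3
p = oracle_posterior(log_lik, log_prior)

q = [0.5, 0.3, 0.2]  # hypothetical model prediction for the same input
total, aleatoric, epistemic = decompose(p, q)
print(f"total={total:.4f} aleatoric={aleatoric:.4f} epistemic={epistemic:.4f}")
```

Under this reading, the aleatoric term depends only on the oracle and is the floor no model can beat, while the epistemic term is zero exactly when the model matches the oracle posterior. Averaging the epistemic term over a test set at several training-set sizes would give the quantity whose power-law behavior the abstract describes.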
