2604.10727v1 Apr 12, 2026 stat.ML

Tail-Aware Information-Theoretic Generalization for RLHF and SGLD

Wan Tian

Huiming Zhang

Binghang Li

Qiang Sun

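Classical information-theoretic generalization bounds typically control the generalization gap through KL-based mutual information, and therefore assume bounded or sub-Gaussian tails via the moment generating function (MGF). In many modern pipelines, however, such as robust learning, RLHF, and stochastic optimization, losses and rewards can be heavy-tailed and the MGF may not exist, so KL-based tools can be ineffective. This work develops a tail-dependent information-theoretic framework for sub-Weibull data, in which the tail parameter θ controls tail heaviness: θ = 2 corresponds to sub-Gaussian, θ = 1 to sub-exponential, and 0 < θ < 1 to genuinely heavy tails. The key technical ingredient is a decorrelation lemma that bounds change-of-measure expectations using a shifted-log fθ-divergence, which allows explicit comparison with the Rényi divergence without MGF arguments. On the empirical-process side, the authors establish sharp maximal inequalities and a Dudley-type chaining bound for sub-Weibull processes with tail index θ, with complexity scaling as log^(1/θ) and entropy^(1/θ). These tools yield expected and high-probability PAC-Bayes generalization bounds, as well as an information-theoretic chaining inequality based on multiscale Rényi mutual information. The results are illustrated in Rényi-regularized RLHF with heavy-tailed rewards and in stochastic gradient Langevin dynamics with heavy-tailed gradient noise.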

Original Abstract

Classical information-theoretic generalization bounds typically control the generalization gap through KL-based mutual information and therefore rely on boundedness or sub-Gaussian tails via the moment generating function (MGF). In many modern pipelines, such as robust learning, RLHF, and stochastic optimization, losses and rewards can be heavy-tailed, and MGFs may not exist, rendering KL-based tools ineffective. We develop a tail-dependent information-theoretic framework for sub-Weibull data, where the tail parameter $θ$ controls the tail heaviness: $θ=2$ corresponds to sub-Gaussian, $θ=1$ to sub-exponential, and $0<θ<1$ to genuinely heavy tails. Our key technical ingredient is a decorrelation lemma that bounds change-of-measure expectations using a shifted-log $f_θ$-divergence, which admits explicit comparisons to Rényi divergence without MGF arguments. On the empirical-process side, we establish sharp maximal inequalities and a Dudley-type chaining bound for sub-Weibull processes with tail index $θ$, with complexity scaling as $\log^{1/θ}$ and entropy$^{1/θ}$. These tools yield expected and high-probability PAC-Bayes generalization bounds, as well as an information-theoretic chaining inequality based on multiscale Rényi mutual information. We illustrate the consequences in Rényi-regularized RLHF under heavy-tailed rewards and in stochastic gradient Langevin dynamics with heavy-tailed gradient noise.
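As a rough illustration of the $\log^{1/θ}$ scaling mentioned in the abstract, the sketch below uses exact Weibull variables with survival function $\exp(-t^θ)$ as a stand-in for sub-Weibull data. This is an assumption made for illustration only, not the paper's construction, and the function names `weibull_tail_quantile` and `simulated_mean_max` are hypothetical. For such variables the level exceeded with probability $1/n$ is exactly $(\log n)^{1/θ}$, and a small Monte Carlo run suggests that the expected maximum of $n$ samples grows at a comparable rate.

```python
import math
import random


def weibull_tail_quantile(n: int, theta: float) -> float:
    """Level t solving exp(-t**theta) = 1/n, i.e. t = (log n)**(1/theta)."""
    return math.log(n) ** (1.0 / theta)


def simulated_mean_max(n: int, theta: float, trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of E[max of n i.i.d. Weibull(scale=1, shape=theta)]."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = 0.0
    for _ in range(trials):
        # weibullvariate(alpha, beta): alpha is the scale, beta is the shape
        total += max(rng.weibullvariate(1.0, theta) for _ in range(n))
    return total / trials


if __name__ == "__main__":
    n = 1000
    # theta = 2: Gaussian-like tail, theta = 1: exponential tail, theta = 0.5: heavy tail
    for theta in (2.0, 1.0, 0.5):
        q = weibull_tail_quantile(n, theta)
        m = simulated_mean_max(n, theta)
        print(f"theta={theta}: (log n)^(1/theta)={q:.3f}, simulated mean max={m:.3f}")
```

Smaller θ makes the same sample size produce much larger extremes, which is the qualitative effect the $\log^{1/θ}$ factor captures. The simulation only checks this toy case and says nothing about the constants in the paper's bounds.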
