arXiv:2604.22951v1 | Apr 24, 2026 | cs.AI

The Power of Power Law: Asymmetry Enables Compositional Reasoning

Kaifeng Lyu, Zixuan Wang, Xingyu Dang, Jason D. Lee

Abstract

Natural language data follows a power-law distribution, with most knowledge and skills appearing at very low frequency. A common intuition suggests that reweighting or curating data towards a uniform distribution may help models better learn these long-tail skills. We find a counterintuitive result: across a wide range of compositional reasoning tasks, such as state tracking and multi-step arithmetic, training under power-law distributions consistently outperforms training under uniform distributions. To understand this advantage, we introduce a minimalist skill-composition task and show that learning under a power-law distribution provably requires significantly less training data. Our theoretical analysis reveals that power-law sampling induces a beneficial asymmetry that improves the otherwise pathological loss landscape. This asymmetry lets models first acquire high-frequency skill compositions with low data complexity, which in turn serve as a stepping stone for efficiently learning rare, long-tail skills. Our results offer an alternative perspective on what constitutes an effective data distribution for training models.
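The contrast between the two training distributions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's experimental setup: the number of skills, the exponent `alpha`, and the helper names `power_law_weights`, `uniform_weights`, and `sample_skills` are all hypothetical, and the abstract does not specify how skills are ranked or sampled.

```python
import random
from collections import Counter


def power_law_weights(num_skills: int, alpha: float = 1.0) -> list[float]:
    """Normalized weights p_i proportional to i**(-alpha) for ranks i = 1..num_skills.

    `alpha` is an assumed exponent; the abstract does not state one.
    """
    raw = [rank ** (-alpha) for rank in range(1, num_skills + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def uniform_weights(num_skills: int) -> list[float]:
    """Equal probability for every skill (the reweighted baseline)."""
    return [1.0 / num_skills] * num_skills


def sample_skills(weights: list[float], num_samples: int, seed: int = 0) -> list[int]:
    """Draw skill indices (0-based) with replacement according to `weights`."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)


if __name__ == "__main__":
    num_skills, num_samples = 100, 10_000  # hypothetical sizes
    for name, weights in [
        ("power-law", power_law_weights(num_skills, alpha=1.0)),
        ("uniform", uniform_weights(num_skills)),
    ]:
        counts = Counter(sample_skills(weights, num_samples))
        head = sum(counts[i] for i in range(10))  # draws hitting the 10 top-ranked skills
        print(f"{name}: {head / num_samples:.1%} of samples come from the top 10 skills")
```

Under the power-law weights, a small set of head skills receives a large share of the samples while tail skills are seen rarely; under uniform weights every skill is seen about equally often. The abstract's claim concerns how this asymmetry affects learning, which this sampling sketch does not attempt to reproduce.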
