2601.22690v1 Jan 30, 2026 cs.LG

Do Transformers Have the Ability for Periodicity Generalization?

Hao Zhu (Citations: 179, h-index: 6)
Kechi Zhang, Peking University (Citations: 1,011, h-index: 13)
Yihong Dong, Peking University (Citations: 1,984, h-index: 20)
Huanyu Liu (Citations: 378, h-index: 8)
Ge Li (Citations: 193, h-index: 7)
Sihan Wu (Citations: 18, h-index: 2)
Peixu Wang (Citations: 2, h-index: 1)
Sihao Cheng (Citations: 11, h-index: 1)
Taozhi Chen, Tsinghua University (Citations: 5, h-index: 1)
Tong Liu (Citations: 253, h-index: 6)

Abstract

Large language models (LLMs) based on the Transformer have demonstrated strong performance across diverse tasks. However, current models still exhibit substantial limitations in out-of-distribution (OOD) generalization compared with humans. We investigate this gap through periodicity, one of the basic OOD scenarios. Periodicity captures invariance amid variation, and periodicity generalization is a model's ability to extract periodic patterns from training data and apply them in OOD scenarios. We introduce a unified interpretation of periodicity from the perspective of abstract algebra and reasoning, covering both single and composite periodicity, to explain why Transformers struggle to generalize periodicity. We then construct Coper, a controllable generative benchmark for composite periodicity with two OOD settings, Hollow and Extrapolation. Experiments reveal that periodicity generalization in Transformers is limited: models can memorize periodic data during training but cannot generalize to unseen composite periodicity. We release the source code to support future research.
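
To make the distinction between memorizing periodic data and generalizing it concrete, the following is a minimal sketch and not the Coper benchmark itself. It assumes composite periodicity can be illustrated by a tuple of residues (x mod p for several periods p), and it assumes one plausible reading of the two OOD settings: Extrapolation holds out inputs beyond the training range, and Hollow holds out an interior band of the training range. All function names and both split definitions are hypothetical illustrations; the abstract does not specify how Coper builds them.

```python
from math import lcm


def composite_label(x: int, periods: tuple[int, ...]) -> tuple[int, ...]:
    """Illustrative composite-periodic target: one residue per period."""
    return tuple(x % p for p in periods)


def extrapolation_split(train_size: int, test_size: int) -> tuple[list[int], list[int]]:
    """Assumed reading of 'Extrapolation': test inputs lie beyond the training range."""
    train = list(range(train_size))
    test = list(range(train_size, train_size + test_size))
    return train, test


def hollow_split(size: int, hole_start: int, hole_end: int) -> tuple[list[int], list[int]]:
    """Assumed reading of 'Hollow': an interior band [hole_start, hole_end) is held out."""
    train = [x for x in range(size) if not (hole_start <= x < hole_end)]
    test = list(range(hole_start, hole_end))
    return train, test


def memorizer_accuracy(train: list[int], test: list[int], periods: tuple[int, ...]) -> float:
    """A lookup table over training inputs: perfect in-distribution, blind to unseen inputs."""
    table = {x: composite_label(x, periods) for x in train}
    correct = sum(1 for x in test if table.get(x) == composite_label(x, periods))
    return correct / len(test)


def rule_accuracy(test: list[int], periods: tuple[int, ...]) -> float:
    """A learner that recovered the periods: it stores one full cycle and reuses it."""
    cycle_length = lcm(*periods)  # the composite pattern repeats every lcm(periods) steps
    cycle = {r: composite_label(r, periods) for r in range(cycle_length)}
    correct = sum(1 for x in test if cycle[x % cycle_length] == composite_label(x, periods))
    return correct / len(test)


if __name__ == "__main__":
    periods = (2, 3)
    for name, (train, test) in {
        "extrapolation": extrapolation_split(train_size=60, test_size=30),
        "hollow": hollow_split(size=60, hole_start=20, hole_end=40),
    }.items():
        print(name, memorizer_accuracy(train, test, periods), rule_accuracy(test, periods))
```

In this toy setup the lookup table scores zero on both held-out sets, while the rule that reuses a single cycle stays correct. That contrast mirrors the memorization-versus-generalization gap the abstract describes, without claiming anything about how Transformers behave on Coper.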
