2602.05970v1 Feb 05, 2026 cs.LG

Inverse Depth Scaling From Most Layers Being Similar

Ziming Liu (Citations: 92, h-index: 5)

Yizhou Liu (Citations: 159, h-index: 8)

Sara Kangaslahti (Citations: 0, h-index: 0)

Jeff Gore (Citations: 53, h-index: 4)

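Neural scaling laws describe how loss depends on model size in large language models (LLMs), but depth and width may affect performance differently, so a more detailed study is needed. This work quantifies the effect of depth on loss by analyzing LLMs and simple residual networks. The loss in LLMs is found to decrease in inverse proportion to depth, most likely because functionally similar layers reduce error through ensemble averaging rather than through compositional learning or the discretization of smooth dynamics. This regime is inefficient yet robust, and it may arise from the architectural bias of residual networks and from target functions that are incompatible with smooth dynamics. The results suggest that improving LLM efficiency may require architectural innovations in how depth is used.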

Original Abstract

Neural scaling laws relate loss to model size in large language models (LLMs), yet depth and width may contribute to performance differently, requiring more detailed studies. Here, we quantify how depth affects loss via analysis of LLMs and toy residual networks. We find loss scales inversely proportional to depth in LLMs, probably due to functionally similar layers reducing error through ensemble averaging rather than compositional learning or discretizing smooth dynamics. This regime is inefficient yet robust and may arise from the architectural bias of residual networks and target functions incompatible with smooth dynamics. The findings suggest that improving LLM efficiency may require architectural innovations to encourage compositional use of depth.
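The ensemble-averaging picture in the abstract can be illustrated with a small numerical sketch. This is not the paper's experiment: it assumes each "layer" is an independent, equally noisy estimate of one scalar target, and that depth simply averages those estimates. The function name `ensemble_mse` and the parameters `noise_std` and `num_trials` are hypothetical and chosen only for this illustration. Under these assumptions the mean squared error of the average is expected to fall roughly as 1/depth.

```python
import numpy as np


def ensemble_mse(num_layers, noise_std=1.0, num_trials=20000, seed=0):
    """Mean squared error of the average of `num_layers` noisy estimates.

    Toy assumption (not taken from the paper): every layer returns the
    target plus independent Gaussian noise with the same standard deviation.
    """
    rng = np.random.default_rng(seed)
    target = 1.0
    # Shape (num_trials, num_layers): one independent estimate per layer.
    estimates = target + noise_std * rng.standard_normal((num_trials, num_layers))
    averaged = estimates.mean(axis=1)
    return float(np.mean((averaged - target) ** 2))


depths = [1, 2, 4, 8, 16, 32]
errors = [ensemble_mse(depth) for depth in depths]

# Slope of log(error) against log(depth); a value near -1 means error ~ 1/depth.
slope = np.polyfit(np.log(depths), np.log(errors), 1)[0]

for depth, err in zip(depths, errors):
    print(f"depth={depth:3d}  mse={err:.4f}  mse*depth={err * depth:.3f}")
print(f"fitted log-log slope: {slope:.3f}")
```

If the product `mse*depth` stays roughly constant across rows, the toy model reproduces the inverse-depth trend. Real layers are neither independent nor identical, so this only shows why averaging similar contributions is a slow but steady way to reduce error.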
