2602.01439v1 Feb 01, 2026 cs.LG

TQL: Scaling Q-Functions with Transformers by Preventing Attention Collapse

Perry Dong

Kuo-Han Hung

Alex Swerdlow

D. Sadigh

Chelsea Finn

Original Abstract

Despite scale driving substantial recent advancements in machine learning, reinforcement learning (RL) methods still primarily use small value functions. Naively scaling value functions -- including with a transformer architecture, which is known to be highly scalable -- often results in learning instability and worse performance. In this work, we ask: what prevents transformers from scaling effectively for value functions? Through empirical analysis, we identify the critical failure mode in this scaling: attention scores collapse as capacity increases. Our key insight is that we can effectively prevent this collapse and stabilize training by controlling the entropy of the attention scores, thereby enabling the use of larger models. To this end, we propose Transformer Q-Learning (TQL), a method that unlocks the scaling potential of transformers in learning value functions in RL. Our approach yields up to a 43% improvement in performance when scaling from the smallest to the largest network sizes, while prior methods suffer from performance degradation.
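The quantity the abstract centers on is the entropy of each query's attention distribution, H(a_i) = -sum_j a_ij log a_ij. A row whose softmax weights pile onto a single key has entropy near 0 (the collapsed case). Uniform attention over n keys has entropy log n. The plain-Python sketch below shows one way to measure this per query row. The abstract does not say how TQL controls the entropy, so `entropy_penalty` and its `target_entropy` parameter are hypothetical illustrations of one possible regularizer, not the paper's method.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights: List[float]) -> float:
    """Shannon entropy (nats) of one row of attention weights.

    0 means all mass sits on a single key (fully collapsed);
    log(n) means uniform attention over n keys.
    Zero weights (e.g. from exp underflow) contribute nothing.
    """
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def mean_row_entropy(logits: List[List[float]]) -> float:
    """Average attention entropy over all query rows of a logit matrix."""
    row_entropies = [attention_entropy(softmax(row)) for row in logits]
    return sum(row_entropies) / len(row_entropies)


def entropy_penalty(logits: List[List[float]], target_entropy: float) -> float:
    """Hypothetical regularizer (an assumption, not the TQL objective):
    squared gap between the mean attention entropy and a chosen target.
    Adding it to a critic loss would discourage attention from collapsing."""
    gap = mean_row_entropy(logits) - target_entropy
    return gap * gap
```

For example, uniform logits `[0.0, 0.0, 0.0, 0.0]` give an entropy of log 4 ≈ 1.386 nats, while `[20.0, 0.0, 0.0, 0.0]` gives an entropy very close to 0, which is the collapsed pattern the abstract describes. A squared-gap penalty is only one simple choice. Other formulations (for example a one-sided penalty that only fires below the target) would also count as "controlling" entropy under this reading.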