2606.11634v1 Jun 10, 2026 cs.AI

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

Xinchen Xie
Qipeng Guo
Xiaowen Chu
Kaibo Liu
Peijie Dong
Jianfei Gao
Shaoting Zhang
Kai Chen

The rapid progress of reasoning and agentic large language models (LLMs) has increased the demand for long-context inference, but self-attention (SA) scales quadratically with context length. To address this, we study SWARR (Sliding-Window Attention with Reinforced Adaptation for Math Reasoning), a practical recipe for adapting sliding-window attention (SWA) models to mathematical reasoning. SWARR has two stages: (1) efficient conversion of a pretrained SA model to SWA with supervised fine-tuning (SFT), which avoids pretraining a new base model, and (2) policy adaptation with reinforcement learning (RL). We find that SWA still underperforms SA after SFT, and we hypothesize that this gap is caused in part by a data-architecture mismatch: most SFT data are prepared for SA models and may contain long-range dependencies that are difficult for SWA to model. Because on-policy RL optimizes self-generated trajectories under the SWA constraint, it can adapt trajectories to better match SWA. Experiments on mathematical reasoning benchmarks show that this recipe substantially narrows the gap between SWA and SA, recovering much of the accuracy lost during SWA conversion while preserving the efficiency benefits of linear-complexity attention. Our central contribution is the empirical finding that RL changes the conclusion one would draw from conversion and SFT alone about SWA's viability for math reasoning.
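
As a rough illustration of the quadratic-versus-linear contrast in the abstract, the sketch below builds a causal sliding-window mask and counts the query-key pairs it keeps. It is not taken from the paper: the function names, the window size, and the convention that a query attends to itself plus the previous `window - 1` tokens are all assumptions made for illustration.

```python
from __future__ import annotations


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask (hypothetical helper, not from the paper).

    Assumed convention: query i may attend to key j iff i - window < j <= i,
    i.e. itself and the previous window - 1 tokens.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [[(i - window < j <= i) for j in range(seq_len)]
            for i in range(seq_len)]


def attended_pairs(seq_len: int, window: int | None = None) -> int:
    """Count query-key pairs that attention scores must be computed for.

    window=None models full causal self-attention (quadratic in seq_len);
    a fixed window gives a count that grows linearly in seq_len.
    """
    if window is None:
        return seq_len * (seq_len + 1) // 2
    w = min(window, seq_len)
    # The first w queries see 1..w keys; every later query sees exactly w.
    return w * (w + 1) // 2 + (seq_len - w) * w


if __name__ == "__main__":
    for n in (1_024, 8_192, 65_536):
        full = attended_pairs(n)
        swa = attended_pairs(n, window=1_024)  # window size is an assumption
        print(f"n={n:>6}  full={full:>13,}  swa={swa:>11,}")
```

With a fixed window, doubling the context roughly doubles the number of attended pairs, whereas full causal attention roughly quadruples it. The mask also makes the hypothesized data-architecture mismatch concrete: any dependency that reaches further back than the window is simply not visible to a single SWA layer.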
