2602.03249v1 Feb 03, 2026 cs.AI

Accordion-Thinking: Self-Regulated Step Summaries for Efficient and Readable LLM Reasoning

Zhicheng YANG, Sun Yat-Sen University (Citations: 263; h-index: 9)
Yongxin Wang (Citations: 103; h-index: 4)
Yiwei Wang (Citations: 116; h-index: 5)
Zhijiang Guo, HKUST (GZ) (Citations: 4,271; h-index: 30)
Yinya Huang (Citations: 156; h-index: 6)
Wenlei Shi (Citations: 134; h-index: 5)
Xiaodan Liang (Citations: 567; h-index: 10)
Jing Tang (Citations: 169; h-index: 5)

Original Abstract

Scaling test-time compute via long Chain-of-Thought unlocks remarkable gains in reasoning capabilities, yet it faces practical limits due to the linear growth of the KV cache and quadratic attention complexity. In this paper, we introduce Accordion-Thinking, an end-to-end framework where LLMs learn to self-regulate the granularity of their reasoning steps through dynamic summarization. This mechanism enables a Fold inference mode, in which the model periodically summarizes its thought process and discards former thoughts to reduce dependency on historical tokens. We apply reinforcement learning to further incentivize this capability, uncovering a critical insight: the accuracy gap between the highly efficient Fold mode and the exhaustive Unfold mode progressively narrows and eventually vanishes over the course of training. This phenomenon demonstrates that the model learns to encode essential reasoning information into compact summaries, achieving effective compression of the reasoning context. Our Accordion-Thinker demonstrates that with learned self-compression, LLMs can tackle complex reasoning tasks with minimal dependency token overhead without compromising solution quality. It achieves 3x throughput while maintaining accuracy on a 48GB GPU memory configuration, and the structured step summaries provide a human-readable account of the reasoning process.
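The abstract describes Fold mode only at a high level. As a rough illustration of why periodically summarizing and discarding earlier thoughts bounds memory, here is a toy token-accounting sketch. It assumes each step is a block of thought tokens followed by a summary, and that Fold keeps only the prompt plus earlier summaries while Unfold keeps everything. `Step`, `simulate`, `kv_cache_bytes`, the token counts, and the model shape are all hypothetical; this is not the authors' implementation and does not reproduce the reported 3x throughput.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """One reasoning step: thought tokens followed by a short summary (hypothetical counts)."""
    thought_tokens: int
    summary_tokens: int


def simulate(prompt_tokens: int, steps: List[Step], fold: bool) -> Dict[str, int]:
    """Track how many tokens sit in the context (and KV cache) while decoding.

    Unfold: every generated token is kept.
    Fold (assumed behaviour): once a step's summary is written, that step's thought
    tokens are discarded, so later steps see only the prompt plus earlier summaries.
    """
    context = prompt_tokens
    peak = context
    attention_cost = 0  # sum, over generated tokens, of the cached tokens each one attends to
    for step in steps:
        for _ in range(step.thought_tokens + step.summary_tokens):
            attention_cost += context  # the new token attends to everything already cached
            context += 1               # its own key/value pair is appended to the cache
            peak = max(peak, context)
        if fold:
            context -= step.thought_tokens  # drop this step's thoughts, keep its summary
    return {"peak_context": peak, "final_context": context, "attention_cost": attention_cost}


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """KV cache size: one key and one value vector per token, per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


if __name__ == "__main__":
    # Hypothetical trace: 200-token prompt, four steps of 1,000 thought + 100 summary tokens.
    trace = [Step(thought_tokens=1000, summary_tokens=100) for _ in range(4)]
    # Hypothetical model shape: 28 layers, 4 KV heads, head_dim 128, fp16 values.
    for fold in (False, True):
        stats = simulate(200, trace, fold)
        mib = kv_cache_bytes(stats["peak_context"], layers=28, kv_heads=4, head_dim=128) / 2**20
        print("Fold" if fold else "Unfold", stats, f"peak KV cache ~ {mib:.1f} MiB")
```

Under these made-up numbers, Unfold's peak context grows with the whole trace, while Fold's peak stays near one step plus the accumulated summaries, and the summed attention cost shrinks accordingly. The sketch counts tokens only; it ignores how an inference engine would actually evict KV entries or position the retained summaries, which the abstract does not describe.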

2 Citations · 0 Influential · 15 Altmetric · 77.0 Score
