2604.03679v1 Apr 04, 2026 cs.CL

LightThinker++: From Reasoning Compression to Memory Management

Shuofei Qiao (Citations: 1,260; h-index: 12)
Ningyu Zhang (Citations: 215; h-index: 7)
Jintian Zhang (Citations: 736; h-index: 10)
Yujie Luo (Citations: 214; h-index: 6)
Lei Liang (Citations: 241; h-index: 7)
Yuqi Zhu (Citations: 574; h-index: 7)
Zhen-Hua Wan (Citations: 30; h-index: 2)
Zhengke Gui (Citations: 111; h-index: 3)
Da Zheng (Citations: 123; h-index: 5)
Huajun Chen (Citations: 172; h-index: 5)

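Large language models (LLMs) perform well on complex reasoning, but their efficiency is limited by the cognitive overhead that builds up over long reasoning traces. This paper proposes LightThinker, a method that lets an LLM dynamically compress intermediate reasoning steps into compact semantic representations. Static compression, however, often struggles on complex reasoning, where the irreversible loss of intermediate details can create logical bottlenecks. To address this, we extend the framework into LightThinker++ by introducing Explicit Adaptive Memory Management. The new paradigm incorporates explicit memory primitives to enable behavioral-level management, supported by a specialized trajectory synthesis pipeline that trains purposeful memory scheduling. Extensive experiments show the framework's versatility along three dimensions. (1) LightThinker reduces peak token usage by 70% and inference time by 26% with minimal accuracy loss. (2) In standard reasoning, LightThinker++ cuts peak token usage by 69.9% while improving accuracy by 2.42% under the same context budget. (3) Most notably, in long-horizon agentic tasks it keeps a stable footprint beyond 80 rounds (a 60%-70% reduction) and achieves an average performance gain of 14.8% across different complex scenarios. Overall, this work offers a scalable direction for sustaining deep LLM reasoning over extended horizons with minimal overhead.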

Original Abstract

Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compression often struggles with complex reasoning where the irreversible loss of intermediate details can lead to logical bottlenecks. To address this, we evolve the framework into LightThinker++, introducing Explicit Adaptive Memory Management. This paradigm shifts to behavioral-level management by incorporating explicit memory primitives, supported by a specialized trajectory synthesis pipeline to train purposeful memory scheduling. Extensive experiments demonstrate the framework's versatility across three dimensions. (1) LightThinker reduces peak token usage by 70% and inference time by 26% with minimal accuracy loss. (2) In standard reasoning, LightThinker++ slashes peak token usage by 69.9% while yielding a +2.42% accuracy gain under the same context budget for maximum performance. (3) Most notably, in long-horizon agentic tasks, it maintains a stable footprint beyond 80 rounds (a 60%-70% reduction), achieving an average performance gain of 14.8% across different complex scenarios. Overall, our work provides a scalable direction for sustaining deep LLM reasoning over extended horizons with minimal overhead.
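
The abstract reports its savings as reductions in peak token usage, that is, the largest context size reached at any point during reasoning. The toy sketch below shows how that quantity can be computed and why compressing finished thoughts lowers it. It is not the authors' method: the function name `peak_context_tokens`, the fixed `summary` size, and all token counts are assumptions made up for illustration, and the real system uses learned representations rather than a fixed-size stand-in.

```python
from typing import List, Optional


def peak_context_tokens(prompt: int, thoughts: List[int],
                        summary: Optional[int] = None) -> int:
    """Return the largest context size reached while reasoning.

    prompt   -- tokens in the initial prompt
    thoughts -- token length of each intermediate thought, in order
    summary  -- assumed size of a compressed thought (hypothetical);
                None means every thought is kept in full
    """
    retained = 0   # tokens carried over from earlier thoughts
    peak = prompt
    for length in thoughts:
        # The thought being generated is held in full until it is finished.
        peak = max(peak, prompt + retained + length)
        # Afterwards either the whole thought or only its summary is kept.
        retained += length if summary is None else min(summary, length)
    return peak


# Toy numbers, not taken from the paper.
baseline = peak_context_tokens(100, [200] * 10)
compressed = peak_context_tokens(100, [200] * 10, summary=10)
reduction = 1 - compressed / baseline
print(baseline, compressed, f"{reduction:.1%}")
```

With these made-up inputs the uncompressed context grows with every thought, while the compressed one only ever holds the prompt, the accumulated summaries, and the single thought in progress. The resulting percentage is a property of the toy inputs and says nothing about the figures reported in the abstract.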

0 Citations

0 Influential

6 Altmetric

30.0 Score
