2601.17443v1 Jan 24, 2026 cs.CL

Clustering-driven Memory Compression for On-device Large Language Models

Pramit Saha

Citations: 111

h-index: 6

Ondrej Bohdal

Citations: 523

h-index: 11

Umberto Michieli

Citations: 1,657

h-index: 19

Mete Ozay

Citations: 234

h-index: 8

T. Ceritli

Citations: 141

h-index: 6

Original Abstract

Large language models (LLMs) often rely on user-specific memories distilled from past interactions to enable personalized generation. A common practice is to concatenate these memories with the input prompt, but this approach quickly exhausts the limited context available in on-device LLMs. Compressing memories by averaging can mitigate context growth, yet it frequently harms performance due to semantic conflicts across heterogeneous memories. In this work, we introduce a clustering-based memory compression strategy that balances context efficiency and personalization quality. Our method groups memories by similarity and merges them within clusters prior to concatenation, thereby preserving coherence while reducing redundancy. Experiments demonstrate that our approach substantially lowers the number of memory tokens while outperforming baseline strategies such as naive averaging or direct concatenation. Furthermore, for a fixed context budget, clustering-driven merging yields more compact memory representations and consistently enhances generation quality.
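The group-then-merge idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how memories are represented, which clustering algorithm is used, or how merging is done, so everything concrete here is an assumption: each memory is taken to be a `(T, D)` array of `T` soft-token embeddings, similarity is measured on the token-averaged vector, clustering is plain k-means, and merging is an element-wise mean within each cluster. The names `kmeans_labels` and `compress_memories` are hypothetical.

```python
import numpy as np

def kmeans_labels(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns one cluster label per row."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct rows (fancy indexing returns a copy).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every row to every center: shape (n, k).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave the center unchanged if its cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def compress_memories(memories, k):
    """Cluster N memories of shape (T, D), average within clusters, then concatenate.

    Assumed layout: `memories` has shape (N, T, D). The result has shape
    (c * T, D), where c <= k is the number of non-empty clusters.
    """
    mem = np.asarray(memories, dtype=float)
    k = min(k, len(mem))
    keys = mem.mean(axis=1)              # one (D,) summary vector per memory
    labels = kmeans_labels(keys, k)
    merged = [
        mem[labels == j].mean(axis=0)    # element-wise merge inside one cluster
        for j in range(k)
        if np.any(labels == j)
    ]
    return np.concatenate(merged, axis=0)
```

Under these assumptions, `k = 1` reduces to the naive averaging baseline (one merged memory of `T` tokens) and `k = N` approaches direct concatenation (`N * T` tokens), so `k` acts as the knob that trades context length against how much dissimilar content is averaged together.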

0 Citations

0 Influential

9.5 Altmetric

47.5 Score
