2602.02366v1 Feb 02, 2026 cs.LG

ReasonCACHE: A Method for Teaching LLMs to Reason Without Weight Updates

ReasonCACHE: Teaching LLMs To Reason Without Weight Updates

Phillip Isola

Citations: 3

h-index: 1

Stefanie Jegelka

Citations: 217

h-index: 6

Sharut Gupta

Citations: 73

h-index: 7

David Lopez-Paz

Citations: 27,487

h-index: 34

Kartik Ahuja

Citations: 85

h-index: 6

Mark Ibrahim

Citations: 165

h-index: 5

Mohammad Pezeshki

Citations: 183

h-index: 7

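Can large language models (LLMs) learn to reason through in-context learning (ICL) alone, without any weight updates? ICL is highly sample-efficient and often learns from only a few examples, but complex reasoning tasks typically require many training examples. Simply scaling ICL by adding more examples breaks down at this scale: attention cost grows quadratically, performance saturates or degrades as the context gets longer, and the approach remains a shallow form of learning. Because of these limitations, practitioners mainly rely on in-weight learning (IWL) to induce reasoning. In this work, we show that with Prefix Tuning, LLMs can learn to reason without overloading the context window and without any weight updates. We introduce $\textbf{ReasonCACHE}$, an instantiation of this mechanism that distills demonstrations into a fixed key-value cache. Empirically, on a range of reasoning benchmarks including GPQA-Diamond, ReasonCACHE outperforms standard ICL and matches or exceeds IWL approaches. It is also more efficient in terms of data, inference cost, and trainable parameters. We further prove theoretically that ReasonCACHE can be more expressive than low-rank weight updates: low-rank weight updates tie expressivity to the input rank, whereas ReasonCACHE avoids this constraint by injecting key-values directly into the attention mechanism. Taken together, our results show that ReasonCACHE sits between in-context learning and in-weight learning, and provides a scalable algorithm for learning reasoning skills beyond the context window without changing parameters. Project page: https://reasoncache.github.io/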
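The contrast between low-rank weight updates and injected key-values can be pictured with a standard rank inequality. This is only an intuition aid under assumed notation ($X \in \mathbb{R}^{n \times d}$ for the input tokens, rank-$r$ factors $B, A$, prefix keys $p^K_i$ and values $p^V_i$, frozen projections $W_K, W_V$); it is not the paper's theorem or proof.

```latex
% Low-rank update: the change in the layer output is limited by the input rank.
\Delta W = BA,\quad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times d}
\;\Longrightarrow\;
\operatorname{rank}(X\,\Delta W) \le \min\bigl(r,\ \operatorname{rank}(X)\bigr).

% Prefix key-values: the output for a query q mixes free vectors p^V_i
% that are not functions of X.
\operatorname{Attn}(q) = \sum_{i=1}^{m} \alpha_i\, p^{V}_i + \sum_{j=1}^{n} \beta_j\, W_V^{\top} x_j,
\qquad
\alpha_i \propto \exp\!\bigl(q^{\top} p^{K}_i / \sqrt{d_h}\bigr),\quad
\beta_j \propto \exp\!\bigl(q^{\top} W_K^{\top} x_j / \sqrt{d_h}\bigr).
```

In the first line the update can only act through the row space of $X$. In the second, the vectors $p^V_i$ are free parameters, so the added term need not lie in any subspace determined by the input.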

Original Abstract

Can large language models (LLMs) learn to reason without any weight updates and only through in-context learning (ICL)? ICL is strikingly sample-efficient, often learning from only a handful of demonstrations, but complex reasoning tasks typically demand many training examples to learn from. However, naively scaling ICL by adding more demonstrations breaks down at this scale: attention costs grow quadratically, performance saturates or degrades with longer contexts, and the approach remains a shallow form of learning. Due to these limitations, practitioners predominantly rely on in-weight learning (IWL) to induce reasoning. In this work, we show that by using Prefix Tuning, LLMs can learn to reason without overloading the context window and without any weight updates. We introduce $\textbf{ReasonCACHE}$, an instantiation of this mechanism that distills demonstrations into a fixed key-value cache. Empirically, across challenging reasoning benchmarks, including GPQA-Diamond, ReasonCACHE outperforms standard ICL and matches or surpasses IWL approaches. Further, it achieves this while being more efficient across three key axes: data, inference cost, and trainable parameters. We also theoretically prove that ReasonCACHE can be strictly more expressive than low-rank weight updates, since the latter tie expressivity to input rank, whereas ReasonCACHE bypasses this constraint by directly injecting key-values into the attention mechanism. Together, our findings identify ReasonCACHE as a middle path between in-context and in-weight learning, providing a scalable algorithm for learning reasoning skills beyond the context window without modifying parameters. Our project page: https://reasoncache.github.io/
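As a rough illustration of the mechanism the abstract describes, the sketch below shows single-head causal attention in which trainable prefix key-value pairs are prepended to the keys and values produced by frozen projections. The function name `attention_with_prefix`, the shapes, the single-head setup, and the use of NumPy are all assumptions made for this example; it is not the authors' implementation and omits training entirely.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries get zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_prefix(X, W_q, W_k, W_v, P_k, P_v):
    """Single-head causal attention with a learned key-value prefix.

    X        : (n, d)   input token representations
    W_q/k/v  : (d, d_h) frozen projection weights (never updated)
    P_k, P_v : (m, d_h) prefix keys/values, the only trainable tensors
               in this hypothetical setup
    """
    n, m = X.shape[0], P_k.shape[0]
    Q = X @ W_q
    # Prefix entries are injected directly as keys/values; they are not
    # computed from X, so they cost m cache slots rather than m prompt tokens.
    K = np.concatenate([P_k, X @ W_k], axis=0)  # (m + n, d_h)
    V = np.concatenate([P_v, X @ W_v], axis=0)  # (m + n, d_h)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])     # (n, m + n)
    # Every token may attend to all prefix slots, and causally to the context.
    causal = np.tril(np.ones((n, n), dtype=bool))
    mask = np.concatenate([np.ones((n, m), dtype=bool), causal], axis=1)
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, d_h, m = 5, 8, 4, 3
    X = rng.normal(size=(n, d))
    W_q, W_k, W_v = (rng.normal(size=(d, d_h)) for _ in range(3))
    P_k, P_v = rng.normal(size=(m, d_h)), rng.normal(size=(m, d_h))
    print(attention_with_prefix(X, W_q, W_k, W_v, P_k, P_v).shape)
```

In this sketch the score matrix has shape `(n, m + n)`, so the per-layer cost depends on the fixed prefix length `m` rather than on how many demonstration tokens were used to learn the prefix. With an empty prefix (`m = 0`) the function reduces to ordinary causal attention.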

1 Citation

0 Influential

17 Altmetric

86.0 Score

