2602.08030v2 Feb 08, 2026 cs.AI

Free(): Learning to Forget in Malloc-Only Reasoning Models

Yi Zheng (citations: 544, h-index: 2)
Haitao Mi (citations: 12, h-index: 2)
Dongyang Ma (citations: 46, h-index: 4)
Yan Wang (citations: 10, h-index: 2)
Tian Liang (citations: 76, h-index: 3)
Jiahao Xu (citations: 1,160, h-index: 11)
Xin-Zhong Huang (citations: 3, h-index: 1)
Lijie Chen (citations: 115, h-index: 5)

Original Abstract

Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike without a mechanism to prune obsolete information. To break this cycle, we propose Free()LM, a model that introduces an intrinsic self-forgetting capability via the Free-Module, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact and noise-free state. Extensive experiments show that Free()LM provides consistent improvements across all model scales (8B to 685B). It achieves a 3.3% average improvement over top-tier reasoning baselines, even establishing a new SOTA on IMOanswerBench using DeepSeek V3.2-Speciale. Most notably, in long-horizon tasks where the standard Qwen3-235B-A22B model suffers a total collapse (0% accuracy), Free()LM restores performance to 50%. Our findings suggest that sustainable intelligence requires the freedom to forget as much as the power to think.
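The alternation between reasoning and cleaning described in the abstract can be pictured with a toy loop. This is only an illustrative sketch, not the paper's implementation: the abstract does not specify how chunks are scored, how often cleaning runs, or how the LoRA adapter is invoked, so `reason_with_free`, `reason_step`, `is_useless`, `is_done`, and `clean_every` are all hypothetical names, and the model calls are replaced with scripted stand-ins.

```python
from typing import Callable, List

Chunk = str


def reason_with_free(
    reason_step: Callable[[List[Chunk]], Chunk],
    is_useless: Callable[[Chunk, List[Chunk]], bool],
    is_done: Callable[[List[Chunk]], bool],
    prompt: Chunk,
    clean_every: int = 4,   # hypothetical schedule; not taken from the paper
    max_steps: int = 64,
) -> List[Chunk]:
    """Alternate between a reasoning mode that appends chunks ("malloc")
    and a cleaning mode that drops chunks judged useless ("free")."""
    context: List[Chunk] = [prompt]
    for step in range(1, max_steps + 1):
        # Reasoning mode: the context only grows.
        context.append(reason_step(context))
        if is_done(context):
            break
        # Cleaning mode: prune everything except the prompt that the
        # (hypothetical) judge marks as no longer needed.
        if step % clean_every == 0:
            kept = [c for c in context[1:] if not is_useless(c, context)]
            context = [context[0]] + kept
    return context


# Scripted stand-ins for the model: dead ends are prunable, facts are kept.
script = iter([
    "dead end: try x=1",
    "fact: x is even",
    "dead end: try x=3",
    "fact: x < 6",
    "answer: x=4",
])

trace = reason_with_free(
    reason_step=lambda ctx: next(script),
    is_useless=lambda chunk, ctx: chunk.startswith("dead end"),
    is_done=lambda ctx: ctx[-1].startswith("answer"),
    prompt="problem: find x",
    clean_every=2,
)
print(trace)
```

In this toy run the two dead-end chunks are removed during cleaning, so the final context holds only the prompt, the two facts, and the answer. In the setting the abstract describes, the judge would presumably be the model itself running with the Free-Module adapter enabled rather than a string-prefix check.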

Citations: 1

Influential citations: 0

Altmetric: 5.5

Score: 28.5
