2601.20861v1 Jan 28, 2026 cs.LG

Evolutionary Strategies Lead to Catastrophic Forgetting in LLMs

Akshat Gupta

Citations: 22

h-index: 3

G. Anumanchipalli

Citations: 3,783

h-index: 27

Nicholas Lee

Citations: 27

h-index: 2

Immanuel Abdi

Citations: 2

h-index: 1

Micah Mok

Citations: 1

h-index: 1

Alexander Lu

Citations: 2

h-index: 1

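One of the biggest limitations of current AI systems is that they cannot keep learning after deployment. Building such continual learning systems involves several difficulties, one of which is the large memory requirement of the gradient-based algorithms used to train state-of-the-art LLMs. Evolutionary Strategies (ES) have recently re-emerged as a gradient-free alternative to conventional learning algorithms and have shown promising performance on specific LLM tasks. This paper performs a comprehensive analysis of ES and, in particular, evaluates its forgetting curves as the number of update steps increases. First, we find that ES can reach performance close to GRPO on math and reasoning tasks with a comparable compute budget. However, and most importantly for continual learning, the performance gains of ES are accompanied by significant forgetting of prior abilities, which limits its applicability to training models online. We also explore the cause of this behavior and show that ES updates are much less sparse than GRPO updates and have a much larger $\ell_2$ norm, which explains the contrasting forgetting curves of the two algorithms. With this study, we aim to highlight the forgetting problem in gradient-free algorithms such as ES and hope to inspire future work on mitigating it.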

Original Abstract

One of the biggest missing capabilities in current AI systems is the ability to learn continuously after deployment. Implementing such continually learning systems poses several challenges, one of which is the large memory requirement of gradient-based algorithms that are used to train state-of-the-art LLMs. Evolutionary Strategies (ES) have recently re-emerged as a gradient-free alternative to traditional learning algorithms and have shown encouraging performance on specific tasks in LLMs. In this paper, we perform a comprehensive analysis of ES and specifically evaluate its forgetting curves when training for an increasing number of update steps. We first find that ES is able to reach performance numbers close to GRPO for math and reasoning tasks with a comparable compute budget. However, and most importantly for continual learning, the performance gains in ES are accompanied by significant forgetting of prior abilities, limiting its applicability for training models online. We also explore the reason behind this behavior and show that the updates made using ES are much less sparse and have orders of magnitude larger $\ell_2$ norm compared to corresponding GRPO updates, explaining the contrasting forgetting curves between the two algorithms. With this study, we aim to highlight the issue of forgetting in gradient-free algorithms like ES and hope to inspire future work to mitigate these issues.
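
The two update statistics the abstract compares, the $\ell_2$ norm and the sparsity of a parameter update, can be illustrated with a small NumPy sketch. This is not the paper's code: the ES step below is a generic vanilla ES update on a toy quadratic reward, the sparse update is hand-made for contrast and is not GRPO, and the names `update_stats`, `es_step`, and the threshold `tol` are assumptions introduced only for this example.

```python
import numpy as np

def update_stats(before, after, tol=1e-8):
    """Return (l2_norm, sparsity) of the parameter update after - before.

    Sparsity here means the fraction of coordinates whose absolute change
    is at most tol (an assumed definition, chosen for illustration).
    """
    delta = np.asarray(after, dtype=np.float64) - np.asarray(before, dtype=np.float64)
    l2_norm = float(np.linalg.norm(delta.ravel()))
    sparsity = float(np.mean(np.abs(delta) <= tol))
    return l2_norm, sparsity

def es_step(theta, reward_fn, rng, population=16, sigma=0.1, lr=0.05):
    """One vanilla ES step: sample Gaussian perturbations, score each one,
    and move along the reward-weighted average of the noise directions."""
    noise = rng.standard_normal((population, theta.size))
    rewards = np.array([reward_fn(theta + sigma * n) for n in noise])
    # Normalize rewards so the step size does not depend on reward scale.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return theta + lr / (population * sigma) * advantages @ noise

rng = np.random.default_rng(0)
theta = np.zeros(1000)
target = np.ones(1000)

def toy_reward(params):
    # Toy reward (negative squared distance to a target), not an LLM task.
    return -float(np.sum((params - target) ** 2))

theta_es = es_step(theta, toy_reward, rng)

# A hand-made sparse update that touches 1% of the coordinates, for contrast.
theta_sparse = theta.copy()
theta_sparse[:10] += 0.01

es_norm, es_sparsity = update_stats(theta, theta_es)
sp_norm, sp_sparsity = update_stats(theta, theta_sparse)
print(f"ES update:     l2={es_norm:.4f}  sparsity={es_sparsity:.3f}")
print(f"sparse update: l2={sp_norm:.4f}  sparsity={sp_sparsity:.3f}")
```

Because every coordinate receives a weighted sum of Gaussian noise, the toy ES update is dense, whereas the hand-made update leaves 99% of coordinates unchanged. The magnitudes in this toy setting say nothing about the gap the paper reports for actual LLM training.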

1 Citations

0 Influential

13.5 Altmetric

68.5 Score
