2603.01792v1 Mar 02, 2026 cs.CL

ALTER: An Effective Knowledge-Unlearning Method for LLMs Using Token-Entropy-Based Asymmetric LoRA

ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs

Xun Chen

Citations: 418

h-index: 4

Jie Zou

Citations: 111

h-index: 7

Jiwei Wei

Citations: 776

h-index: 13

Wenhong Tian

Citations: 5

h-index: 2

Jinyu Guo

Citations: 31

h-index: 4

Yuang Li

Citations: 102

h-index: 3

Zhaokun Wang

Citations: 22

h-index: 3

Yifan Gong

Citations: 10

h-index: 2

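Large language models (LLMs) have advanced to encompass extensive knowledge across diverse domains. Controlling what an LLM should not know is nonetheless important for ensuring safe use. Effective unlearning in LLMs is difficult, however, because the boundary between knowledge retention and forgetting is fuzzy. In particular, continuous multi-domain training leaves the parameter space heavily entangled, so aggressive unlearning strategies can cause unexpected side effects. The computational cost of optimizing the billions of parameters in state-of-the-art (SOTA) models is a further barrier. This work proposes ALTER, a lightweight LLM unlearning framework that addresses two challenges: knowledge entanglement and unlearning efficiency. ALTER operates in two phases: (I) high-entropy tokens are captured and learned through the shared A matrix in LoRA; (II) an asymmetric LoRA architecture achieves a specified forgetting objective through parameter isolation and unlearning of tokens within the target subdomains. This points to a new research direction: unlearning via token-level isolation in an asymmetric framework. ALTER achieves over 95% forget quality on the TOFU, WMDP, and MUSE benchmarks and shows minimal side effects by preserving foundational tokens. By decoupling unlearning from the billions of parameters of the LLM, the framework delivers excellent efficiency while preserving over 90% of model utility, well above baseline preservation rates of 47.8-83.6%.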
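Phase (I) depends on identifying high-entropy tokens. The abstract does not say how entropy is measured or where the cutoff lies, so the following is only a minimal sketch under two assumptions: entropy is the Shannon entropy of the model's next-token distribution at each position, and selection uses a fixed cutoff. The function names and the `threshold` parameter are hypothetical, not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_entropy_positions(per_token_logits, threshold):
    """Positions whose predictive entropy exceeds `threshold`.

    `threshold` is a hypothetical hyperparameter; the abstract does not
    state how ALTER picks its cutoff.
    """
    return [
        i for i, logits in enumerate(per_token_logits)
        if token_entropy(logits) > threshold
    ]

# Toy example: 3 positions over a 4-word vocabulary.
toy_logits = [
    [10.0, 0.0, 0.0, 0.0],  # confident prediction, entropy near 0
    [0.0, 0.0, 0.0, 0.0],   # uniform prediction, entropy = ln(4)
    [2.0, 1.0, 0.0, -1.0],  # in between
]
selected = high_entropy_positions(toy_logits, threshold=1.0)
```

With these toy values only the uniform position clears the cutoff; in a real setting the logits would come from the LLM's forward pass over the forget set.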

Original Abstract

Large language models (LLMs) have advanced to encompass extensive knowledge across diverse domains. Yet controlling what an LLM should not know is important for ensuring alignment and thus safe use. However, effective unlearning in LLMs is difficult due to the fuzzy boundary between knowledge retention and forgetting. This challenge is exacerbated by entangled parameter spaces from continuous multi-domain training, often resulting in collateral damage, especially under aggressive unlearning strategies. Furthermore, the computational overhead required to optimize state-of-the-art (SOTA) models with billions of parameters poses an additional barrier. In this work, we present ALTER, a lightweight unlearning framework for LLMs to address both the challenges of knowledge entanglement and unlearning efficiency. ALTER operates through two phases: (I) high-entropy tokens are captured and learned via the shared A matrix in LoRA, followed by (II) an asymmetric LoRA architecture that achieves a specified forgetting objective by parameter isolation and unlearning tokens within the target subdomains. This serves as a new research direction for achieving unlearning via token-level isolation in the asymmetric framework. ALTER achieves SOTA performance on TOFU, WMDP, and MUSE benchmarks with over 95% forget quality and shows minimal side effects through preserving foundational tokens. By decoupling unlearning from LLMs' billion-scale parameters, this framework delivers excellent efficiency while preserving over 90% of model utility, exceeding baseline preservation rates of 47.8-83.6%.
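The asymmetric LoRA layout in phase (II) can be pictured as one shared down-projection A feeding several separate up-projections B, so that an update to one B head leaves the others untouched. The sketch below is a generic illustration of that layout, not the authors' implementation: the class name, the head names `"forget"` and `"retain"`, and the `scale` parameter are all hypothetical, and how ALTER actually assigns tokens or subdomains to heads is not stated in the abstract.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

class AsymmetricLoRALinear:
    """Frozen linear layer with one shared A and several B heads.

    Output for a chosen head k: W x + scale * B_k (A x).
    Assumption: "asymmetric" means A is shared while B is per-head.
    """

    def __init__(self, weight, shared_a, b_heads, scale=1.0):
        self.weight = weight      # frozen base weight, d_out x d_in
        self.shared_a = shared_a  # shared down-projection, r x d_in
        self.b_heads = b_heads    # head name -> up-projection, d_out x r
        self.scale = scale

    def forward(self, x, head):
        base = matvec(self.weight, x)
        low_rank = matvec(self.shared_a, x)           # shared by all heads
        delta = matvec(self.b_heads[head], low_rank)  # head-specific update
        return [b + self.scale * d for b, d in zip(base, delta)]

# Toy layer: identity base weight, rank-1 adapter, two hypothetical heads.
layer = AsymmetricLoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    shared_a=[[1.0, 1.0]],
    b_heads={
        "forget": [[-1.0], [0.0]],  # alters the first output coordinate
        "retain": [[0.0], [0.0]],   # zero update: base behavior preserved
    },
)
x = [2.0, 3.0]
forget_out = layer.forward(x, "forget")  # [-3.0, 3.0]
retain_out = layer.forward(x, "retain")  # [2.0, 3.0]
```

Because the base weight stays frozen and each head owns its own B matrix, only the small adapter matrices would be trained, which is consistent with the abstract's claim of decoupling unlearning from the model's billion-scale parameters.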

2 Citations

0 Influential

6.5 Altmetric

34.5 Score
