2604.13777v1 Apr 15, 2026 cs.CL

From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models

Geng Hong (Citations: 364, h-index: 8)
Wenxuan Li (Citations: 12, h-index: 2)
Zhenfei Zhang (Citations: 18, h-index: 2)
Mi Zhang (Citations: 1,154, h-index: 15)
Mingxing Wen (Citations: 70, h-index: 2)
X. You (Citations: 87, h-index: 5)
Min Yang (Citations: 74, h-index: 6)

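Large language models (LLMs) can memorize sensitive or copyrighted content, which raises serious privacy and legal concerns. Machine unlearning has emerged as a potential remedy, but existing methods rely on user-provided forget sets, which makes unlearning requests difficult to audit and can expose systems to secondary leakage and malicious abuse. This paper proposes MAGE (Memory-grAph Guided Erasure), a new framework that minimizes the amount of data a user must provide and performs unlearning without a corpus. Using only a lightweight user anchor that identifies the target entity, MAGE extracts the related information from the target LLM, organizes it into a weighted local memory graph, and then generates scoped supervision for unlearning. MAGE is model-agnostic, can be easily integrated into existing unlearning methods, and requires no access to the original training corpus. Experiments on two benchmarks, TOFU and RWKU, show that the supervision MAGE generates by itself achieves effective unlearning performance comparable to supervision generated with external references, while preserving overall utility. These results enable a practical and auditable unlearning workflow based on minimal anchors.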

Original Abstract

Large language models (LLMs) may memorize sensitive or copyrighted content, raising significant privacy and legal concerns. While machine unlearning has emerged as a potential remedy, prevailing paradigms rely on user-provided forget sets, making unlearning requests difficult to audit and exposing systems to secondary leakage and malicious abuse. We propose MAGE, a Memory-grAph Guided Erasure framework for user-minimized, corpus-free unlearning. Given only a lightweight user anchor that identifies a target entity, MAGE probes the target LLM to recover target-related memorization, organizes it into a weighted local memory graph, and synthesizes scoped supervision for unlearning. MAGE is model-agnostic, can be plugged into standard unlearning methods, and requires no access to the original training corpus. Experiments on two benchmarks, TOFU and RWKU, demonstrate that MAGE's self-generated supervision achieves effective unlearning performance comparable to supervision generated with external reference, while preserving overall utility. These results support a practical and auditable unlearning workflow driven by minimal anchors rather than user-supplied forget corpora.
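
The workflow the abstract describes (user anchor, probing the target model, a weighted local memory graph, scoped supervision) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how probing, edge weighting, or supervision synthesis are done. Everything below is an assumption made for illustration, including the stand-in model `toy_model`, the example entity "Jane Doe", the probe prompts, the frequency-based edge weights, the `min_weight` threshold, and the refusal-style target.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]  # (anchor, recalled fact)


def toy_model(prompt: str) -> List[str]:
    """Hypothetical stand-in for the target LLM: returns canned 'recalled' facts.

    A real pipeline would sample completions from the model to be unlearned.
    """
    canned = {
        "Tell me about Jane Doe.": ["born in Oslo", "wrote Night Tide", "born in Oslo"],
        "What is Jane Doe known for?": ["wrote Night Tide", "won the Fjord Prize"],
    }
    return canned.get(prompt, [])


def probe(anchor: str, model: Callable[[str], List[str]]) -> List[str]:
    """Query the model with anchor-derived prompts and collect what it recalls."""
    prompts = [f"Tell me about {anchor}.", f"What is {anchor} known for?"]
    facts: List[str] = []
    for prompt in prompts:
        facts.extend(model(prompt))
    return facts


def build_memory_graph(anchor: str, facts: List[str]) -> Dict[Edge, float]:
    """Build a star-shaped graph around the anchor.

    Assumption: an edge weight is the relative frequency with which a fact
    was recalled across probes.
    """
    counts = Counter(facts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {(anchor, fact): n / total for fact, n in counts.items()}


def synthesize_supervision(
    graph: Dict[Edge, float],
    min_weight: float = 0.25,
    refusal: str = "I don't have information about that.",
) -> List[Dict[str, object]]:
    """Turn edges at or above min_weight into (prompt, target) forget examples."""
    strong = sorted(
        ((w, edge) for edge, w in graph.items() if w >= min_weight),
        key=lambda item: (-item[0], item[1]),  # strongest first, then alphabetical
    )
    return [
        {"prompt": f"Is the following true? {anchor} {fact}.", "target": refusal, "weight": w}
        for w, (anchor, fact) in strong
    ]


if __name__ == "__main__":
    recalled = probe("Jane Doe", toy_model)
    memory_graph = build_memory_graph("Jane Doe", recalled)
    for example in synthesize_supervision(memory_graph):
        print(example)
```

In this sketch the threshold keeps the supervision scoped to facts the model recalls repeatedly; the resulting pairs would then be passed to whichever standard unlearning method is in use.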
