2605.05938v1 May 07, 2026 cs.AI

ICU-Bench: Benchmarking Continual Unlearning in Multimodal Large Language Models

Haichang Gao (Citations: 27, h-index: 2)
Zhenxing Niu (Citations: 46, h-index: 3)
Yuhang Wang (Citations: 27, h-index: 2)
Guangyu He (Citations: 24, h-index: 2)
Wenjie Mei (Citations: 11, h-index: 2)
Junkai Zhang (Citations: 5, h-index: 1)

Abstract

Although Multimodal Large Language Models (MLLMs) have achieved remarkable progress across many domains, their training on large-scale multimodal datasets raises serious privacy concerns, making effective machine unlearning increasingly necessary. However, existing benchmarks mainly focus on static or short-sequence settings, offering limited support for evaluating continual privacy deletion requests in realistic deployments. To bridge this gap, we introduce ICU-Bench, a continual multimodal unlearning benchmark built on privacy-critical document data. ICU-Bench contains 1,000 privacy-sensitive profiles from two document domains, medical reports and labor contracts, with 9,500 images, 16,000 question-answer pairs, and 100 forget tasks. Additionally, new continual unlearning metrics are introduced, facilitating a comprehensive analysis of forgetting effectiveness, historical forgetting preservation, retained utility, and stability throughout the continual unlearning process. Through extensive experiments with representative unlearning methods on ICU-Bench, we show that existing methods generally struggle in continual settings and exhibit clear limitations in balancing forgetting quality, utility preservation, and scalability over long task sequences. These findings highlight the need for multimodal unlearning methods explicitly designed for continual privacy deletion.

Citations: 0 | Influential: 0 | Altmetric: 1.5 | Score: 7.5
