2604.10971v1 Apr 13, 2026 cs.CV

MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models

Zefeng Qian

Chongyang Zhang

Xincheng Yao

Chao Shi

Jiayang Song

Original Abstract

In industrial anomaly detection, general anomaly detection (GAD) is an emerging trend and also the ultimate goal. Unlike conventional single- and multi-class AD, general AD aims to train a general AD model that can directly detect anomalies in diverse novel classes without any retraining or fine-tuning on the target data. Recently, Multimodal Large Language Models (MLLMs) have shown great promise for general anomaly detection owing to their revolutionary visual understanding and language reasoning capabilities. However, the general AD ability of MLLMs remains underexplored for two reasons: (1) although MLLMs are pretrained on vast amounts of Web-sourced data, these data still differ significantly from the data in AD scenarios, and the image-text pairs used during pretraining are not designed for AD tasks; (2) the current mainstream AD datasets are image-based and not yet suitable for post-training MLLMs. To facilitate MLLM-based general AD research, we present MMR-AD, a comprehensive benchmark for both training and evaluating MLLM-based AD models. With MMR-AD, we reveal that the AD performance of current state-of-the-art (SOTA) generalist MLLMs still falls far short of industrial requirements. Based on MMR-AD, we also propose a baseline model, Anomaly-R1, a reasoning-based AD model that learns from the chain-of-thought (CoT) data in MMR-AD and is further enhanced by reinforcement learning. Extensive experiments show that Anomaly-R1 achieves remarkable improvements over generalist MLLMs in both anomaly detection and localization.
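The abstract reports gains in both anomaly detection and localization but does not state the metrics used. As a minimal sketch, assuming image-level yes/no accuracy for detection and bounding-box IoU for localization (both hypothetical choices, not necessarily the MMR-AD protocol), parsed MLLM answers could be scored as follows. The `Sample` type, the `(x1, y1, x2, y2)` box convention, and the `evaluate` helper are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Sample:
    is_anomalous: bool          # image-level label (hypothetical answer format)
    box: Optional[Box] = None   # defect bounding box; None for normal images


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(preds: Sequence[Sample], gts: Sequence[Sample]) -> Tuple[float, float]:
    """Return (detection accuracy, mean IoU over ground-truth anomalous images)."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("need equally sized, non-empty prediction and ground-truth lists")
    # Detection: fraction of images whose normal/anomalous label matches.
    correct = sum(p.is_anomalous == g.is_anomalous for p, g in zip(preds, gts))
    # Localization: IoU is only scored where a ground-truth defect box exists.
    ious = []
    for p, g in zip(preds, gts):
        if g.is_anomalous and g.box is not None:
            # A missing predicted box counts as zero overlap.
            ious.append(box_iou(p.box, g.box) if p.box is not None else 0.0)
    mean_iou = sum(ious) / len(ious) if ious else 0.0
    return correct / len(gts), mean_iou


if __name__ == "__main__":
    gts = [Sample(True, (10, 10, 30, 30)), Sample(False), Sample(True, (0, 0, 10, 10))]
    preds = [Sample(True, (20, 10, 40, 30)), Sample(False), Sample(False)]
    print(evaluate(preds, gts))
```

Scoring localization only on ground-truth anomalous images keeps normal images from inflating the IoU average; a real benchmark may instead use pixel-level or region-level metrics.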
