2602.03263v1 Feb 03, 2026 cs.AI

CSR-Bench: A Benchmark for Evaluating the Cross-modal Safety and Reliability of MLLMs

Kun Yang, Yuxuan Liu, Yuntian Shi, Kun Wang, Hao Shen

Abstract

Multimodal large language models (MLLMs) enable interaction over both text and images, but their safety behavior can be driven by unimodal shortcuts instead of true joint intent understanding. We introduce CSR-Bench, a benchmark for evaluating cross-modal reliability through four stress-testing interaction patterns spanning Safety, Over-rejection, Bias, and Hallucination, covering 61 fine-grained types. Each instance is constructed to require integrated image-text interpretation, and we additionally provide paired text-only controls to diagnose modality-induced behavior shifts. We evaluate 16 state-of-the-art MLLMs and observe systematic cross-modal alignment gaps. Models show weak safety awareness, strong language dominance under interference, and consistent performance degradation from text-only controls to multimodal inputs. We also observe a clear trade-off between reducing over-rejection and maintaining safe, non-discriminatory behavior, suggesting that some apparent safety gains may come from refusal-oriented heuristics rather than robust intent understanding. WARNING: This paper contains unsafe contents.
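The paired text-only controls described in the abstract suggest a simple diagnostic: score the same model on each multimodal instance and on its text-only counterpart, then compare the two pass rates per category. The following is a minimal sketch of that comparison, not the authors' evaluation code. The `PairedResult` record, its field names, and the `modality_shift` function are hypothetical, and the pass/fail judgments are assumed to come from some external grader.

```python
from dataclasses import dataclass


@dataclass
class PairedResult:
    """One benchmark instance scored twice (hypothetical record layout)."""
    category: str        # e.g. "Safety", "Over-rejection", "Bias", "Hallucination"
    multimodal_ok: bool  # acceptable response on the image+text input
    text_only_ok: bool   # acceptable response on the paired text-only control


def modality_shift(results):
    """Return per-category pass rates and the multimodal-minus-text-only gap."""
    by_category = {}
    for r in results:
        by_category.setdefault(r.category, []).append(r)

    report = {}
    for category, items in by_category.items():
        n = len(items)
        multimodal_rate = sum(r.multimodal_ok for r in items) / n
        text_only_rate = sum(r.text_only_ok for r in items) / n
        report[category] = {
            "multimodal": multimodal_rate,
            "text_only": text_only_rate,
            # A negative shift means the model does worse once the image is added.
            "shift": multimodal_rate - text_only_rate,
        }
    return report


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    demo = [
        PairedResult("Safety", False, True),
        PairedResult("Safety", True, True),
        PairedResult("Safety", False, True),
        PairedResult("Safety", False, False),
    ]
    print(modality_shift(demo))
```

A consistently negative `shift` across categories would correspond to the degradation from text-only controls to multimodal inputs that the abstract reports.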
