2601.06757v1 Jan 11, 2026 cs.CL

MTMCS-Bench: Evaluating Contextual Safety of Multimodal Large Language Models in Multi-Turn Dialogues

Zheyuan Liu

Dongwhi Kim

Yixin Wan

Xiangchi Yuan

Zhaoxuan Tan

Fengran Mo

Meng Jiang

Abstract

Multimodal large language models (MLLMs) are increasingly deployed as assistants that interact through text and images, making it crucial to evaluate contextual safety when risk depends on both the visual scene and the evolving dialogue. Existing contextual safety benchmarks are mostly single-turn and often miss how malicious intent can emerge gradually or how the same scene can support both benign and exploitative goals. We introduce the Multi-Turn Multimodal Contextual Safety Benchmark (MTMCS-Bench), a benchmark of realistic images and multi-turn conversations that evaluates contextual safety in MLLMs under two complementary settings, escalation-based risk and context-switch risk. MTMCS-Bench offers paired safe and unsafe dialogues with structured evaluation. It contains over 30 thousand multimodal (image+text) and unimodal (text-only) samples, with metrics that separately measure contextual intent recognition, safety-awareness on unsafe cases, and helpfulness on benign ones. Across eight open-source and seven proprietary MLLMs, we observe persistent trade-offs between contextual safety and utility, with models tending to either miss gradual risks or over-refuse benign dialogues. Finally, we evaluate five current guardrails and find that they mitigate some failures but do not fully resolve multi-turn contextual risks.
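
The abstract mentions three separately reported metrics but does not define them. As a rough illustration only, the sketch below assumes each metric is a simple proportion over labeled dialogues. The `DialogueResult` fields and the `score` function are hypothetical names chosen for this example, not the benchmark's actual schema or scoring code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class DialogueResult:
    """One evaluated dialogue (hypothetical schema, not the benchmark's)."""
    is_unsafe: bool         # ground-truth label: unsafe vs. benign dialogue
    predicted_unsafe: bool  # the model's judgement of the user's intent
    refused: bool           # whether the model declined or warned in its reply


def _rate(hits: int, total: int) -> float:
    # Return NaN rather than raising when a subset is empty.
    return hits / total if total else float("nan")


def score(results: Iterable[DialogueResult]) -> Dict[str, float]:
    """Compute three separate proportions (an assumed formulation)."""
    results = list(results)
    unsafe = [r for r in results if r.is_unsafe]
    benign = [r for r in results if not r.is_unsafe]
    return {
        # Share of all dialogues whose intent was judged correctly.
        "intent_recognition": _rate(
            sum(r.predicted_unsafe == r.is_unsafe for r in results),
            len(results),
        ),
        # Share of unsafe dialogues where the model refused or warned.
        "safety_awareness": _rate(sum(r.refused for r in unsafe), len(unsafe)),
        # Share of benign dialogues where the model did not refuse.
        "helpfulness": _rate(sum(not r.refused for r in benign), len(benign)),
    }


if __name__ == "__main__":
    demo = [
        DialogueResult(is_unsafe=True, predicted_unsafe=True, refused=True),
        DialogueResult(is_unsafe=True, predicted_unsafe=False, refused=False),
        DialogueResult(is_unsafe=False, predicted_unsafe=False, refused=False),
        DialogueResult(is_unsafe=False, predicted_unsafe=False, refused=True),
    ]
    print(score(demo))
```

Reporting the unsafe and benign subsets separately is what makes the trade-off described in the abstract visible: a model that refuses everything would score high on the safety-awareness proportion and low on the helpfulness one, which a single pooled accuracy would hide.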
