2601.05339v1 Jan 08, 2026 cs.CR

Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models

B. Das

Citations: 465

h-index: 5

Md Tasnim Jawad

Florida International University

Citations: 461

h-index: 7

Joaquin Molto

Citations: 6

h-index: 2

M. Amini

Citations: 431

h-index: 3

Yanzhao Wu

Florida International University

Citations: 1,913

h-index: 22

Original Abstract

In recent years, the security vulnerabilities of Multi-modal Large Language Models (MLLMs) have become a serious concern in Generative Artificial Intelligence (GenAI) research. These highly capable models, which perform multi-modal tasks with high accuracy, are also severely susceptible to carefully crafted security attacks, such as jailbreaking attacks, which can manipulate model behavior and bypass safety constraints. This paper introduces MJAD-MLLMs, a holistic framework that systematically analyzes the proposed multi-turn jailbreaking attacks and multi-LLM-based defense techniques for MLLMs. We make three original contributions. First, we introduce a novel multi-turn jailbreaking attack that exploits the vulnerabilities of MLLMs under multi-turn prompting. Second, we propose a novel fragment-optimized, multi-LLM defense mechanism, called FragGuard, to effectively mitigate jailbreaking attacks on MLLMs. Third, we evaluate the efficacy of the proposed attacks and defenses through extensive experiments on several state-of-the-art (SOTA) open-source and closed-source MLLMs and benchmark datasets, and compare their performance with existing techniques.

1 Citation

0 Influential

11 Altmetric

56.0 Score
