2606.16751v1 Jun 15, 2026 cs.CR

Automated jailbreak attack targeting multiple defense strategies

Yanqing Li
Qi Wang
Chengcheng Wan
Weijia He
Hanqi Sun
Xiaodong Gu
Jiangtao Wang

Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks. However, their safety remains a critical concern due to their susceptibility to adversarial prompt-based attacks. In this paper, we present UNIATTACK, an adversarial testing framework designed from a defense-oriented perspective to systematically construct effective black-box attack prompts. Unlike prior approaches that rely on static templates or iterative model-specific tuning, UNIATTACK extracts minimal but high-impact attack features from diverse existing attacks, optimizes them via a specialized attacker LLM, and composes them into flexible templates through an automated refinement process. This feature-centric construction enables one-shot attacks that generalize across multiple models and safety categories, providing a practical tool for assessing LLM robustness. Our evaluation results show that, compared to the baselines, UNIATTACK achieves an average attack success rate (ASR) improvement of 64.63%-248.82% on models deployed with multi-layered defense mechanisms, while incurring only 0.03%-4.96% of the baselines' cost. The UNIATTACK artifact is available at https://anonymous.4open.science/r/UniAttack-Artifact-30F1.


