arXiv:2601.13238v1 · Jan 19, 2026 · cs.CV

A Semantic Decoupling-Based Two-Stage Rainy-Day Attack for Revealing Weather Robustness Deficiencies in Vision-Language Models

Feng Zhang (Citations: 76, h-index: 7)
Chen-Hao Hu (Citations: 268, h-index: 9)
Xiang Chen (Citations: 122, h-index: 4)
Zhe Jia (Citations: 3, h-index: 1)
Weiwen Shi (Citations: 157, h-index: 7)
Jiujiang Guo (Citations: 73, h-index: 5)
Yiwei Wei (Citations: 360, h-index: 12)

Original Abstract

Vision-Language Models (VLMs) are trained on image-text pairs collected under canonical visual conditions and achieve strong performance on multimodal tasks. However, their robustness to real-world weather conditions, and the stability of cross-modal semantic alignment under such structured perturbations, remain insufficiently studied. In this paper, we focus on rainy scenarios and introduce the first adversarial framework that exploits realistic weather to attack VLMs, using a two-stage, parameterized perturbation model based on semantic decoupling to analyze rain-induced shifts in decision-making. In Stage 1, we model the global effects of rainfall by applying a low-dimensional global modulation to condition the embedding space and gradually weaken the original semantic decision boundaries. In Stage 2, we introduce structured rain variations by explicitly modeling multi-scale raindrop appearance and rainfall-induced illumination changes, and optimize the resulting non-differentiable weather space to induce stable semantic shifts. Operating in a non-pixel parameter space, our framework generates perturbations that are both physically grounded and interpretable. Experiments across multiple tasks show that even physically plausible, highly constrained weather perturbations can induce substantial semantic misalignment in mainstream VLMs, posing potential safety and reliability risks in real-world deployment. Ablations further confirm that illumination modeling and multi-scale raindrop structures are key drivers of these semantic shifts.
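
The two-stage, parameterized pipeline described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the authors' implementation: the parameter names and bounds in the hypothetical `BOUNDS` table, the streak renderer `rain_streaks`, and the objective are all invented. The paper applies Stage 1 to condition the embedding space, whereas this sketch uses an image-space brightness/contrast modulation as a stand-in. Both stages are optimized jointly for brevity, and plain random search substitutes for whatever gradient-free optimizer the paper uses over its non-differentiable weather space.

```python
import numpy as np

# Hypothetical box constraints on the weather parameters (not taken from the paper).
BOUNDS = {
    "brightness": (-0.15, 0.05),  # Stage 1 stand-in: global offset
    "contrast": (0.7, 1.0),       # Stage 1 stand-in: global contrast scaling
    "density": (0.0, 0.02),       # Stage 2: fraction of pixels seeding streaks
    "intensity": (0.0, 0.6),      # Stage 2: streak brightness
    "dimming": (0.0, 0.3),        # Stage 2: rain-induced illumination drop
}
KEYS = list(BOUNDS)
LO = np.array([BOUNDS[k][0] for k in KEYS])
HI = np.array([BOUNDS[k][1] for k in KEYS])


def global_modulation(img, brightness, contrast):
    """Low-dimensional modulation applied uniformly to every pixel."""
    mean = img.mean()
    return (img - mean) * contrast + mean + brightness


def rain_streaks(h, w, density, scales=(1, 2, 4), angle=0.2, seed=0):
    """Hypothetical multi-scale streak mask in [0, 1].

    Coarser scales get fewer, longer and fainter streaks.
    """
    rng = np.random.default_rng(seed)  # fixed seed keeps the objective deterministic
    layer = np.zeros((h, w))
    for s in scales:
        n = int(density * h * w / s)  # fewer streak seeds at coarser scales
        ys = rng.integers(0, h, n)
        xs = rng.integers(0, w, n)
        for t in range(4 * s):  # streak length grows with scale
            yy = np.clip(ys + int(t * np.cos(angle)), 0, h - 1)
            xx = np.clip(xs + int(t * np.sin(angle)), 0, w - 1)
            layer[yy, xx] = np.maximum(layer[yy, xx], 1.0 / s)
    return layer


def render_rain(img, p):
    """Stage-1 modulation, then Stage-2 dimming and streaks; p follows KEYS order."""
    brightness, contrast, density, intensity, dimming = p
    out = global_modulation(img, brightness, contrast)
    out = out * (1.0 - dimming)  # darker ambient light under rain
    streaks = rain_streaks(img.shape[0], img.shape[1], density)
    out = out + intensity * streaks[..., None]  # broadcast the mask over RGB channels
    return np.clip(out, 0.0, 1.0)


def random_search(img, score_fn, iters=200, sigma=0.1, seed=1):
    """Gradient-free search inside the parameter box; only improvements are kept."""
    rng = np.random.default_rng(seed)
    best = (LO + HI) / 2
    best_score = score_fn(render_rain(img, best))
    for _ in range(iters):
        step = sigma * (HI - LO) * rng.standard_normal(best.size)
        cand = np.clip(best + step, LO, HI)
        s = score_fn(render_rain(img, cand))
        if s > best_score:
            best, best_score = cand, s
    return best, best_score


# Usage with a toy stand-in objective. A real attack would instead score
# VLM image-text misalignment (e.g., a drop in embedding similarity).
img = np.random.default_rng(0).random((32, 32, 3))


def toy_score(x):
    return float(np.abs(x - img).mean())


params, score = random_search(img, toy_score)
print(dict(zip(KEYS, np.round(params, 3))), round(score, 4))
```

Because only improving candidates are accepted, the returned score never falls below that of the starting point at the center of the box. In this sketch the box bounds are what keep the perturbation "highly constrained".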

Citations: 1 · Influential citations: 0 · Altmetric: 6 · Score: 31.0
