2604.10460v1 Apr 12, 2026 cs.CV

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

Bingyu Shen

Xinlei Guan

Tejaswi Dhandu

Meng Xu

U. R. Tida

David Arosemena

Kuan Huang

Miles Q. Li

Ruiyang Qin

Boyang Li

Abstract

The rapid growth of generative AI has introduced new challenges in content moderation and digital forensics. In particular, benign AI-generated images can be paired with harmful or misleading text, producing misuse that is difficult to detect. This contextual misuse undermines the traditional moderation framework and complicates attribution, because synthetic images typically lack persistent metadata or device signatures. We introduce a steganography-enabled attribution framework that embeds cryptographically signed identifiers into images at creation time and uses multimodal harmful-content detection as a trigger for attribution verification. Our system evaluates five watermarking methods across the spatial, frequency, and wavelet domains. It also integrates a CLIP-based fusion model for multimodal harmful-content detection. Experiments demonstrate that spread-spectrum watermarking, especially in the wavelet domain, provides strong robustness under blur distortions, and that our multimodal fusion detector achieves an AUC-ROC of 0.99, enabling reliable cross-modal attribution verification. Together, these components form an end-to-end forensic pipeline that enables reliable tracing of harmful deployments of AI-generated imagery, supporting accountability in modern synthetic media environments. Our code is available on GitHub: https://github.com/bli1/steganography
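
The attribution idea in the abstract (a signed identifier carried by a spread-spectrum watermark) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the function names are hypothetical, a truncated HMAC-SHA256 tag stands in for whatever signature scheme the authors use, and a plain list of floats stands in for wavelet-domain coefficients. The strength `alpha` and chip count `chips` are deliberately large so that extraction is guaranteed for host values in [-1, 1]; a real system would trade strength against visibility.

```python
import hashlib
import hmac
import random


def sign_identifier(identifier, key, tag_len=8):
    """Append a truncated HMAC-SHA256 tag (a stand-in for a real signature)."""
    tag = hmac.new(key, identifier, hashlib.sha256).digest()[:tag_len]
    return identifier + tag


def verify_payload(payload, key, tag_len=8):
    """Return the identifier if its tag checks out, otherwise None."""
    identifier, tag = payload[:-tag_len], payload[-tag_len:]
    expected = hmac.new(key, identifier, hashlib.sha256).digest()[:tag_len]
    return identifier if hmac.compare_digest(tag, expected) else None


def to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def from_bits(bits):
    """Pack a list of bits (length a multiple of 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def embed(coeffs, bits, seed, alpha=2.0, chips=64):
    """Spread each bit over `chips` coefficients using a seeded +/-1 sequence."""
    rng = random.Random(seed)
    out = list(coeffs)
    for k, bit in enumerate(bits):
        sign = 1 if bit else -1
        for j in range(chips):
            pn = rng.choice((-1, 1))
            out[k * chips + j] += alpha * sign * pn
    return out


def extract(coeffs, n_bits, seed, chips=64):
    """Recover bits by correlating each chunk with the same +/-1 sequence."""
    rng = random.Random(seed)
    bits = []
    for k in range(n_bits):
        corr = 0.0
        for j in range(chips):
            pn = rng.choice((-1, 1))
            corr += coeffs[k * chips + j] * pn
        bits.append(1 if corr > 0 else 0)
    return bits


# Usage: sign an identifier, embed it, then extract and verify it.
key = b"demo-secret-key"                      # hypothetical shared key
payload = sign_identifier(b"gen-0001", key)   # 8-byte id + 8-byte tag
payload_bits = to_bits(payload)
host_rng = random.Random(0)
host = [host_rng.uniform(-1.0, 1.0) for _ in range(len(payload_bits) * 64)]
marked = embed(host, payload_bits, seed=42)
recovered = from_bits(extract(marked, len(payload_bits), seed=42))
print(verify_payload(recovered, key))
```

In this sketch the extractor needs only the seed and the key, not the original host values. In the pipeline the abstract describes, this verification step would run only after the multimodal detector flags an image-text pair as harmful.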

