2601.00553v1 Jan 02, 2026 cs.CV

A Comprehensive Dataset for Human vs. AI Generated Image Detection

Aman Chadha
Vinija Jain
Amitava Das
Rajarshi Roy
Nasrin Imanpour
Ashhar Aziz
Shashwat Bajpai (BITS Pilani)
Gurpreet Singh
Shwetangshu Biswas
Kapil Wanaskar
Parth Patwa
Subhankar Ghosh
Shreyas Dixit
Nilesh Ranjan Pal
Vipula Rawte
Ritvik Garimella
Gaytri Jena
Vasu Sharma
Aishwarya N. Reganti

Abstract

Multimodal generative AI systems such as Stable Diffusion, DALL-E, and MidJourney have fundamentally changed how synthetic images are created. These tools drive innovation but also enable the spread of misleading content, false information, and manipulated media. As generated images become harder to distinguish from photographs, detecting them has become an urgent priority. To address this challenge, we release MS COCOAI, a novel dataset for AI-generated image detection consisting of 96,000 real and synthetic datapoints, built on the MS COCO dataset. To generate the synthetic images, we use five generators: Stable Diffusion 3, Stable Diffusion 2.1, SDXL, DALL-E 3, and MidJourney v6. Based on the dataset, we propose two tasks: (1) classifying images as real or generated, and (2) identifying which model produced a given synthetic image. The dataset is available at https://huggingface.co/datasets/Rajarshi-Roy-research/Defactify_Image_Dataset.
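
The two tasks in the abstract can be read as two label spaces over the same images. The following is a minimal sketch of one way to encode them, not the authors' format: the function name `encode_labels`, the integer ids, and the convention of passing `None` for a real photograph are all assumptions made here for illustration. Only the five generator names come from the abstract.

```python
from typing import Optional, Tuple

# Generator names as listed in the abstract. The integer ids assigned
# below are arbitrary and hypothetical; the released dataset may use a
# different encoding.
GENERATORS = (
    "Stable Diffusion 3",
    "Stable Diffusion 2.1",
    "SDXL",
    "DALL-E 3",
    "MidJourney v6",
)
GENERATOR_TO_ID = {name: idx for idx, name in enumerate(GENERATORS)}


def encode_labels(source: Optional[str]) -> Tuple[int, Optional[int]]:
    """Return (task1_label, task2_label) for one image.

    `source` is None for a real photograph (assumed convention),
    otherwise the name of the generator that produced the image.
    Task 1: 0 = real, 1 = generated.
    Task 2: generator id, or None when the image is real.
    """
    if source is None:
        return 0, None
    if source not in GENERATOR_TO_ID:
        raise ValueError(f"unknown generator: {source!r}")
    return 1, GENERATOR_TO_ID[source]
```

In this sketch a real image has no Task 2 label, which follows the abstract's wording that the second task concerns "a given synthetic image". Whether the released data instead treats "real" as an additional class is not stated in the abstract.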
