2605.30341v1 May 28, 2026 cs.CV

GPIC: A Giant Permissive Image Corpus for Visual Generation

Juan Carlos Niebles
Salesforce
Fei-Fei Li
Keshigeyan Chandrasegaran
Jiajun Wu
Kyle Sargent
Suchir Agarwal
Michael Poli
Justin Johnson
M. Jang

Studying scalable methods for visual generative modeling requires large, accessible, and stable datasets. We introduce GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels. GPIC comprises diverse internet images captioned by a state-of-the-art vision-language model, split into 100M training, 200K validation, and 1M test examples. All GPIC images are permissively licensed for both research and commercial use. GPIC is safety-filtered, deduplicated, and centrally hosted on Hugging Face. We provide a benchmarking protocol for generative modeling on GPIC, along with a reference baseline for pixel-space flow matching. Our dataset, benchmark, and models are available at https://huggingface.co/datasets/stanford-vision-lab/gpic. The evaluation toolkit and code are available at https://gpic.stanford.edu.
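
To give a sense of scale, the figures quoted in the abstract can be turned into a rough per-image estimate. The sketch below is only back-of-the-envelope arithmetic: it assumes the 28 trillion pixel total covers all three splits, and the "equivalent square side" treats every image as square, which the abstract does not state. The function name `average_image_stats` is hypothetical and used only for illustration.

```python
import math

# Figures quoted in the abstract.
TOTAL_PIXELS = 28e12  # "approximately 28 trillion pixels"
SPLIT_SIZES = {
    "train": 100_000_000,
    "validation": 200_000,
    "test": 1_000_000,
}


def average_image_stats(total_pixels, split_sizes):
    """Return (image count, mean pixels per image, equivalent square side).

    Assumption: total_pixels is spread over every split combined.
    The square side is an illustrative simplification, not a dataset property.
    """
    n_images = sum(split_sizes.values())
    pixels_per_image = total_pixels / n_images
    side = math.sqrt(pixels_per_image)
    return n_images, pixels_per_image, side


n_images, pixels_per_image, side = average_image_stats(TOTAL_PIXELS, SPLIT_SIZES)
print(f"images: {n_images:,}")
print(f"mean pixels per image: {pixels_per_image:,.0f}")
print(f"equivalent square side: {side:.0f} px")
```

Under these assumptions the corpus averages a little under 0.3 megapixels per image, roughly a 526 x 526 image. The true resolution distribution may differ considerably from this mean.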


