2603.08090v1 Mar 09, 2026 cs.CV

DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation

Jie Jiang (Citations: 45, h-index: 3)
Peng Shu (Citations: 11, h-index: 2)
Shuangfeng Li (Citations: 33, h-index: 4)
Zhenyu Hu (Citations: 54, h-index: 4)
Qing Wang (Citations: 196, h-index: 5)
Tengbao Cao (Citations: 215, h-index: 4)
L. Liao (Citations: 0, h-index: 0)
Longfei Lu (Citations: 18, h-index: 2)
Liqun Liu (Citations: 22, h-index: 2)
Hang Chen (Citations: 53, h-index: 5)
Mengge Xue (Citations: 22, h-index: 2)
Yuan Chen (Citations: 33, h-index: 2)
Chao Deng (Citations: 20, h-index: 2)
Huan Yu (Citations: 7, h-index: 2)

Abstract

Significant progress has been achieved in subject-driven text-to-image (T2I) generation, which aims to synthesize new images depicting target subjects according to user instructions. However, evaluating these models remains a significant challenge. Existing benchmarks exhibit critical limitations: 1) insufficient diversity and comprehensiveness in subject images, 2) inadequate granularity in assessing model performance across different subject difficulty levels and prompt scenarios, and 3) a profound lack of actionable insights and diagnostic guidance for subsequent model refinement. To address these limitations, we propose DSH-Bench, a comprehensive benchmark that enables systematic multi-perspective analysis of subject-driven T2I models through four principal innovations: 1) a hierarchical taxonomy sampling mechanism ensuring comprehensive subject representation across 58 fine-grained categories, 2) an innovative classification scheme categorizing both subject difficulty level and prompt scenario for granular capability assessment, 3) a novel Subject Identity Consistency Score (SICS) metric that quantifies subject preservation and shows a 9.4% higher correlation with human evaluation than existing measures, and 4) a comprehensive set of diagnostic insights derived from the benchmark, offering critical guidance for optimizing future model training paradigms and data construction strategies. Through an extensive empirical evaluation of 19 leading models, DSH-Bench uncovers previously obscured limitations in current approaches, establishing concrete directions for future research and development.
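The abstract reports that SICS correlates more closely with human evaluation than existing measures, but it does not say which correlation coefficient is used or how SICS is computed. As a rough illustration only, the sketch below assumes Spearman rank correlation and compares two hypothetical metrics against hypothetical human ratings. All names and numbers (`human`, `metric_a`, `metric_b`) are invented for the example and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one human rating and two metric scores per image.
human = [1, 2, 3, 4, 5]
metric_a = [0.10, 0.35, 0.30, 0.70, 0.90]  # stands in for a candidate metric
metric_b = [0.50, 0.20, 0.40, 0.30, 0.60]  # stands in for a baseline metric

rho_a = spearman(human, metric_a)
rho_b = spearman(human, metric_b)
print(f"metric_a vs human: {rho_a:.2f}")
print(f"metric_b vs human: {rho_b:.2f}")
```

On this toy data the first metric tracks the human ordering more closely than the second. A real comparison would use the benchmark's own annotations and whichever coefficient the paper specifies.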

