2602.22570v1 Feb 26, 2026 cs.CV

Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation

Shuo Yang (Citations: 9, h-index: 2)

Shitong Shao (Citations: 215, h-index: 6)

Lichen Bai (Citations: 296, h-index: 7)

Zikai Zhou (Citations: 250, h-index: 7)

Junjiang Wu (Citations: 44, h-index: 3)

Zeke Xie (Citations: 371, h-index: 8)

Dian Xie (Citations: 28, h-index: 3)

Bo Chen (Citations: 167, h-index: 3)

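Classifier-free guidance (CFG) has helped diffusion models achieve strong conditional generation across many fields. More recently, a number of new diffusion guidance methods have appeared, reporting better generation quality and higher human preference scores. But do these methods deliver real, meaningful improvements? In this paper, we re-examine recent progress in diffusion guidance and make four contributions. First, we expose a critical evaluation pitfall: common human preference models are strongly biased toward large guidance scales. Simply raising the CFG scale can improve quantitative scores through stronger semantic alignment, even when image quality degrades severely (for example, through oversaturation and artifacts). Second, we introduce a guidance-aware evaluation (GA-Eval) framework that uses effective guidance scale calibration to compare current guidance methods fairly against CFG, separating effects that are orthogonal to the CFG effect from those that are parallel to it. Third, motivated by this pitfall, we design Transcendent Diffusion Guidance (TDG), a method that substantially raises human preference scores under the conventional evaluation framework but does not actually work in practice. Fourth, in extensive experiments, we evaluate eight recent diffusion guidance methods under both the conventional evaluation framework and the proposed GA-Eval framework. Notably, simply increasing the CFG scale is competitive with most of the studied methods, and all of them suffer a severe drop in win rate against standard CFG. We hope this work prompts the community to rethink the evaluation paradigm and future directions of this field.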

Original Abstract

Classifier-free guidance (CFG) has helped diffusion models achieve great conditional generation in various fields. Recently, more diffusion guidance methods have emerged with improved generation quality and human preference. However, can these emerging diffusion guidance methods really achieve solid and significant improvements? In this paper, we rethink recent progress on diffusion guidance. Our work mainly consists of four contributions. First, we reveal a critical evaluation pitfall that common human preference models exhibit a strong bias towards large guidance scales. Simply increasing the CFG scale can easily improve quantitative evaluation scores due to strong semantic alignment, even if image quality is severely damaged (e.g., oversaturation and artifacts). Second, we introduce a novel guidance-aware evaluation (GA-Eval) framework that employs effective guidance scale calibration to enable fair comparison between current guidance methods and CFG by identifying the effects orthogonal and parallel to CFG effects. Third, motivated by the evaluation pitfall, we design Transcendent Diffusion Guidance (TDG) method that can significantly improve human preference scores in the conventional evaluation framework but actually does not work in practice. Fourth, in extensive experiments, we empirically evaluate recent eight diffusion guidance methods within the conventional evaluation framework and the proposed GA-Eval framework. Notably, simply increasing the CFG scales can compete with most studied diffusion guidance methods, while all methods suffer severely from winning rate degradation over standard CFG. Our work would strongly motivate the community to rethink the evaluation paradigm and future directions of this field.
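
To make the terms "guidance scale" and "effects orthogonal and parallel to CFG effects" concrete, here is a minimal sketch in plain Python. The first function is the standard CFG combination of unconditional and conditional noise predictions. The second is only an illustrative assumption: the abstract does not say how GA-Eval calibrates the effective guidance scale, so `split_relative_to_cfg` and its projection-based "effective scale" are hypothetical names and choices, not the paper's procedure.

```python
from typing import List, Tuple

Vector = List[float]


def cfg_combine(eps_uncond: Vector, eps_cond: Vector, scale: float) -> Vector:
    """Standard CFG: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def split_relative_to_cfg(
    eps_guided: Vector, eps_uncond: Vector, eps_cond: Vector
) -> Tuple[float, Vector]:
    """Hypothetical decomposition (an assumption, not the paper's method).

    Projects (eps_guided - eps_uncond) onto the CFG direction
    d = eps_cond - eps_uncond. Returns the projection coefficient, which
    plays the role of an "effective" CFG scale here, and the residual that
    is orthogonal to d.
    """
    d = [c - u for u, c in zip(eps_uncond, eps_cond)]
    delta = [g - u for u, g in zip(eps_uncond, eps_guided)]
    d_norm_sq = sum(x * x for x in d)
    if d_norm_sq == 0.0:
        raise ValueError("CFG direction is zero; the projection is undefined")
    coeff = sum(a * b for a, b in zip(delta, d)) / d_norm_sq
    residual = [a - coeff * b for a, b in zip(delta, d)]
    return coeff, residual


# Toy two-dimensional "noise predictions" for illustration only.
eps_u = [0.0, 0.0]
eps_c = [1.0, 0.0]

# Plain CFG at scale 7.5 lies entirely along the CFG direction.
plain = cfg_combine(eps_u, eps_c, 7.5)
scale_plain, residual_plain = split_relative_to_cfg(plain, eps_u, eps_c)

# A made-up guided prediction with an extra component off the CFG direction.
other = [3.0, 2.0]
scale_other, residual_other = split_relative_to_cfg(other, eps_u, eps_c)
```

Under this toy reading, a guidance method whose output has a large parallel coefficient and a near-zero residual behaves like CFG at a larger scale, which is the kind of confound the abstract says conventional preference scores reward.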
