2603.06165v1 Mar 06, 2026 cs.CV

Reflective Flow Sampling Enhancement

Shitong Shao (citations: 215, h-index: 6)

Lichen Bai (citations: 296, h-index: 7)

Zikai Zhou (citations: 250, h-index: 7)

Zeke Xie (citations: 371, h-index: 8)

Muyao Wang (citations: 20, h-index: 2)

Haoyi Xiong (citations: 158, h-index: 5)

Bo Han (citations: 451, h-index: 5)

Abstract

The growing demand for text-to-image generation has led to rapid advances in generative modeling. Recently, text-to-image diffusion models trained with flow matching algorithms, such as FLUX, have achieved remarkable progress and emerged as strong alternatives to conventional diffusion models. At the same time, inference-time enhancement strategies have been shown to improve the generation quality and text-prompt alignment of text-to-image diffusion models. However, these techniques apply mainly to conventional diffusion models and often perform poorly on flow models. To bridge this gap, we propose Reflective Flow Sampling (RF-Sampling), a theoretically grounded, training-free inference enhancement framework designed explicitly for flow models, especially CFG-distilled variants (i.e., models distilled from classifier-free guidance) such as FLUX. Departing from heuristic interpretations, we provide a formal derivation proving that RF-Sampling implicitly performs gradient ascent on the text-image alignment score. By forming a linear combination of textual representations and integrating it with flow inversion, RF-Sampling allows the model to explore noise spaces that are more consistent with the input prompt. Extensive experiments across multiple benchmarks demonstrate that RF-Sampling consistently improves both generation quality and prompt alignment. Moreover, RF-Sampling is the first inference enhancement method to exhibit some degree of test-time scaling on FLUX.
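
The abstract names two ingredients, a linear combination of textual representations and flow inversion, without giving the update rule. The sketch below is only an illustration of how those two pieces could fit together, not the authors' algorithm. Everything in it is an assumption: the function names (mix_embeddings, euler_step, reflect), the weights w_high and w_low, the time convention (t increases from noise toward data), the use of plain Euler steps, and the toy velocity field that stands in for a real flow model.

```python
# Toy illustration only: all names, weights, and the velocity field are hypothetical.

def mix_embeddings(text_emb, null_emb, weight):
    """Linear combination of a text embedding and an empty-prompt embedding.

    weight = 1 returns text_emb, weight = 0 returns null_emb.
    """
    return [n + weight * (c - n) for c, n in zip(text_emb, null_emb)]


def euler_step(x, t, dt, emb, velocity_fn):
    """One explicit Euler step of dx/dt = v(x, t, emb).

    A negative dt integrates backward in time (a simple form of flow inversion).
    """
    v = velocity_fn(x, t, emb)
    return [xi + dt * vi for xi, vi in zip(x, v)]


def reflect(x, t, dt, text_emb, null_emb, velocity_fn, w_high=1.5, w_low=0.5):
    """Step forward with a strengthened prompt, then invert with a weakened one.

    The returned latent lives at the original time t but has been nudged by the
    difference between the two conditioning signals.
    """
    emb_high = mix_embeddings(text_emb, null_emb, w_high)
    emb_low = mix_embeddings(text_emb, null_emb, w_low)
    x_next = euler_step(x, t, dt, emb_high, velocity_fn)        # denoise
    return euler_step(x_next, t + dt, -dt, emb_low, velocity_fn)  # invert


def toy_velocity(x, t, emb):
    """Stand-in for a flow model: the velocity is just the embedding itself."""
    return list(emb)


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]
    text_emb = [1.0, 2.0, 3.0]
    null_emb = [0.0, 0.0, 0.0]
    x_reflected = reflect(x, 0.0, 0.1, text_emb, null_emb, toy_velocity)
    shift = [b - a for a, b in zip(x, x_reflected)]
    print(shift)
```

With this toy velocity field the latent moves by dt * (w_high - w_low) * (text_emb - null_emb), that is, along the prompt direction, and it does not move at all when the two weights are equal. A real flow model's velocity also depends on x and t, so the shift would not reduce to this closed form.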
