2603.12057v1 Mar 12, 2026 cs.CV

Coarse-Guided Visual Generation via Weighted h-Transform Sampling

Long Chen (citations: 22, h-index: 2)

Yanghao Wang (citations: 30, h-index: 4)

Ziqi Jiang (citations: 18, h-index: 3)

Zhen Wang (citations: 8, h-index: 1)

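Coarse-guided visual generation synthesizes fine visual samples from low-quality or degraded coarse references and is essential to many real-world applications. Training-based methods are effective, but they are fundamentally limited by high training cost and by the restricted generalization that comes with collecting paired data. Recent work therefore takes a training-free route, using pretrained diffusion models and injecting guidance during sampling. These training-free methods, however, either need to know the forward (fine-to-coarse) transformation operator, for example bicubic downsampling, or struggle to balance guidance against synthesis quality. To address these problems, we propose a new guided method built on the h-transform, a tool for constraining a stochastic process under a desired condition. At each sampling timestep, a drift function is added to the original differential equation, which modifies the transition probability and approximately steers generation toward the ideal fine sample. To handle the unavoidable approximation error, we introduce a noise-level-aware schedule that gradually reduces the weight of this term as the error grows, balancing adherence to the guidance with high-quality synthesis. Extensive experiments on diverse image and video generation tasks demonstrate the effectiveness and generalization of the method.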
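For readers unfamiliar with the h-transform, the textbook form of Doob's h-transform is sketched below. The symbols b, g, W_t, A, T, \lambda and \tilde{h} are generic notation chosen for this note, not the paper's. The last line is only an assumed shape for a weighted drift of the kind the abstract describes; the abstract does not give the authors' actual equation, time convention, or schedule.

```latex
\begin{align}
  % Unconditioned diffusion (generic notation)
  \mathrm{d}X_t &= b(X_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t \\
  % Probability of satisfying the condition A at terminal time T
  h(x, t) &= \Pr\left(X_T \in A \mid X_t = x\right) \\
  % Doob's h-transform: the same process conditioned on X_T \in A
  \mathrm{d}X_t &= \left[\, b(X_t, t) + g(t)^2\,\nabla_x \log h(X_t, t) \,\right]\mathrm{d}t
                  + g(t)\,\mathrm{d}W_t \\
  % Assumed weighted variant: \tilde{h} approximates h, \lambda(t) \in [0, 1]
  % is a weight that shrinks where the approximation is less reliable
  \mathrm{d}X_t &= \left[\, b(X_t, t) + \lambda(t)\, g(t)^2\,\nabla_x \log \tilde{h}(X_t, t) \,\right]\mathrm{d}t
                  + g(t)\,\mathrm{d}W_t
\end{align}
```

With \lambda(t) = 1 and \tilde{h} = h the last line reduces to the exact conditioned process; with \lambda(t) = 0 it reduces to the unconditioned one.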

Original Abstract

Coarse-guided visual generation, which synthesizes fine visual samples from degraded or low-fidelity coarse references, is essential for various real-world applications. While training-based approaches are effective, they are inherently limited by high training costs and restricted generalization due to paired data collection. Accordingly, recent training-free works propose to leverage pretrained diffusion models and incorporate guidance during the sampling process. However, these training-free methods either require knowing the forward (fine-to-coarse) transformation operator, e.g., bicubic downsampling, or are difficult to balance between guidance and synthetic quality. To address these challenges, we propose a novel guided method by using the h-transform, a tool that can constrain stochastic processes (e.g., sampling process) under desired conditions. Specifically, we modify the transition probability at each sampling timestep by adding to the original differential equation with a drift function, which approximately steers the generation toward the ideal fine sample. To address unavoidable approximation errors, we introduce a noise-level-aware schedule that gradually de-weights the term as the error increases, ensuring both guidance adherence and high-quality synthesis. Extensive experiments across diverse image and video generation tasks demonstrate the effectiveness and generalization of our method.
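The idea of adding a weighted guidance drift at each sampling step can be illustrated with a toy one-dimensional sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the two-mode Gaussian prior standing in for a pretrained diffusion model, the Gaussian stand-in for the gradient of log h, the hypothetical `guidance_weight` schedule with its `kappa` parameter, and in particular the assumption that the approximation error matters most at low noise, so the weight shrinks as the noise level falls.

```python
import numpy as np

# Toy "fine" data prior: equal mixture of N(-M, S0^2) and N(+M, S0^2).
# Hypothetical stand-in for the distribution a pretrained model has learned.
M, S0 = 2.0, 0.2

def prior_score(x, sigma):
    """Exact score of the toy prior after adding N(0, sigma^2) noise.
    Plays the role of a pretrained diffusion model's score network."""
    v = S0**2 + sigma**2
    return (M * np.tanh(M * x / v) - x) / v

def guidance_weight(sigma, kappa=1.0):
    """Hypothetical noise-level-aware weight: close to 1 at high noise,
    decaying to 0 as sigma -> 0 (assumed direction of the schedule)."""
    return sigma**2 / (sigma**2 + kappa**2)

def sample(x_coarse, n=4000, steps=300, sigma_max=10.0, sigma_min=0.05,
           guided=True, seed=0):
    """Euler-Maruyama reverse sampler for a variance-exploding diffusion,
    optionally with a weighted pull toward the coarse reference."""
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(sigma_max, sigma_min, steps)
    x = sigma_max * rng.standard_normal(n)  # start from (almost) pure noise
    for s_hi, s_lo in zip(sigmas[:-1], sigmas[1:]):
        step = s_hi**2 - s_lo**2
        drift = prior_score(x, s_hi)
        if guided:
            # Assumed stand-in for grad log h: treat the coarse reference
            # as if it were the fine target seen through N(0, sigma^2) noise.
            pull = (x_coarse - x) / s_hi**2
            drift = drift + guidance_weight(s_hi) * pull
        x = x + step * drift + np.sqrt(step) * rng.standard_normal(n)
    return x

# A coarse reference of 1.0 is a degraded hint at the fine mode near +2.
guided = sample(x_coarse=1.0, guided=True)
unguided = sample(x_coarse=1.0, guided=False)
print("fraction of samples near +2, guided:  ", np.mean(guided > 0))
print("fraction of samples near +2, unguided:", np.mean(unguided > 0))
```

In this toy setting the guidance acts mainly at high noise, where it biases which mode a sample falls into, and fades at low noise, where the prior alone sharpens the sample. Unguided samples split evenly between the two modes, while guided samples favor the mode the coarse reference hints at.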

1 Citations

1 Influential

2 Altmetric

13.0 Score
