2602.24096v1 Feb 27, 2026 cs.CV

DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer

O. Litany

Citations: 9,874

h-index: 37

Zan Gojcic

Citations: 4,497

h-index: 24

Riccardo de Lutio

Citations: 378

h-index: 6

Yen-Yu Chang

Citations: 1,212

h-index: 11

Haithem Turki

Citations: 1,165

h-index: 10

Sanja Fidler

Citations: 2,488

h-index: 20

Yuxuan Zhang

Citations: 810

h-index: 3

Katarína Tóthová

Citations: 25

h-index: 2

Zian Wang

Citations: 1,607

h-index: 15

Kangxue Yin

Citations: 110

h-index: 2

Abstract

Simulation is essential to the development and evaluation of autonomous robots such as self-driving vehicles. Neural reconstruction is emerging as a promising solution as it enables simulating a wide variety of scenarios from real-world data alone in an automated and scalable way. However, while methods such as NeRF and 3D Gaussian Splatting can produce visually compelling results, they often exhibit artifacts particularly when rendering novel views, and fail to realistically integrate inserted dynamic objects, especially when they were captured from different scenes. To overcome these limitations, we introduce DiffusionHarmonizer, an online generative enhancement framework that transforms renderings from such imperfect scenes into temporally consistent outputs while improving their realism. At its core is a single-step temporally-conditioned enhancer that is converted from a pretrained multi-step image diffusion model, capable of running in online simulators on a single GPU. The key to training it effectively is a custom data curation pipeline that constructs synthetic-real pairs emphasizing appearance harmonization, artifact correction, and lighting realism. The result is a scalable system that significantly elevates simulation fidelity in both research and production environments.
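The online, single-step, temporally conditioned interface described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `placeholder_enhancer`, `enhance_stream`, and `context_size` are hypothetical, frames are toy lists of floats, and the placeholder blend stands in for the learned enhancer. It is not the paper's model, only the shape of a loop that makes one enhancer call per rendered frame while conditioning on previously enhanced frames.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List

Frame = List[float]  # flattened toy "image"; real frames would be image tensors


def placeholder_enhancer(rendered: Frame, context: List[Frame]) -> Frame:
    """Hypothetical stand-in for a learned single-step enhancer.

    NOT the paper's model: it blends the current rendering with the mean of
    the previous outputs, only to show where temporal conditioning enters.
    """
    if not context:
        return list(rendered)
    mean_prev = [sum(px) / len(context) for px in zip(*context)]
    return [0.5 * r + 0.5 * m for r, m in zip(rendered, mean_prev)]


def enhance_stream(
    frames: Iterable[Frame],
    enhancer: Callable[[Frame, List[Frame]], Frame] = placeholder_enhancer,
    context_size: int = 2,  # assumed window length, not taken from the paper
) -> Iterator[Frame]:
    """Enhance frames one at a time, as an online simulator would emit them."""
    history: Deque[Frame] = deque(maxlen=context_size)
    for rendered in frames:
        # One enhancer call per frame (single step), conditioned on past outputs.
        enhanced = enhancer(rendered, list(history))
        history.append(enhanced)
        yield enhanced


if __name__ == "__main__":
    toy_frames = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
    for out in enhance_stream(toy_frames):
        print(out)
```

Because the loop is a generator, each enhanced frame is available as soon as its rendering arrives, which is the property an online simulator needs. The bounded `deque` keeps memory constant regardless of sequence length.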

3 Citations

0 Influential

18.5 Altmetric

95.5 Score
