2604.16479v1 Apr 12, 2026 cs.CV

Latent-Compressed Variational Autoencoder for Video Diffusion Models

Wenshuai Zhao (Citations: 31; h-index: 3)

Juho Kannala (Citations: 12,934; h-index: 45)

Arno Solin (Citations: 145; h-index: 6)

Jiarui Guan (Citations: 14; h-index: 2)

Zhengtao Zou (Citations: 1; h-index: 1)

Abstract

Video variational autoencoders (VAEs) used in latent diffusion models typically require a sufficiently large number of latent channels to ensure high-quality video reconstruction. However, recent studies have revealed that an excessive number of latent channels can impede the convergence of latent diffusion models and deteriorate their generative performance, even when reconstruction quality remains high. We propose a latent compression method that removes high-frequency components in video latent representations rather than directly reducing the number of channels, which often compromises reconstruction fidelity. Experimental results demonstrate that the proposed method achieves superior video reconstruction quality compared to strong baselines while maintaining the same overall compression ratio.
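The abstract does not say which transform, cutoff, or network stage the authors use, so the following is only a generic sketch of what "removing high-frequency components" of a video latent can look like, not the paper's method. It assumes a latent tensor of shape (C, T, H, W) and applies a box-shaped low-pass mask in the 3D Fourier domain over time, height, and width. The function name `lowpass_latent` and the `keep_t`, `keep_h`, `keep_w` cutoffs are hypothetical names introduced for illustration.

```python
import numpy as np


def lowpass_latent(z, keep_t, keep_h, keep_w):
    """Zero every Fourier coefficient above the given per-axis cutoffs.

    z is a real latent of shape (C, T, H, W). The cutoffs are integer
    frequency indices; this is an illustrative choice, not the paper's.
    """
    _, t, h, w = z.shape
    # Transform over time and space only; the channel axis is left alone.
    spec = np.fft.fftn(z, axes=(1, 2, 3))

    # Absolute integer frequency index of each FFT bin along each axis.
    ft = np.abs(np.fft.fftfreq(t) * t)
    fh = np.abs(np.fft.fftfreq(h) * h)
    fw = np.abs(np.fft.fftfreq(w) * w)

    # Symmetric box mask, so the inverse transform of a real input is real.
    mask = (
        (ft[:, None, None] <= keep_t)
        & (fh[None, :, None] <= keep_h)
        & (fw[None, None, :] <= keep_w)
    )

    filtered = np.fft.ifftn(spec * mask, axes=(1, 2, 3)).real
    return filtered, mask


# Toy latent: 16 channels, 4 latent frames, 8x8 spatial grid.
rng = np.random.default_rng(0)
z = rng.standard_normal((16, 4, 8, 8))
z_low, mask = lowpass_latent(z, keep_t=1, keep_h=2, keep_w=2)

# The channel count is unchanged; only the retained frequency content shrinks.
print(z_low.shape, "fraction of coefficients kept:", mask.mean())
```

In this sketch the number of channels stays at 16 while the fraction of retained coefficients drops, which mirrors the abstract's contrast between frequency-based compression and directly reducing the channel count.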

0 Citations

0 Influential

22.5 Altmetric

112.5 Score
