2602.01060v1 Feb 01, 2026 cs.SD

TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection

Chengyuan Ma (Citations: 0, h-index: 0)

Wenming Yang (Citations: 0, h-index: 0)

Peng Jia (Citations: 1, h-index: 1)

Hongyu Guo (Citations: 19, h-index: 1)

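Existing generative models for unsupervised anomalous sound detection struggle to fully capture the complex feature distribution of normal sounds, and the potential of powerful diffusion models in this field has not yet been fully exploited. To address this, we propose TLDiffGAN, a new framework with two complementary branches. One branch integrates a latent diffusion model into the GAN generator for adversarial training, which makes the discriminator's task harder and improves the quality of the generated samples. The other branch uses pretrained audio model encoders to extract features directly from raw audio waveforms for auxiliary discrimination. The framework captures feature representations of normal sounds from both raw audio and Mel spectrograms. In addition, to improve sensitivity to subtle, localized temporal patterns that are often overlooked, we introduce the TMixup spectrogram augmentation technique. Extensive experiments on the DCASE 2020 Challenge Task 2 dataset show that TLDiffGAN achieves superior detection performance and a strong ability to localize anomalies in time and frequency.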

Original Abstract

Existing generative models for unsupervised anomalous sound detection are limited by their inability to fully capture the complex feature distribution of normal sounds, while the potential of powerful diffusion models in this domain remains largely unexplored. To address this challenge, we propose a novel framework, TLDiffGAN, which consists of two complementary branches. One branch incorporates a latent diffusion model into the GAN generator for adversarial training, thereby making the discriminator's task more challenging and improving the quality of generated samples. The other branch leverages pretrained audio model encoders to extract features directly from raw audio waveforms for auxiliary discrimination. This framework effectively captures feature representations of normal sounds from both raw audio and Mel spectrograms. Moreover, we introduce a TMixup spectrogram augmentation technique to enhance sensitivity to subtle and localized temporal patterns that are often overlooked. Extensive experiments on the DCASE 2020 Challenge Task 2 dataset demonstrate the superior detection performance of TLDiffGAN, as well as its strong capability in anomalous time-frequency localization.
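
The abstract mentions anomalous time-frequency localization but does not say how anomalies are scored or localized. As a rough illustration only, the sketch below assumes a generic reconstruction-error heuristic that is common with generative anomaly detectors: compare an observed Mel spectrogram with a model's reconstruction of it, use the mean squared error as a clip-level score, and take the bin with the largest error as the most suspicious time-frequency location. The function names `error_map`, `anomaly_score`, and `peak_location` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed layout: rows are Mel bins, columns are time frames.
Spectrogram = List[List[float]]


def error_map(observed: Spectrogram, reconstructed: Spectrogram) -> Spectrogram:
    """Return the squared error for every time-frequency bin."""
    if len(observed) != len(reconstructed):
        raise ValueError("spectrograms must have the same number of Mel bins")
    result = []
    for obs_row, rec_row in zip(observed, reconstructed):
        if len(obs_row) != len(rec_row):
            raise ValueError("spectrograms must have the same number of frames")
        result.append([(o - r) ** 2 for o, r in zip(obs_row, rec_row)])
    return result


def anomaly_score(errors: Spectrogram) -> float:
    """Clip-level score: mean squared error over all bins (an assumed heuristic)."""
    values = [e for row in errors for e in row]
    return sum(values) / len(values)


def peak_location(errors: Spectrogram) -> Tuple[int, int]:
    """Return (mel_bin, frame) of the largest error; the first one wins on ties."""
    best = (0, 0)
    for i, row in enumerate(errors):
        for j, e in enumerate(row):
            if e > errors[best[0]][best[1]]:
                best = (i, j)
    return best


# Toy example: the reconstruction misses a burst of energy at Mel bin 1, frame 1.
observed = [[0.0, 0.0, 0.0],
            [0.0, 2.0, 0.0]]
reconstructed = [[0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0]]

errors = error_map(observed, reconstructed)
score = anomaly_score(errors)
location = peak_location(errors)
```

In a real pipeline the reconstruction would come from the trained generator, and the error map could be thresholded or smoothed before reading off a location. Those choices are outside what the abstract specifies.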
