2603.24343v2 Mar 25, 2026 cs.SD

Enhancing Efficiency and Performance in Deepfake Audio Detection through Neuron-level Dropin & Neuroplasticity Mechanisms

Yupei Li (Citations: 98, h-index: 6)

Bjorn W. Schuller (Citations: 101, h-index: 6)

Shuaijie Shao (Citations: 18, h-index: 2)

M. Milling (Citations: 358, h-index: 11)

Abstract

Current audio deepfake detection has achieved remarkable performance using diverse deep learning architectures such as ResNet, and has improved further with the introduction of large models (LMs) like Wav2Vec. The success of large language models (LLMs) demonstrates the benefits of scaling model parameters, but it also highlights a bottleneck: performance gains are constrained by parameter count. Simply stacking additional layers, as done in current LLMs, is computationally expensive and requires full retraining. Furthermore, existing low-rank adaptation methods are primarily applied to attention-based architectures, which limits their scope. Inspired by the neuronal plasticity observed in mammalian brains, we propose two novel algorithms, dropin and, further, plasticity, that dynamically adjust the number of neurons in certain layers to flexibly modulate model parameters. We evaluate these algorithms on multiple architectures, including ResNet, Gated Recurrent Neural Networks, and Wav2Vec. Experimental results on the widely recognised ASVSpoof2019 LA, ASVSpoof2019 PA, and FakeorReal datasets demonstrate consistent improvements in computational efficiency with the dropin approach, and a maximum of around 39% and 66% relative reduction in Equal Error Rate (EER) with the dropin and plasticity approaches, respectively, across these datasets. The code and supplementary material are available on GitHub.
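The abstract reports its gains as relative reductions in Equal Error Rate, so a short sketch of how such a figure is typically computed may help. The code below is an illustration only, not the authors' evaluation code: the function names, the score convention (higher means more likely bona fide), the simple threshold sweep, and all numbers are assumptions made for this example and do not come from the paper.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumed convention: a higher score means "more likely bona fide".
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bona fide samples scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed samples scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is read off where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Fraction by which the EER dropped relative to the baseline."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Toy scores, invented for illustration.
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(bonafide, spoof))

    # Hypothetical EERs: a drop from 10.0% to 6.1% is a 39% relative reduction.
    print(round(relative_reduction(0.10, 0.061), 2))
```

Under this reading, a "39% relative reduction" describes the ratio between the baseline and improved EER, not a drop of 39 percentage points.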
