2603.27557v1 Mar 29, 2026 cs.SD

A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators

L. Pham (Citations: 0, h-index: 0)
Khoi M. Vu (Citations: 0, h-index: 0)
Dat Tran (Citations: 9, h-index: 2)
David Fischinger (Citations: 15, h-index: 3)
M. Hasenbalg (Citations: 116, h-index: 2)
Davide Antonutti (Citations: 4, h-index: 1)
Alexander Schindler (Citations: 86, h-index: 4)
Martin Boyer (Citations: 15, h-index: 3)
Ian McLoughlin (Citations: 198, h-index: 8)
Simon Freitter (Citations: 0, h-index: 0)

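This paper analyzes two main factors that affect the performance and generalization of a Deepfake Speech Detection (DSD) model: the Bonafide Resource (BR) and the AI-based Generator (AG). To this end, we first propose a deep-learning-based model to serve as the baseline. We then run experiments on the baseline to analyze how the BR and AG factors affect the threshold score used at inference to classify input audio as fake or bonafide. Based on the experimental results, we propose a dataset that re-uses public DSD datasets and balances BR and AG. We train various deep-learning-based models on the proposed dataset and conduct cross-dataset evaluation on different benchmark datasets. The cross-dataset evaluation results demonstrate that the balance between BR and AG is an important factor in training a general DSD model.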

Original Abstract

In this paper, we analyze two main factors of Bonafide Resource (BR) or AI-based Generator (AG) which affect the performance and the generality of a Deepfake Speech Detection (DSD) model. To this end, we first propose a deep-learning based model, referred to as the baseline. Then, we conducted experiments on the baseline by which we indicate how Bonafide Resource (BR) and AI-based Generator (AG) factors affect the threshold score used to detect fake or bonafide input audio in the inference process. Given the experimental results, a dataset, which re-uses public Deepfake Speech Detection (DSD) datasets and shows a balance between Bonafide Resource (BR) or AI-based Generator (AG), is proposed. We then train various deep-learning based models on the proposed dataset and conduct cross-dataset evaluation on different benchmark datasets. The cross-dataset evaluation results prove that the balance of Bonafide Resources (BR) and AI-based Generators (AG) is the key factor to train and achieve a general Deepfake Speech Detection (DSD) model.
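
To illustrate why the decision threshold can depend on the bonafide resource, the following is a minimal sketch rather than the authors' method. It assumes the threshold is picked at the equal error rate (EER), a common convention in anti-spoofing evaluation that the abstract does not confirm. The function name `eer_threshold` and all scores are hypothetical toy values.

```python
def eer_threshold(bonafide_scores, fake_scores):
    """Return (threshold, eer) at the point where the two error rates are closest.

    Assumption: a higher score means "more likely fake", and an input is
    flagged as fake when its score is >= threshold.
    """
    candidates = sorted(set(bonafide_scores) | set(fake_scores))
    best = None
    for t in candidates:
        # Bonafide audio wrongly flagged as fake.
        false_alarm = sum(s >= t for s in bonafide_scores) / len(bonafide_scores)
        # Fake audio wrongly accepted as bonafide.
        miss = sum(s < t for s in fake_scores) / len(fake_scores)
        gap = abs(false_alarm - miss)
        if best is None or gap < best[0]:
            best = (gap, t, (false_alarm + miss) / 2)
    return best[1], best[2]


# Hypothetical scores: the same fake audio, but bonafide audio drawn from
# two different resources that the model scores differently.
fake = [0.6, 0.7, 0.8, 0.9]
bonafide_a = [0.1, 0.2, 0.3, 0.4]
bonafide_b = [0.4, 0.5, 0.6, 0.7]

thr_a, eer_a = eer_threshold(bonafide_a, fake)
thr_b, eer_b = eer_threshold(bonafide_b, fake)

# Reusing the threshold tuned on resource A for resource B.
false_alarm_b_at_thr_a = sum(s >= thr_a for s in bonafide_b) / len(bonafide_b)
```

In this toy setup the threshold tuned on resource A sits lower than the one tuned on resource B, so reusing it on B flags half of the bonafide samples as fake. This is the kind of threshold shift a cross-dataset evaluation is meant to expose.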
