2604.23354v1 Apr 25, 2026 eess.AS

Explainable AI in Speaker Recognition: Understanding Latent Representations

Explainable AI in Speaker Recognition -- Making Latent Representations Understandable

Mark D. Plumbley

Citations: 61

h-index: 4

Yanze Xu

Citations: 15

h-index: 1

Wenwu Wang

Citations: 793

h-index: 17

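Neural networks can be trained to learn task-relevant representations from data. Understanding how such networks make decisions belongs to the field of Explainable AI (XAI). This paper studies one XAI topic: uncovering unknown organisational patterns within the representations learned by speaker recognition networks. Previous studies used algorithms (e.g. t-distributed Stochastic Neighbour Embedding and K-means) to analyse and visualise how network representations form independent clusters, which indicates that flat clustering phenomena exist within the space defined by these representations. In contrast, this work applies two algorithms, Single-Linkage Clustering (SLINK) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), to analyse how representations form clusters that have hierarchical relationships rather than being independent. This shows that hierarchical clustering phenomena exist within the network representation space. To understand these hierarchical clustering phenomena semantically, a new algorithm, Hierarchical Cluster-Class Matching (HCCM), was designed to perform one-to-one matching between predefined semantic classes and hierarchical representation clusters (i.e. clusters produced by SLINK or HDBSCAN). Some hierarchical clusters were successfully matched to individual semantic classes (e.g. male, UK), while others were matched to conjunctions of several semantic classes (e.g. male and UK, female and Ireland). A new metric, Liebig's score, is proposed to quantify the performance of each matching behaviour, making it possible to diagnose the factor that most strongly limits matching performance.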

Original Abstract

Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls within the Explainable AI (XAI) domain. This paper proposes to study an XAI topic: uncovering unknown organisational patterns in network representations, particularly those representations learned by the speaker recognition network that recognises the speaker identity of utterances. Past studies employed algorithms (e.g. t-distributed Stochastic Neighbour Embedding and K-means) to analyse and visualise how network representations form independent clusters, indicating the presence of flat clustering phenomena within the space defined by these representations. In contrast, this work applies two algorithms -- Single-Linkage Clustering (SLINK) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) -- to analyse how representations form clusters with hierarchical relationships rather than being independent, thereby demonstrating the existence of hierarchical clustering phenomena within the network representation space. To semantically understand the above hierarchical clustering phenomena, a new algorithm, termed Hierarchical Cluster-Class Matching (HCCM), is designed to perform one-to-one matching between predefined semantic classes and hierarchical representation clusters (i.e. those produced by SLINK or HDBSCAN). Some hierarchical clusters are successfully matched to individual semantic classes (e.g. male, UK), while others to conjunctions of semantic classes (e.g. male and UK, female and Ireland). A new metric, Liebig's score, is proposed to quantify the performance of each matching behaviour, allowing us to diagnose the factor that most strongly limits matching performance.
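The contrast between flat and hierarchical clustering can be illustrated with a minimal sketch. It is not the paper's pipeline: the 2-D points stand in for speaker embeddings, the gender and country labels are invented toy data, and `clusters_at` and `best_class` are hypothetical helper names. Cutting a single-linkage hierarchy at a threshold is equivalent to taking connected components of the graph that links points no farther apart than that threshold, so the sketch uses a small union-find rather than the SLINK algorithm itself. The Jaccard-based lookup is a simple stand-in for cluster-to-class matching and should not be read as HCCM or Liebig's score, neither of which is specified in the abstract.

```python
from itertools import combinations
from math import dist


def clusters_at(points, threshold):
    """Cut a single-linkage hierarchy at `threshold`.

    Two points share a cluster when a chain of points links them with every
    step no longer than `threshold`, which is the single-linkage criterion.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def best_class(cluster, classes):
    """Return (class name, Jaccard overlap) of the class closest to `cluster`."""
    def jaccard(name):
        members = classes[name]
        return len(cluster & members) / len(cluster | members)

    name = max(classes, key=jaccard)
    return name, jaccard(name)


# Toy "embeddings" (assumed data): four tight groups, paired into two wider groups.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),     # male, UK
    (1.0, 0.0), (1.1, 0.0), (1.0, 0.1),     # male, Ireland
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),  # female, UK
    (11.0, 0.0), (11.1, 0.0), (11.0, 0.1),  # female, Ireland
]
gender = ["male"] * 6 + ["female"] * 6
country = (["UK"] * 3 + ["Ireland"] * 3) * 2

# Semantic classes: single attributes plus their conjunctions.
classes = {}
for i, (g, c) in enumerate(zip(gender, country)):
    for name in (g, c, f"{g} and {c}"):
        classes.setdefault(name, set()).add(i)

fine = clusters_at(points, 0.5)    # low cut of the hierarchy
coarse = clusters_at(points, 2.0)  # higher cut of the same hierarchy

# Hierarchical property: every fine cluster sits inside one coarse cluster.
nested = all(any(f <= c for c in coarse) for f in fine)

for level, clusters in (("coarse", coarse), ("fine", fine)):
    for cluster in sorted(clusters, key=min):
        print(level, sorted(cluster), best_class(cluster, classes))
```

In this toy setting the higher cut yields clusters that line up with single classes (male, female), while the lower cut yields clusters that line up with conjunctions (e.g. male and UK). A flat method such as K-means returns one partition and does not expose this nesting. For real embeddings, library implementations of single-linkage clustering and HDBSCAN would be the practical choice.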

0 Citations

0 Influential

8.5 Altmetric

42.5 Score

