






















Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls within the domain of Explainable AI (XAI). This paper studies one XAI topic: uncovering the unknown organisation of network representations, in particular those that a speaker recognition network learns from utterances in order to recognise speaker identity. Past studies have employed algorithms such as K-means to analyse how network representations can be naturally organised into independent clusters in different ways, i.e., to analyse flat clustering phenomena within the space defined by these representations, referred to as the network representation space. In contrast, this work applies two algorithms, Single-Linkage Clustering (SLINK) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), to analyse how representations form hierarchical clusters in different ways, i.e., to analyse hierarchical clustering phenomena within the network representation space. To further understand these hierarchical clustering phenomena, we propose a new algorithm termed Hierarchical Cluster-Class Matching (HCCM). HCCM provides a semantic interpretation of the hierarchical clusters produced by SLINK and HDBSCAN by matching them to predefined semantic classes. Through this process, some clusters are interpreted as individual semantic classes (e.g. male), whereas others are interpreted as conjunctions of individual semantic classes (e.g. female and Ireland). In addition, we develop a new metric, the Liebig score, which quantifies how well a cluster matches a semantic class and helps identify the factor that most strongly limits each match.
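
To make the idea of matching hierarchical clusters to semantic classes concrete, the following is a minimal sketch on toy data, not the paper's HCCM implementation. It builds a single-linkage hierarchy with a naive union-find procedure (not the optimised SLINK algorithm, and HDBSCAN is not covered), then matches each cluster to a single class or a two-way conjunction of classes. The abstract does not define the Liebig score, so `match_score` uses the smaller of precision and recall as a hypothetical stand-in; all function names, the toy embeddings, and the labels are assumptions made for illustration.

```python
from itertools import combinations


def single_linkage(points):
    """Naive single-linkage: return every cluster formed by a merge, as frozensets of indices."""
    # Visit point pairs from closest to farthest (squared Euclidean distance).
    pairs = sorted(
        combinations(range(len(points)), 2),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(points[p[0]], points[p[1]])),
    )
    parent = list(range(len(points)))
    members = {i: frozenset([i]) for i in range(len(points))}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    hierarchy = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # the closest pair across two clusters triggers a merge
            parent[rj] = ri
            members[ri] = members[ri] | members.pop(rj)
            hierarchy.append(members[ri])
    return hierarchy


def candidate_classes(labels):
    """Build single classes and two-way conjunctions from per-sample attribute dicts."""
    singles = {}
    for idx, attrs in enumerate(labels):
        for attr, value in attrs.items():
            singles.setdefault((attr, value), set()).add(idx)
    classes = {value: frozenset(m) for (_, value), m in singles.items()}
    for (a1, v1), (a2, v2) in combinations(singles, 2):
        both = singles[(a1, v1)] & singles[(a2, v2)]
        if a1 != a2 and both:  # only combine different attributes, skip empty classes
            classes[f"{v1} and {v2}"] = frozenset(both)
    return classes


def match_score(cluster, class_members):
    """Hypothetical stand-in score: min(precision, recall), plus the smaller term's name."""
    overlap = len(cluster & class_members)
    precision = overlap / len(cluster)
    recall = overlap / len(class_members)
    limiting = "precision" if precision < recall else "recall"
    return min(precision, recall), limiting


def best_match(cluster, classes):
    """Return (class name, score, limiting factor) for the highest-scoring class."""
    name = max(classes, key=lambda n: match_score(cluster, classes[n])[0])
    return (name, *match_score(cluster, classes[name]))


# Toy 2-D "embeddings" and labels, invented purely for illustration.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (50, 0), (51, 0)]
labels = [
    {"gender": "female", "country": "Ireland"},
    {"gender": "female", "country": "Ireland"},
    {"gender": "female", "country": "England"},
    {"gender": "female", "country": "England"},
    {"gender": "male", "country": "Ireland"},
    {"gender": "male", "country": "England"},
]

hierarchy = single_linkage(points)
classes = candidate_classes(labels)
for cluster in hierarchy:
    print(sorted(cluster), best_match(cluster, classes))
```

In this toy setting, small clusters near the leaves tend to match conjunctions, while larger clusters higher in the hierarchy match single classes. Reporting which of the two terms is smaller is one simple way to expose what limits a match; the paper's actual metric may be defined differently.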