A deep learning approach to integrate convolutional neural networks in speaker recognition
Soufiane Hourri¹ · Nikola S. Nikolov² · Jamal Kharroubi¹

Received: 6 February 2020 / Accepted: 12 May 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract

We propose a novel usage of convolutional neural networks (CNNs) for the problem of speaker recognition. While designed primarily for computer vision problems, CNNs have recently been applied to speaker recognition by using spectrograms as input images. We believe this approach is suboptimal, as it may compound two sources of error: one from solving a computer vision problem and one from solving a speaker recognition problem. In this work, we aim to integrate CNNs into speaker recognition without relying on images. We use Restricted Boltzmann Machines (RBMs) to extract speaker models as matrices and introduce a new way to model target and non-target speakers in order to perform speaker verification. A CNN is then used to discriminate between target and non-target matrices. Experiments were conducted on the THUYG-20 SRE corpus under three noise conditions: clean, 9 dB, and 0 dB. The results demonstrate that our method outperforms state-of-the-art approaches, decreasing the error rate by up to 60%.

Keywords Speaker recognition · MFCC · Convolutional neural network · Restricted Boltzmann Machine · Deep learning
1 Introduction

Nowadays, speaker verification is gaining considerable interest within the field of speaker recognition, due to the high demand for voice-access and security applications (Zhang et al. 2017; Hanilçi 2018). Speaker recognition can be divided into speaker identification and speaker verification. In speaker identification, the voice of a person to be identified is compared against a set of known speakers and classified as one of them. In speaker verification, the voice of a speaker is either accepted or rejected as the voice of a particular person. A speaker verification system may be either text-dependent or text-independent. A text-dependent system asks speakers to read either a fixed phrase or randomly prompted words, and it measures two similarities: first, between the spoken and the prompted words, and second, between the voice of the claimed person and that of the speaker. Text-independent systems, by contrast,
¹ Laboratoire des Systèmes Intelligents et Applications, Faculté des Sciences et Techniques, Université Sidi Mohamed Ben Abdellah, Fez, Morocco

² University of Limerick, Limerick, Ireland
are concerned with the characteristics of the speaker's voice only. In this work, we build a text-independent speaker verification system using the THUYG-20 SRE corpus (Rozi et al. 2015). We follow a process that begins with extracting features from the speech signal. In the context of speaker recognition, feature extraction aims to obtain a compact representation of the raw acoustic signal (Sadjadi and Hansen 2015) in the form of a sequence of feature vectors (Redd
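To illustrate this step, the sketch below computes MFCC-style feature vectors from a raw signal in plain NumPy: pre-emphasis, framing with a Hamming window, power spectrum, a triangular mel filterbank, and a type-II DCT. The frame length, hop size, filter count, and coefficient count are illustrative defaults, not the configuration used by the authors.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_mels=26, n_ceps=13):
    """Return a (num_frames, n_ceps) matrix of MFCC-style features."""
    # Pre-emphasis boosts the high-frequency content.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    num_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # Type-II DCT decorrelates the log filterbank energies.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_energy @ dct.T

# Example: one second of synthetic audio -> a sequence of feature vectors.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (98, 13)
```

Each row of the resulting matrix is one feature vector for one short frame of speech, which is the compact representation that subsequent modelling stages consume.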