ELM speaker identification for limited dataset using multitaper based MFCC and PNCC features with fusion score
Bharath K P¹ & Rajesh Kumar M¹
Received: 14 October 2019 / Revised: 4 July 2020 / Accepted: 13 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Speaker recognition under noisy conditions remains one of the most challenging tasks in speech processing, because background noise significantly degrades system performance. The main aim of this work is to identify speakers in clean and noisy backgrounds using a limited dataset. We propose multitaper-based Mel frequency cepstral coefficient (MFCC) and power normalized cepstral coefficient (PNCC) features combined with score fusion strategies. MFCC and PNCC features are extracted from the speech samples using different multitapers. Two techniques, cepstral mean and variance normalization (CMVN) and feature warping (FW), are then applied to normalize the features obtained from both front ends. A low-dimensional i-vector model is used as the system model, and several score fusion strategies (mean, maximum, weighted sum, cumulative, and concatenated fusion) are evaluated. Finally, an extreme learning machine (ELM), a single-hidden-layer feedforward neural network that is less complex and less time-consuming to train than other neural networks, is used for classification to improve the system identification accuracy (SIA). The proposed system is evaluated on limited data from two databases, TIMIT and SITW 2016, and SIA is measured under both clean and noisy background conditions.

Keywords Multitaper · MFCC · PNCC · Frequency warping · CMVN
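Both front ends start from a multitaper power spectrum instead of a single-window periodogram. The sketch below is a minimal illustration, assuming sine tapers (Riedel and Sidorenko) with uniform weights; the paper compares several multitapers and its exact tapers and weights are not given here. The function names are hypothetical. The resulting spectrum would be fed to the mel filterbank (MFCC) or the gammatone filterbank (PNCC) in place of the usual Hamming-window |FFT|².

```python
import numpy as np


def sine_tapers(n_samples: int, n_tapers: int) -> np.ndarray:
    """Orthonormal sine tapers w_k(n) = sqrt(2/(N+1)) * sin(pi*k*(n+1)/(N+1)).

    Returns an array of shape (n_tapers, n_samples), one taper per row (k = 1..K).
    """
    n = np.arange(n_samples)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * (n + 1) / (n_samples + 1))


def multitaper_power_spectrum(frame: np.ndarray, n_tapers: int = 6,
                              n_fft: int = 512) -> np.ndarray:
    """Multitaper power spectrum of one speech frame.

    Each taper yields one periodogram |FFT(w_k * x)|^2 and the estimate is
    their plain average (uniform weights are an assumption; weighted variants
    such as Thomson or SWCE apply per-taper weights instead).
    """
    tapers = sine_tapers(len(frame), n_tapers)                     # (K, N)
    # Broadcast the frame against every taper, zero-pad to n_fft, keep positive bins
    spectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=-1)) ** 2
    return spectra.mean(axis=0)                                    # (n_fft // 2 + 1,)
```

With `n_tapers=1` this reduces to a single sine-window periodogram; adding tapers lowers the variance of the estimate at the cost of some frequency resolution, which is the trade-off multitaper front ends exploit for noisy, short utterances.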
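The two feature normalizations named in the abstract can be sketched on a (frames × coefficients) matrix. CMVN standardizes each coefficient over the whole utterance; feature warping, as it is usually described, maps each coefficient's rank inside a sliding window onto a standard-normal quantile. This is a minimal sketch, not the authors' code: the function names are hypothetical, and the 301-frame window (about 3 s at a 10 ms hop, a common choice) is an assumption rather than a value from the paper.

```python
import numpy as np
from statistics import NormalDist


def cmvn(features: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Cepstral mean and variance normalization over all frames.

    features: array of shape (n_frames, n_coeffs); each column ends up with
    zero mean and (almost exactly) unit variance.
    """
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)


def feature_warping(features: np.ndarray, window: int = 301) -> np.ndarray:
    """Short-time Gaussianization of each coefficient.

    For frame t, every coefficient is ranked among the frames of a window
    centred on t (truncated at the utterance edges); rank r out of n maps to
    the standard-normal quantile of (r - 0.5) / n.
    """
    n_frames = features.shape[0]
    half = window // 2
    inv_cdf = NormalDist().inv_cdf
    warped = np.empty(features.shape, dtype=float)
    for t in range(n_frames):
        lo, hi = max(0, t - half), min(n_frames, t + half + 1)
        block = features[lo:hi]
        n = hi - lo
        # (count of smaller values + 0.5) / n == (rank - 0.5) / n, always in (0, 1)
        probs = ((block < features[t]).sum(axis=0) + 0.5) / n
        warped[t] = [inv_cdf(p) for p in probs]
    return warped
```

Either function would be applied per utterance to both the MFCC and the PNCC feature matrices. CMVN only removes a global offset and scale, whereas feature warping also reshapes each coefficient's short-term distribution, which is why it tends to be more robust to slowly varying channel and noise effects.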
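The classifier and the score fusion step can be sketched together. The ELM below follows the usual recipe: random, untrained input weights and biases, a sigmoid hidden layer, and output weights solved in closed form. As an assumption for numerical stability, the solve is ridge-regularized with a constant `C` (plain ELM uses the pseudoinverse, which this approaches as `C` grows). `fuse_scores` covers only the mean, maximum and weighted-sum rules; the paper's cumulative and concatenated fusion are not reproduced, and the default weight of 0.5 is hypothetical. The class and function names are illustrative only.

```python
import numpy as np


class ELM:
    """Extreme learning machine: one hidden layer with random, fixed input weights."""

    def __init__(self, n_hidden: int = 50, C: float = 100.0, seed: int = 0):
        self.n_hidden = n_hidden
        self.C = C            # ridge constant (assumption; C -> inf gives plain ELM)
        self.seed = seed

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Sigmoid activations H, shape (n_samples, n_hidden)
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot targets
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))  # never updated
        self.b = rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights beta = (H^T H + I / C)^-1 H^T T: the only trained parameters
        A = H.T @ H + np.eye(self.n_hidden) / self.C
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def decision_function(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta          # one score per class (speaker)

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


def fuse_scores(s_a: np.ndarray, s_b: np.ndarray, rule: str = "mean",
                weight: float = 0.5) -> np.ndarray:
    """Fuse two (n_trials, n_speakers) score matrices, e.g. MFCC- and PNCC-based."""
    if rule == "mean":
        return 0.5 * (s_a + s_b)
    if rule == "max":
        return np.maximum(s_a, s_b)
    if rule == "weighted":
        return weight * s_a + (1.0 - weight) * s_b
    raise ValueError(f"unknown fusion rule: {rule!r}")
```

Training reduces to one linear solve, which is where the low training cost claimed for ELM comes from. Because scores from different front ends can sit on different scales, they would normally be normalized (for example, z-normalized per trial) before the mean or weighted-sum rule is applied.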
* Rajesh Kumar M
Bharath K P
¹ School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
Multimedia Tools and Applications
1 Introduction
Speech and speaker recognition applications currently perform well under ideal laboratory conditions, but their performance degrades significantly in noisy environments and over different transmission channels. Automatic speaker recognition (ASR) is the process by which a system recognizes users from the information contained in their speech samples [5]. Depending on the task objective, automatic speaker recognition is classified into verification and identification [5]; that is, speaker recognition is the process of identifying or verifying a user from a spoken utterance. Approaches used in speaker recognition include the Gaussian mixture model (GMM), vector quantization (VQ), the support vector machine (SVM), and joint analysis using the i-vector model with traditional probabilistic linear discriminant analysis (PLDA) [23]. As explained in [49], the features extracted from speech samples for speaker recognition should be less prone to noise