Optimal trained artificial neural network for Telugu speaker diarization


RESEARCH PAPER

V. Sethuram1 · Ande Prasad1 · R. Rajeshwara Rao2

Received: 25 September 2019 / Revised: 5 February 2020 / Accepted: 23 February 2020

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Speaker indexing, or diarization, is the process of automatically partitioning a conversation involving multiple speakers into homogeneous segments and grouping together all the segments that belong to the same speaker. Several works have addressed this problem; however, accurate partitioning still falls short under certain conditions. With this in mind, this paper introduces a new speaker indexing (diarization) model for the Telugu language that first performs feature extraction based on Mel-frequency cepstral coefficients (MFCCs). A new optimized artificial neural network (ANN) is then introduced for the clustering process. The novelty of the clustering process is that the ANN is trained by optimization logic that updates its weights using a hybrid of the Artificial Bee Colony (ABC) algorithm and the Lion Algorithm (LA). The proposed model is therefore named ANN-ABC-LA. Finally, the performance of the proposed ANN-ABC-LA model is compared with state-of-the-art models on different performance measures.

Keywords Speaker diarization · Feature extraction · Neural network · Lion algorithm · Artificial Bee Colony
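The abstract's central idea is that the ANN's weights are searched by a population-based optimizer rather than fitted by gradient descent. The sketch below illustrates only the generic ABC half of that idea: each "food source" is a flattened weight vector, employed and onlooker bees perturb one weight at a time, and scouts restart sources that stop improving. It is a minimal illustration under stated assumptions, not the authors' ANN-ABC-LA method: the Lion Algorithm component and the paper's hybrid update rule are not reproduced, and the 2-2-1 network, the XOR toy data (standing in for MFCC features and speaker labels), and the names `forward`, `mse`, and `abc_train` are all hypothetical.

```python
import math
import random

# Hypothetical toy data (XOR); stands in for MFCC feature vectors and speaker labels.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_WEIGHTS = 9  # 2x2 hidden weights, 2 hidden biases, 2 output weights, 1 output bias


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, x):
    """Tiny 2-2-1 sigmoid network whose weights are packed in a flat list w."""
    h0 = sigmoid(w[0] * x[0] + w[1] * x[1] + w[4])
    h1 = sigmoid(w[2] * x[0] + w[3] * x[1] + w[5])
    return sigmoid(w[6] * h0 + w[7] * h1 + w[8])


def mse(w):
    """Training error of weight vector w; the bee colony minimises this."""
    return sum((forward(w, x) - y) ** 2 for x, y in DATA) / len(DATA)


def abc_train(n_sources=20, limit=30, cycles=300, bound=10.0, seed=0):
    """Generic ABC search over ANN weight vectors (no Lion Algorithm step)."""
    rng = random.Random(seed)

    def random_source():
        return [rng.uniform(-bound, bound) for _ in range(N_WEIGHTS)]

    sources = [random_source() for _ in range(n_sources)]
    errors = [mse(s) for s in sources]
    trials = [0] * n_sources
    best_i = min(range(n_sources), key=errors.__getitem__)
    best_w, best_err = sources[best_i][:], errors[best_i]

    def try_neighbour(i):
        # v_ij = x_ij + phi * (x_ij - x_kj) for one random weight j and partner k != i.
        k = rng.choice([m for m in range(n_sources) if m != i])
        j = rng.randrange(N_WEIGHTS)
        cand = sources[i][:]
        phi = rng.uniform(-1.0, 1.0)
        cand[j] = max(-bound, min(bound, cand[j] + phi * (cand[j] - sources[k][j])))
        err = mse(cand)
        if err < errors[i]:  # greedy selection keeps the better source
            sources[i], errors[i], trials[i] = cand, err, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_sources):  # employed-bee phase
            try_neighbour(i)
        fitness = [1.0 / (1.0 + e) for e in errors]
        for _ in range(n_sources):  # onlooker phase: fitter sources are chosen more often
            try_neighbour(rng.choices(range(n_sources), weights=fitness)[0])
        for i in range(n_sources):  # memorise the best weight vector found so far
            if errors[i] < best_err:
                best_w, best_err = sources[i][:], errors[i]
        for i in range(n_sources):  # scout phase: abandon sources that stopped improving
            if trials[i] > limit:
                sources[i] = random_source()
                errors[i] = mse(sources[i])
                trials[i] = 0
    return best_w, best_err


weights, err = abc_train()
print(f"best training MSE: {err:.4f}")
```

Because the search only needs error evaluations, the same loop would work for any network size; a hybrid such as the one the paper proposes would replace or augment `try_neighbour` with another update rule.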

* V. Sethuram (corresponding author)
1 Vikrama Simhapuri University, Nellore, Andhra Pradesh, India
2 JNTU, Vizianagaram, Andhra Pradesh, India

1 Introduction
With the advent of smart technologies in engineering, many intelligent and efficient methodologies have emerged to raise the standard of human life. In human-machine interaction, there is an ever-increasing demand for automated human language recognition models that enable efficient and intelligent ways of communicating [1]. This calls for capable techniques for exploring, indexing, and retrieving audio signals. The transcripts produced by speech recognition systems provide a solid basis for these tasks; however, the words alone are hard to read and do not capture all the information contained in the audio signal [2, 3]. These difficulties motivate the introduction of audio diarization. Generally, speaker diarization can be defined as the process of annotating an input audio signal with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources [4, 5]. In doing so, it divides the audio signal into homogeneous segments, each associated with a single source. The audio sources can include music, speakers, other signals, background noise, and differing channel characteristics [6]. In addition, diarization is used to assist speech recognition, to support audio search and the indexing of audio archives, and to further improve the quality of automated dictation as
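The definition of diarization as attributing temporal regions of the signal to sources can be made concrete with a small sketch of its output format. It assumes a hypothetical upstream step (for example, a clustering network) has already assigned one source label to every analysis frame, and it assumes a 10 ms frame hop; the function names `frames_to_segments` and `group_by_source` are illustrative, not from the paper. Because each frame carries exactly one label, this simple representation cannot express the overlapping regions the definition allows.

```python
from itertools import groupby


def frames_to_segments(labels, hop_s=0.01):
    """Merge runs of identical per-frame source labels into (start_s, end_s, label) segments."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        n = sum(1 for _ in run)  # length of this run of identical labels
        segments.append((round(start * hop_s, 3), round((start + n) * hop_s, 3), label))
        start += n
    return segments


def group_by_source(segments):
    """Collect every segment that belongs to the same source (speaker, music, noise, ...)."""
    grouped = {}
    for start_s, end_s, label in segments:
        grouped.setdefault(label, []).append((start_s, end_s))
    return grouped


# Hypothetical per-frame labels from an upstream clustering step.
frame_labels = ["spk1"] * 3 + ["music"] * 2 + ["spk1"] * 4
segs = frames_to_segments(frame_labels)
print(segs)
# → [(0.0, 0.03, 'spk1'), (0.03, 0.05, 'music'), (0.05, 0.09, 'spk1')]
print(group_by_source(segs))
```

The first function yields the homogeneous segments; the second performs the "grouping together" step, so all of `spk1`'s speech is retrievable as one list of time intervals.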


