Optimal trained artificial neural network for Telugu speaker diarization


RESEARCH PAPER

V. Sethuram1 · Ande Prasad1 · R. Rajeshwara Rao2

Received: 25 September 2019 / Revised: 5 February 2020 / Accepted: 23 February 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Speaker indexing, or diarization, is the process of automatically partitioning a conversation involving multiple speakers into homogeneous segments and grouping together all segments that correspond to the same speaker. Several works have addressed this problem, yet accurate partitioning still falls short under certain conditions. With this in mind, this paper introduces a new speaker indexing (diarization) model for the Telugu language that first performs feature extraction based on Mel Frequency Cepstral Coefficients (MFCC). Subsequently, a new optimized Artificial Neural Network (ANN) is introduced for the clustering process. The novelty of the clustering process is that the ANN is trained through an optimization scheme that updates its weights using a hybrid of the Artificial Bee Colony (ABC) algorithm and the Lion Algorithm (LA). The proposed model is therefore named the ANN-ABC-LA model. Finally, the performance of the proposed ANN-ABC-LA model is compared against state-of-the-art models with respect to different performance measures.

Keywords  Speaker diarization · Feature extraction · Neural network · Lion algorithm · Artificial Bee Colony

* V. Sethuram
  [email protected]

1 Vikrama Simhapuri University, Nellore, Andhra Pradesh, India

2 JNTU, Vizayanagaram, Andhra Pradesh, India

1 Introduction

With the advance of smart technologies in engineering, many intelligent and efficient methodologies have emerged to enhance the standard of human life. In human-machine interaction, there is an ever-increasing demand for automated human language recognition models that enable proficient and intelligent ways of communicating [1]. For this purpose, capable exploring, indexing and retrieving techniques are required for audio signal interaction. In addition, extraction of the speech recorded by a speech recognition system gives a solid basis for these tasks, yet the resulting words are difficult to read and do not cover all the information contained in the audio signal [2, 3]. These difficulties motivate the introduction of audio diarization techniques. Generally, speaker diarization can be defined as the process of annotating an input audio signal with data that attributes (possibly overlapping) temporal regions of signal energy to their particular sources [4, 5]. Moreover, it divides the audio signal into homogeneous segments with respect to the audio source. The different audio sources can include music, speakers, signals, background noise, diverse channel properties, etc. [6]. In addition, diarization is used to assist speech recognition, support audio search facilities and the indexing of audio archives, and further improve the quality of automated dictations as