Isolated Word Automatic Speech Recognition System

Abstract. The paper is devoted to isolated word automatic speech recognition. The first part gives a theoretical description of speech signal processing methods and of algorithms that can be used for automatic speech recognition, such as dynamic time warping, hidden Markov models and deep neural networks. The practical part describes the proposed system, which is based on convolutional neural networks (CNN). The system was designed and implemented in Python using the Keras and TensorFlow frameworks. An open audio dataset of spoken words was used for training and testing. The contribution of the paper lies in the specific CNN-based proposal for automatic speech recognition and its validation. The presented results show that the proposed approach achieves an accuracy of 94%.

Keywords: Automatic speech recognition · Machine learning · Hidden Markov models · Dynamic Time Warping · Deep neural networks
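Dynamic time warping is one of the classical methods listed above, and the State of the Art section cites a system [3] that pairs DTW with a K-Nearest Neighbor classifier. The following is a minimal sketch of that combination. It assumes each utterance has already been converted into a sequence of feature vectors (e.g. MFCC frames). The function names `dtw_distance` and `knn_classify`, the Euclidean local cost, the unconstrained step pattern and the toy one-dimensional "frames" are illustrative assumptions. This is not the implementation used in [3] or in this paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector, e.g. the MFCCs of a single frame


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Accumulated DTW cost between two feature sequences.

    Assumptions: Euclidean local cost and the basic symmetric step pattern
    (horizontal, vertical, diagonal) with no warping-window constraint.
    """
    n, m = len(a), len(b)
    # acc[i][j]: cheapest alignment of the first i frames of a with the first j of b
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            acc[i][j] = local + min(
                acc[i - 1][j],      # a advances while b holds its frame
                acc[i][j - 1],      # b advances while a holds its frame
                acc[i - 1][j - 1],  # both sequences advance
            )
    return acc[n][m]


def knn_classify(query: Sequence[Frame],
                 references: List[Tuple[str, Sequence[Frame]]],
                 k: int = 1) -> str:
    """Label the query by majority vote among the k reference
    utterances with the smallest DTW distance to it (hypothetical helper)."""
    ranked = sorted(references, key=lambda ref: dtw_distance(query, ref[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy usage with one-dimensional "frames" (hypothetical data).
yes_ref = [[0.0], [1.0], [2.0], [1.0], [0.0]]
no_ref = [[2.0], [2.0], [0.0], [0.0], [2.0]]
query = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]]  # "yes", spoken more slowly
print(knn_classify(query, [("yes", yes_ref), ("no", no_ref)]))  # prints "yes"
```

DTW stretches and compresses the time axis, so the slower query aligns with the "yes" template at zero cost even though the two sequences differ in length. With k = 1 the classifier reduces to plain nearest-template matching.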

1 Introduction

A noticeable trend today is the effort to simplify communication between humans and machines. Automatic speech recognition (ASR) systems serve this purpose by extracting information from a speech signal [1]. Because the speech signal is very complex, automatic speech recognition is not an easy task. Speech is an acoustic signal that carries information at several levels (e.g. phonemes, syllables, words and sentences). In addition to its information content, speech also carries cues about the speaker and the environment, which can complicate decoding of the signal. An ASR system can be speaker-dependent or, which is more difficult, speaker-independent [2]. Such systems can be used widely, not only in personal and industrial applications but also by military and defense forces, where voice control and speaker identification can make the configuration and operation of individual services and applications more efficient. The aim of this work is to design and implement a speaker-independent isolated word automatic speech recognition system.

© Springer Nature Switzerland AG 2020
A. Dziech et al. (Eds.): MCSS 2020, CCIS 1284, pp. 252–264, 2020.
https://doi.org/10.1007/978-3-030-59000-0_19

2 State of the Art

Various methods can be used for ASR, for example methods based on comparison with reference templates, statistical methods, or Artificial Neural Networks (ANN). In [3], the authors focused on isolated word recognition in an acoustically balanced, noise-free environment. Mel-Frequency Cepstral Coefficients (MFCC) are used for feature extraction, and Dynamic Time Warping (DTW) together with K-Nearest Neighbor (KNN) is used for classification. The results are presented as a confusion matrix, and the system achieves an accuracy of 98.4%. The DTW algorithm was also used in [4,5]. Another option is Vector Quantization (VQ), which is used in [6]. The output of the VQ training phase is a codebook. The experiments in that paper showed that increasing the size of the codebook increases the accuracy of the system. The auth