Isolated Word Automatic Speech Recognition System

Abstract. The paper is devoted to isolated-word automatic speech recognition. The first part gives a theoretical description of methods for speech signal processing and of algorithms that can be used for automatic speech recognition, such as dynamic time warping, hidden Markov models and deep neural networks. The practical part describes the proposed system, which is based on convolutional neural networks (CNN). The system was designed and implemented in Python using the Keras and TensorFlow frameworks. An open audio dataset of spoken words was used for training and testing. The contribution of the paper lies in the specific CNN-based proposal for automatic speech recognition and its validation. The presented results show that the proposed approach achieves 94% accuracy.

Keywords: Automatic speech recognition · Machine learning · Hidden Markov models · Dynamic Time Warping · Deep neural networks

1 Introduction

A noticeable trend today is the effort to simplify communication between man and machine. For this purpose, there are automatic speech recognition (ASR) systems that are able to extract information from a speech signal [1]. The speech signal is very complex, so automatic speech recognition is not an easy task. Speech is an acoustic signal carrying several levels of information (for example phonemes, syllables, words and sentences). In addition to its information content, speech also carries cues about the speaker and the environment, which can complicate decoding of the signal. An ASR system can be speaker-dependent or, which is more complicated, speaker-independent [2]. These systems can be widely used not only in the personal and industrial fields but also by military and defense forces, where voice control and speaker identification can make the configuration and operation of individual services and applications more efficient. The aim of this work is to design and implement a speaker-independent isolated-word automatic speech recognition system.

© Springer Nature Switzerland AG 2020. A. Dziech et al. (Eds.): MCSS 2020, CCIS 1284, pp. 252–264, 2020. https://doi.org/10.1007/978-3-030-59000-0_19

2 State of the Art

Various methods can be used for ASR, for example methods based on comparison with reference templates, statistical methods, or artificial neural networks (ANN). The authors of [3] focused on isolated word recognition in an acoustically balanced, noise-free environment. Mel-Frequency Cepstral Coefficients (MFCC) are used for feature extraction, and Dynamic Time Warping (DTW) together with K-Nearest Neighbor (KNN) is used for classification. The results are shown in a confusion matrix; the accuracy of this system is 98.4%. The DTW algorithm was also used in [4,5]. Another possibility is a method called Vector Quantization (VQ), which is used in [6]. The result of the VQ training phase is a codebook. The experiment in that paper showed that increasing the size of the codebook increases the accuracy of the system. The auth
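
The reference-comparison approach mentioned in this section (DTW combined with a nearest-neighbor decision) can be illustrated with a minimal sketch. It is not the implementation used in [3]: it assumes each word is reduced to a short one-dimensional feature sequence instead of MFCC vectors, uses the absolute difference as the local cost, and the names `dtw_distance`, `classify` and the template dictionary are hypothetical.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two 1-D sequences.

    Assumption: scalar features and absolute-difference local cost;
    a real system would compare MFCC vectors frame by frame.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # allowed steps: insertion, deletion, match
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def classify(query, templates):
    """Nearest-neighbor decision (K = 1): label of the closest template.

    `templates` is a hypothetical dict mapping a word label to one
    reference sequence.
    """
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Toy reference templates (made-up values, for illustration only).
templates = {"up": [1, 2, 3], "down": [3, 2, 1]}
predicted = classify([1, 2, 2, 3], templates)
```

Because DTW may align one reference frame with several query frames, the stretched query `[1, 2, 2, 3]` still matches the `"up"` template at zero cost, which is the property that makes the method tolerant to differences in speaking rate.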
