Speech Production Model


Abstract
The continuous speech signal (an air-pressure wave) that comes out of the mouth and the nose is converted into an electrical signal by a microphone. The electrical speech signal thus obtained is sampled to obtain a discrete signal, which is stored in a digital system for further processing. This is digital speech processing. Speech signal models are broadly classified as source-filter models and probabilistic models. The source-filter model is based on the physical phenomenon that produces the speech signal. Probabilistic models such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM) are mathematical models that do not depend on the physical phenomenon. Speech models are used to extract feature vectors from the speech signal for isolated speech recognition and speaker recognition. They are used to compress the speech signal for storage, as in code-excited linear prediction (CELP). They are useful for converting text into speech, known as speech synthesis, and are also used for continuous speech recognition. This chapter deals with the source-filter model of speech production.

2.1 Introduction
The air that comes out of the lungs passes through the vocal tract and leaves through the mouth and the nose, producing the continuous speech signal. The air coming out of the lungs is either sent directly to the vocal tract or first modulated by the vibration of the vocal cords. Speech signals produced with vocal cord vibration are known as voiced speech signals. Speech signals produced without vocal cord vibration are known as unvoiced speech signals. The velum is used to close the nasal path, so that the speech signal comes out only through the mouth. The shape of the vocal tract is adjusted using the tongue and the velum to produce different speech sounds. Thus the lungs, vocal cords, vocal tract, tongue, velum, mouth and nose are the integral parts that produce the speech signal (refer Appendix F).

E. S. Gopi, Digital Speech Processing Using Matlab, Signals and Communication Technology, DOI: 10.1007/978-81-322-1677-3_2, © Springer India 2014


Fig. 2.1 Source-filter model of the speech production

2.2 1-D Sound Waves
Sound waves are longitudinal waves: they produce a disturbance along the direction of propagation (refer Fig. 2.1). The disturbance takes the form of compression and rarefaction. In the source-filter model, the source is either noise (air from the lungs) or an impulse train (vocal cord vibration at a particular frequency), and the filter is the vocal tract. The filter is modelled as a cascade of tubes with different cross-sectional areas. The length of each tube is usually less than the wavelength of the produced sound wave. Hence speech sound waves are assumed to travel in one dimension only. This model is known as the 1-D sound wave model.
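The source-filter idea described above can be illustrated with a minimal Python sketch. It is not the chapter's tube derivation: instead of computing the filter from a cascade of tubes, it swaps in a cascade of two-pole resonators as a stand-in for the vocal-tract filter. The function names, the sampling rate, the pitch frequency and the three resonance frequencies and bandwidths are all illustrative assumptions, not values taken from the text.

```python
import numpy as np

def excitation(n_samples, fs, voiced, f0=100.0, seed=0):
    """Source: impulse train (voiced) or white noise (unvoiced)."""
    if voiced:
        period = int(round(fs / f0))   # pitch period in samples (assumed f0)
        e = np.zeros(n_samples)
        e[::period] = 1.0              # one impulse per pitch period
        return e
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n_samples)

def resonator_coeffs(centre_hz, bandwidth_hz, fs):
    """Denominator [1, a1, a2] of a two-pole resonator (assumed filter section)."""
    r = np.exp(-np.pi * bandwidth_hz / fs)   # pole radius, always < 1
    theta = 2.0 * np.pi * centre_hz / fs     # pole angle
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])

def all_pole_filter(a, x):
    """Direct recursion y[n] = x[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

fs = 8000                                    # assumed sampling rate in Hz
sections = [(700.0, 100.0), (1200.0, 120.0), (2600.0, 160.0)]  # assumed (Hz, Hz)

# Cascading the sections multiplies their denominator polynomials.
a = np.array([1.0])
for centre, bandwidth in sections:
    a = np.convolve(a, resonator_coeffs(centre, bandwidth, fs))

voiced_speech = all_pole_filter(a, excitation(800, fs, voiced=True))
unvoiced_speech = all_pole_filter(a, excitation(800, fs, voiced=False))
```

The same filter is driven by both sources, which mirrors the model's separation of excitation from vocal-tract shaping. Only the excitation changes between the voiced and unvoiced outputs.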

2.2.1 Physics on Sound Wave Travelling Through the Tube with Uniform Cross-Sectional Area A
Consider the small segment of the tube (shaded region). When the sound wave crosses th
