Speech Production Model
Abstract
The continuous speech signal (an air-pressure wave) that leaves the mouth and the nose is converted into an electrical signal by a microphone. The electrical signal is then sampled to obtain a discrete signal, which is stored in a digital system for further processing. This is digital speech processing. Speech signal models are broadly classified into source-filter models and probabilistic models. The source-filter model is built on the physical phenomenon that produces the speech signal. Probabilistic models such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM) are mathematical models that do not depend on the physical phenomenon. A speech model is used to extract feature vectors from the speech signal for isolated speech recognition and speaker recognition. It is used to compress the speech signal for storage, as in Code-Excited Linear Prediction (CELP). It is useful for converting text into speech, known as speech synthesis. It is also used for continuous speech recognition. This chapter deals with the source-filter model of speech production.
2.1 Introduction
The air that comes out of the lungs passes through the vocal tract and leaves through the mouth and the nose as the continuous speech signal. The air coming out of the lungs is either sent directly to the vocal tract or first altered by the vibration of the vocal cords. Speech signals produced with vocal cord vibration are known as voiced speech signals. Speech signals produced without vocal cord vibration are known as unvoiced speech signals. The velum can close the nasal path so that the speech signal comes out only through the mouth. The shape of the vocal tract is adjusted using the tongue and the velum to produce different speech signals. Thus the lungs, vocal cords, vocal tract, tongue, velum, mouth and nose are the integral parts that produce the speech signal (refer to Appendix F).
E. S. Gopi, Digital Speech Processing Using Matlab, Signals and Communication Technology, DOI: 10.1007/978-81-322-1677-3_2, © Springer India 2014
Fig. 2.1 Source-filter model of the speech production
2.2 1-D Sound Waves
Sound waves are longitudinal waves: they produce a disturbance along the direction of the flow (refer to Fig. 2.1). The disturbance takes the form of compression and rarefaction. In the source-filter model, the source is either noise (air from the lungs) or an impulse stream (vocal cord vibration at a particular frequency), and the filter is the vocal tract. The filter is modelled as a cascade of tubes with different cross-sectional areas. The length of the tube is usually less than the wavelength of the sound wave produced. Hence speech sound waves are assumed to travel in one dimension. This model is known as the 1-D sound wave model.
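The source-filter idea described above can be illustrated with a minimal Python sketch. It is not the tube model developed in this chapter: as a simplifying assumption, the vocal-tract filter is replaced by a cascade of two-pole resonators, and the sampling rate, pitch, and resonance frequencies and bandwidths are hypothetical values chosen only for illustration. The function names are likewise illustrative.

```python
import math
import random

def impulse_train(n_samples, fs, f0):
    """Voiced source: unit impulses spaced one pitch period apart."""
    period = round(fs / f0)  # pitch period in samples
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def white_noise(n_samples, seed=0):
    """Unvoiced source: zero-mean Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def resonator_coeffs(fs, freq, bandwidth):
    """Coefficients (a1, a2) of y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius, always below 1
    theta = 2.0 * math.pi * freq / fs        # pole angle in radians
    return (-2.0 * r * math.cos(theta), r * r)

def all_pole_filter(x, a1, a2):
    """Apply one two-pole resonator to the sequence x."""
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn - a1 * y1 - a2 * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y

def synthesize(source, fs, resonances):
    """Filter the source through one resonator per (frequency, bandwidth) pair."""
    out = source
    for freq, bw in resonances:
        a1, a2 = resonator_coeffs(fs, freq, bw)
        out = all_pole_filter(out, a1, a2)
    return out

# Assumed values for illustration only (not taken from the chapter).
FS = 8000                      # sampling rate in Hz
RESONANCES = [(700.0, 100.0), (1200.0, 120.0), (2600.0, 160.0)]  # (Hz, Hz)

voiced = synthesize(impulse_train(800, FS, 100.0), FS, RESONANCES)
unvoiced = synthesize(white_noise(800), FS, RESONANCES)
```

The same filter is used for both outputs; only the source changes. This is the separation the source-filter model relies on.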
2.2.1 Physics on Sound Wave Travelling Through the Tube with Uniform Cross-Sectional Area A
Consider the small segment of the tube (shaded region). When the sound wave crosses th