Real-Time, Non-intrusive Speech Quality Estimation: A Signal-Based Model
Abstract. Estimation of speech quality, as perceived by humans, is of vital importance to the proper functioning of telecommunications networks. Speech quality can be degraded by various network-related problems. In this paper we present a model for speech quality estimation that is a function of various time- and frequency-domain features of human speech. We employ a hybrid optimization approach: Genetic Programming (GP) is used to find a suitable structure for the model, and its coefficients are optimized with a traditional genetic algorithm (GA) and a numerical method known as linear scaling. In terms of prediction accuracy, the proposed model outperforms ITU-T Recommendation P.563, the current non-intrusive speech quality estimation model. The proposed model also has significantly reduced dimensionality, which may reduce its computational requirements.

Keywords: Non-Intrusive, Signal-based, GP, MOS.
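The abstract names linear scaling without giving its form in this excerpt. As a minimal sketch, assuming the usual least-squares formulation used with GP (an intercept `a` and slope `b` chosen so that `a + b * y` best fits the targets), it could look like the following. The function names `linear_scaling` and `apply_scaling` are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def linear_scaling(outputs, targets):
    """Return (a, b) minimising sum((t - (a + b * y)) ** 2).

    outputs: raw outputs of an evolved expression (hypothetical input).
    targets: the values to be predicted, e.g. subjective MOS scores.
    """
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("outputs and targets must be non-empty and equal in length")
    y_mean = fmean(outputs)
    t_mean = fmean(targets)
    # Sum of squared deviations of the raw outputs (unnormalised variance).
    var_y = sum((y - y_mean) ** 2 for y in outputs)
    if var_y == 0.0:
        # A constant expression: the best linear fit is the target mean.
        return t_mean, 0.0
    # Unnormalised covariance between raw outputs and targets.
    cov = sum((y - y_mean) * (t - t_mean) for y, t in zip(outputs, targets))
    b = cov / var_y
    a = t_mean - b * y_mean
    return a, b


def apply_scaling(outputs, a, b):
    """Map raw outputs onto the target scale using the fitted coefficients."""
    return [a + b * y for y in outputs]
```

Under this assumption, the evolutionary search only has to find an expression whose shape tracks the targets, since the slope and intercept are obtained in closed form.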
1 Introduction
Speech quality in a telecommunications network may be reduced for various reasons, including noisy or faulty channels and links, frame loss due to irrecoverable errors, and low-bitrate coding. Speech quality estimation is vital to evaluating the quality of service offered by a telecommunications network. Traditionally, speech quality is estimated using subjective tests, in which the quality of a speech signal under test is evaluated by a group of human listeners, each of whom assigns an opinion score on an integer scale ranging from 1 (bad) to 5 (excellent). The average of these scores, termed the Mean Opinion Score (MOS), is considered the ultimate determinant of speech quality [1]. Subjective tests are, however, time-consuming and expensive. To make up for these limitations, there has been growing interest in devising software-based objective assessment models.

There are two kinds of objective assessment models, namely intrusive and non-intrusive. Intrusive models evaluate the quality of a distorted speech signal in the presence of a corresponding reference signal. The current International Telecommunication Union (ITU-T) Recommendation P.862 (PESQ) [2] is an example of such an approach. Non-intrusive models, on the other hand, do not enjoy this privilege and base their results solely on the estimated features of the signal under test. For this reason, the results of non-intrusive models are generally considered inferior to those of intrusive ones.

Non-intrusive models can be classified as either signal-based or parametric. As the name suggests, signal-based models are based on digital signal processing of human speech. An example is the current state-of-the-art ITU-T Recommendation P.563 for single-ended estimation of speech quality [3]. Parametric models, on the other hand, base their results on various properties of the telecommunications network. In th

A. Raja and C. Flanagan, in M. O’Neill et al. (Eds.): EuroGP 2008, LNCS 4971, pp. 37–48, 2008. © Springer-Verlag Berlin Heidelberg 2008
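The Mean Opinion Score described in the introduction is simply the average of the listeners' integer ratings. A minimal sketch follows; the function name `mean_opinion_score` and the validation rules are illustrative assumptions, not part of the paper or of any ITU-T recommendation.

```python
from statistics import fmean

# Valid opinion scores on the five-point scale: 1 (bad) to 5 (excellent).
VALID_SCORES = (1, 2, 3, 4, 5)


def mean_opinion_score(scores):
    """Average the opinion scores given by listeners to one speech sample."""
    if not scores:
        raise ValueError("at least one opinion score is required")
    for score in scores:
        if score not in VALID_SCORES:
            raise ValueError(f"opinion score out of range: {score!r}")
    return fmean(scores)
```

For example, four listeners rating a sample 4, 3, 5 and 4 would give `mean_opinion_score([4, 3, 5, 4])`, the arithmetic mean of those ratings. An objective model, intrusive or not, is then judged by how closely its predictions track such subjective values.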