Real-Time, Non-intrusive Speech Quality Estimation: A Signal-Based Model

Abstract. Estimating speech quality as perceived by humans is vital to the proper functioning of telecommunications networks, where speech can be degraded by various network-related problems. In this paper we present a model for speech quality estimation that is a function of various time- and frequency-domain features of human speech. We employ a hybrid optimization approach: Genetic Programming (GP) finds a suitable structure for the model, while a traditional genetic algorithm (GA) and a numerical method known as linear scaling optimize its coefficients. The proposed model outperforms ITU-T Recommendation P.563, the current non-intrusive speech quality estimation model, in terms of prediction accuracy. It also has significantly reduced dimensionality, which may reduce its computational requirements.

Keywords: Non-Intrusive, Signal-based, GP, MOS.

1 Introduction

Speech quality in a telecommunications network may be reduced for various reasons, including noisy or faulty channels and links, frame loss due to irrecoverable errors, and low-bitrate coding. Speech quality estimation is vital to evaluating the quality of service offered by a telecommunications network. Traditionally, speech quality is estimated using subjective tests, in which the quality of a speech signal under test is evaluated by a group of human listeners who each assign an opinion score on an integer scale from 1 (bad) to 5 (excellent). The average of these scores, termed the Mean Opinion Score (MOS), is considered the ultimate determinant of speech quality [1]. Subjective tests are, however, time consuming and expensive. To overcome these limitations, there has been growing interest in devising software-based objective assessment models. There are two kinds of objective assessment models: intrusive and non-intrusive. Intrusive models evaluate the quality of a distorted speech signal in the presence of a corresponding reference signal. The current International Telecommunication Union (ITU-T) Recommendation P.862 (PESQ) [2] is an example of such an approach. Non-intrusive models, on the other hand, have no reference signal available and base their results solely on the estimated features of the signal under test. For this reason, their results are generally considered inferior to those of intrusive models. Non-intrusive models can be classified as either signal-based or parametric. As the name suggests, signal-based models are based on the digital signal processing of human speech. An example of such a model is the current state-of-the-art ITU-T Recommendation P.563 for single-ended estimation of speech quality [3]. Parametric models, on the other hand, base their results on various properties relevant to the telecommunications network. In th

A. Raja and C. Flanagan. In: M. O'Neill et al. (Eds.): EuroGP 2008, LNCS 4971, pp. 37–48, 2008. © Springer-Verlag Berlin Heidelberg 2008
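The Mean Opinion Score described in the introduction is the arithmetic mean of the listeners' individual ratings. The following minimal sketch illustrates that calculation only; the function name `mean_opinion_score` and the sample panel ratings are hypothetical and are not taken from the paper.

```python
def mean_opinion_score(scores):
    """Return the mean of listener ratings given on the 1 (bad) to 5 (excellent) scale.

    `scores` is a sequence of integer opinion scores, one per listener.
    """
    if not scores:
        raise ValueError("at least one opinion score is required")
    for s in scores:
        # Reject booleans, non-integers, and values outside the 1..5 scale.
        if isinstance(s, bool) or not isinstance(s, int) or not 1 <= s <= 5:
            raise ValueError(f"opinion scores must be integers from 1 to 5, got {s!r}")
    return sum(scores) / len(scores)


# Hypothetical ratings from a six-listener panel for one speech sample.
ratings = [4, 3, 5, 4, 4, 4]
print(mean_opinion_score(ratings))
```

An objective model, whether intrusive or non-intrusive, aims to predict this averaged value without convening a listening panel.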

