On the use of the i-vector speech representation for instrumental quality measurement
RESEARCH ARTICLE
Anderson R. Avila1,2 · Jahangir Alam2 · Douglas O’Shaughnessy1 · Tiago H. Falk1
Received: 7 November 2019 © Springer Nature Switzerland AG 2020
Abstract
The i-vector framework has been widely used to summarize speaker-dependent information present in a speech signal. Although it was considered the state of the art in speaker verification for many years, its potential to estimate speech recording distortion/quality has been overlooked. This paper is an attempt to fill this gap. We conduct a detailed analysis of how distortions are captured in the total variability space. We then propose a full-reference speech quality model based on i-vector similarities and three no-reference approaches. The first no-reference approach makes use of a single reference i-vector based on the average of i-vectors extracted from clean signals. The second approach relies on a vector quantizer codebook of representative clean speech i-vectors. In the third, i-vectors and subjective ratings are used to train a no-reference deep neural network model for speech quality assessment. Four experiments show that the proposed methods, based on the i-vector speech representation, are well-suited for assessing speech quality. Results show correlations with subjective quality judgments similar to those achieved with standardized instrumental algorithms, particularly for degradations caused by noise and reverberation.
Keywords Speech quality assessment · Instrumental quality measurement · I-vector · Speech enhancement
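To make the similarity-based approaches in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "i-vector similarity" is measured with cosine similarity, which the abstract does not state, and it uses tiny hand-written vectors in place of real i-vectors. All function names and values are hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def full_reference_score(clean_ivec, degraded_ivec):
    """Full-reference: compare a degraded i-vector with its clean counterpart."""
    return cosine_similarity(clean_ivec, degraded_ivec)

def average_reference_score(clean_ivecs, degraded_ivec):
    """No-reference (assumed form): compare with the average clean i-vector."""
    return cosine_similarity(mean_vector(clean_ivecs), degraded_ivec)

def codebook_score(codebook, degraded_ivec):
    """No-reference (assumed form): similarity to the closest codebook entry."""
    return max(cosine_similarity(entry, degraded_ivec) for entry in codebook)

# Toy 4-dimensional stand-ins for i-vectors (real i-vectors are much larger).
clean = [1.0, 0.0, 0.0, 0.0]
mild = [1.0, 0.1, 0.0, 0.0]    # small deviation from the clean vector
severe = [1.0, 2.0, 0.0, 0.0]  # large deviation from the clean vector

clean_set = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0], [1.1, -0.1, 0.0, 0.0]]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

print("full-reference, mild:  ", full_reference_score(clean, mild))
print("full-reference, severe:", full_reference_score(clean, severe))
print("average reference, mild:", average_reference_score(clean_set, mild))
print("codebook, mild:         ", codebook_score(codebook, mild))
```

In this toy setting, the vector that deviates more from the clean reference receives the lower score, which is the behavior a similarity-based quality estimate relies on. How such a score would be mapped to a quality rating is not covered by this sketch.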
1 Institut national de la recherche scientifique, 800, rue de la Gauchetière Ouest, Montréal, QC H5A 1K6, Canada
2 Computer Research Institute of Montreal, 405, Ogilvy Avenue, suite 101, Montréal, QC H3N 1M3, Canada
Introduction
Estimating the perceived quality of existing and emerging multimedia services and applications is important, especially for providers seeking to optimize their services and maximize customer experience [1]. Real-time quality monitoring, for example, can help with network design and development, as well as with online adaptation to ensure that end users’ expectations are met. As new services and technologies emerge, quality monitoring tools need to be able to characterize new artifacts and distortions that may arise.
Traditionally, subjective listening tests have been used and shown to be reliable [2]. In such a scenario, speech signals are presented to listeners (either naive or expert listeners, depending on the application) who judge the signal quality on a 5-point scale. The mean opinion score (MOS), which represents the perceived speech quality after leveling out individual factors [3], is attained after averaging all participant scores over a specific condition. Such subjective measurements, however, are not always feasible as they: (1) require many listene