Towards speech quality assessment using a crowdsourcing approach: evaluation of standardized methods
RESEARCH ARTICLE
Babak Naderi1 · Rafael Zequeira Jiménez1 · Matthias Hirth2 · Sebastian Möller1,3 · Florian Metzger4 · Tobias Hoßfeld4

Received: 25 May 2020
© The Author(s) 2020
Abstract

Subjective speech quality assessment has traditionally been carried out in laboratory environments under controlled conditions. With the advent of crowdsourcing platforms, tasks that require human intelligence can be resolved by crowd workers over the Internet. Crowdsourcing also offers a new paradigm for speech quality assessment, promising higher ecological validity of the quality judgments at the expense of potentially lower reliability. This paper compares laboratory-based and crowdsourcing-based speech quality assessments in terms of comparability of results and efficiency. For this purpose, three pairs of listening-only tests were carried out on three different crowdsourcing platforms, following ITU-T Recommendation P.808. In each test, listeners judged the overall quality of speech samples following the Absolute Category Rating procedure. We compare the results of the crowdsourcing approach with those of standard laboratory tests performed according to ITU-T Recommendation P.800. The results show that in most cases both paradigms lead to comparable outcomes. Notable differences are discussed with respect to their sources, and conclusions are drawn that establish practical guidelines for crowdsourcing-based speech quality assessment.

Keywords: Speech quality assessment · Crowdsourcing · Validity · Reliability · P.808
* Corresponding author: Babak Naderi, babak.naderi@tu-berlin.de
Rafael Zequeira Jiménez, rafael.zequeira@tu-berlin.de
Matthias Hirth, matthias.hirth@tu-ilmenau.de
Sebastian Möller, sebastian.moeller@tu-berlin.de
Florian Metzger, florian.metzger@uni-wuerzburg.de
Tobias Hoßfeld, tobias.hossfeld@uni-wuerzburg.de

1 Quality and Usability Lab, Technische Universität Berlin, Berlin, Germany
2 User-centric Analysis of Multimedia Data Group, Technische Universität Ilmenau, Ilmenau, Germany
3 Speech and Language Technology, German Research Center for Artificial Intelligence (DFKI), Berlin, Germany
4 Chair of Communication Networks, University of Würzburg, Würzburg, Germany

Introduction
Quality of Experience (QoE) research concentrates on understanding user requirements towards systems or services, as well as users' perceptions and judgments. Traditionally, QoE studies have addressed systems or services for multimedia content creation, transmission, and rendering. This includes systems for audio presentation, for video transmission, or for speech-based communication. In order to obtain quantitative metrics of QoE, subjective experiments are commonly conducted, in which representative groups of users judge multimedia content presented under controlled test conditions. Standardized guidelines exist for such experiments, e.g. in the Recommendations of the P-series of the Telecommunication Standardization Sector of