Towards speech quality assessment using a crowdsourcing approach: evaluation of standardized methods
RESEARCH ARTICLE
Babak Naderi1 · Rafael Zequeira Jiménez1 · Matthias Hirth2 · Sebastian Möller1,3 · Florian Metzger4 · Tobias Hoßfeld4

Received: 25 May 2020
© The Author(s) 2020
Abstract

Subjective speech quality assessment has traditionally been carried out in laboratory environments under controlled conditions. With the advent of crowdsourcing platforms, tasks that require human intelligence can be resolved by crowd workers over the Internet. Crowdsourcing also offers a new paradigm for speech quality assessment, promising higher ecological validity of the quality judgments at the expense of potentially lower reliability. This paper compares laboratory-based and crowdsourcing-based speech quality assessments in terms of comparability of results and efficiency. For this purpose, three pairs of listening-only tests were carried out on three different crowdsourcing platforms, following ITU-T Recommendation P.808. In each test, listeners judged the overall quality of speech samples following the Absolute Category Rating procedure. We compare the results of the crowdsourcing approach with those of standard laboratory tests performed according to ITU-T Recommendation P.800. The results show that in most cases both paradigms lead to comparable outcomes. Notable differences are discussed with respect to their sources, and conclusions are drawn that establish practical guidelines for crowdsourcing-based speech quality assessment.

Keywords: Speech quality assessment · Crowdsourcing · Validity · Reliability · P.808
* Corresponding author: Babak Naderi, babak.naderi@tu-berlin.de
Rafael Zequeira Jiménez, rafael.zequeira@tu-berlin.de
Matthias Hirth, matthias.hirth@tu-ilmenau.de
Sebastian Möller, sebastian.moeller@tu-berlin.de
Florian Metzger, florian.metzger@uni-wuerzburg.de
Tobias Hoßfeld, tobias.hossfeld@uni-wuerzburg.de

1 Quality and Usability Lab, Technische Universität Berlin, Berlin, Germany
2 User-centric Analysis of Multimedia Data Group, Technische Universität Ilmenau, Ilmenau, Germany
3 Speech and Language Technology, German Research Center for Artificial Intelligence (DFKI), Berlin, Germany
4 Chair of Communication Networks, University of Würzburg, Würzburg, Germany

Introduction
Quality of Experience (QoE) research concentrates on understanding user requirements towards systems or services, as well as users' perceptions and judgments. Traditionally, QoE studies have addressed systems or services for multimedia content creation, transmission, and rendering. This includes systems for audio presentation, for video transmission, or for speech-based communication. In order to obtain quantitative metrics of QoE, subjective experiments are commonly conducted, in which representative groups of users judge multimedia content presented under controlled test conditions. Standardized guidelines exist for such experiments, e.g. in the Recommendations of the P-series of the Telecommunication Standardization Sector of