Crowdsourced top-k queries by pairwise preference judgments with confidence and budget control
REGULAR PAPER
Yan Li · Hao Wang · Ngai Meng Kou · Leong Hou U · Zhiguo Gong

Received: 15 November 2019 / Revised: 5 June 2020 / Accepted: 26 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract

Crowdsourced query processing is an emerging technique that tackles computationally challenging problems with human intelligence. The basic idea is to decompose a computationally challenging problem into a set of human-friendly microtasks (e.g., pairwise comparisons) that are distributed to and answered by the crowd. The solution to the problem is then computed (e.g., by aggregation) from the crowdsourced answers to the microtasks. In this work, we revisit the crowdsourced processing of top-k queries, aiming at (1) securing the quality of crowdsourced comparisons at a given confidence level and (2) minimizing the total monetary cost. To secure the quality of each paired comparison, we employ statistical tools to estimate a confidence interval from the collected judgments of the crowd, which then guides the aggregated judgment. We propose two novel frameworks, SPR and SPR+, for crowdsourced top-k queries. Both SPR and SPR+ are budget-aware, confidence-aware, and effective in producing high-quality top-k results. SPR requires as input a budget for each paired comparison, whereas SPR+ requires only a total budget for the whole top-k task. Extensive experiments, conducted on four real datasets, demonstrate that our proposed methods outperform existing top-k processing techniques by a clear margin.

Keywords Crowdsourcing · Top-k query · Preference judgments · Confidence · Budget control
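The abstract's idea of backing each aggregated pairwise judgment with a confidence interval can be sketched as follows. This toy example uses a Wilson score interval, one common statistical tool for this purpose; the paper's exact estimator is not specified in this excerpt, and all function names here are illustrative.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for the proportion of crowd judgments
    preferring item A over item B (z=1.96 gives ~95% confidence)."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(
        p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (center - half, center + half)

def aggregate_pair(judgments, z=1.96):
    """Aggregate crowd judgments for one pair (True = 'A preferred').

    Returns 'A' or 'B' when the confidence interval for the preference
    proportion excludes 0.5, and 'undecided' otherwise (i.e., more
    judgments would be needed at this confidence level)."""
    wins = sum(judgments)
    lo, hi = wilson_interval(wins, len(judgments), z)
    if lo > 0.5:
        return 'A'
    if hi < 0.5:
        return 'B'
    return 'undecided'
```

For example, 9 of 10 workers preferring A yields an interval entirely above 0.5, so the comparison is settled; a 3-to-3 split leaves the interval straddling 0.5, signaling that the pair needs more judgments (or more budget).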
Hao Wang [email protected]
Leong Hou U [email protected]
Yan Li [email protected]
Ngai Meng Kou [email protected]
Zhiguo Gong [email protected]

1 Introduction

Recently, crowdsourcing has been employed to process a variety of database queries, including MAX queries [7,8,18,22,40], JOIN queries [31,42], and top-k queries [25,26,29]. In this work, we focus on the crowdsourced top-k
queries over a collection of data items, where humans are involved in deciding the orderings of items. Such crowdsourced top-k queries are particularly helpful in computer-hostile but human-friendly ranking tasks.

Examples. In the field of machine translation, it is an emerging requirement to find the best translation of a sentence among a set of candidate translations. This is a difficult task for computers since the judgment relies on advanced natural language skills. However, humans can easily point out the better of two candidate translations if they speak both languages. Thus, applications such as Google Translate, Duolingo, and Twitter use crowdsourcing to rank candidate translations. Other examples emerge in the field of public opinion analysis. For instance, finding the top-3 best-performing soccer players of the year is the ever