Find truth in the hands of the few: acquiring specific knowledge with crowdsourcing

Tao HAN 1,2, Hailong SUN 1,2, Yangqiu SONG 3, Yili FANG 4, Xudong LIU 1,2

1 SKLSDE Lab, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
2 Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
3 Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clearwater Bay, Hong Kong 999077, China
4 School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China

© Higher Education Press 2020

Abstract Crowdsourcing has been a helpful mechanism for leveraging human intelligence to acquire useful knowledge. However, when crowd knowledge is aggregated with currently developed voting algorithms, the result is often common knowledge rather than the specific knowledge that is expected. In this paper, we consider the problem of collecting specific knowledge via crowdsourcing. With the help of an external knowledge base such as WordNet, we incorporate the semantic relations between the alternative answers into a probabilistic model to determine which answer is more specific. We formulate the probabilistic model from basic assumptions, considering both worker ability and task difficulty, and solve it with the expectation-maximization (EM) algorithm. To increase algorithm compatibility, we also refine our method into a semi-supervised one. Experimental results show that our approach is robust to hyper-parameters and achieves greater improvement than majority voting and other algorithms when more specific answers are expected, especially for sparse data.

Keywords crowdsourcing, knowledge acquisition, EM algorithm, label aggregation
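To make the idea of comparing the specificity of alternative answers with an external knowledge base more concrete, the following Python sketch ranks candidate labels by their hypernym depth in WordNet (accessed through NLTK). This is only an assumption-laden illustration of the general idea, not the authors' model; the EM-based aggregation with worker ability and task difficulty is described later in the paper.

```python
# Minimal sketch (not the authors' implementation): rank candidate labels
# by how deep their noun synsets sit in WordNet's hypernym hierarchy.
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet') once

def specificity(label):
    """Approximate specificity of a label by the maximum hypernym depth of
    its noun synsets; deeper synsets are treated as more specific."""
    synsets = wn.synsets(label, pos=wn.NOUN)
    if not synsets:
        return -1  # label not covered by WordNet: treat as least specific
    return max(s.max_depth() for s in synsets)

candidates = ["bird", "hummingbird", "thornbill"]  # example labels from Fig. 1
ranked = sorted(candidates, key=specificity, reverse=True)
print(ranked)  # 'hummingbird' should rank above 'bird'; 'thornbill' too, if WordNet covers it
```

Depth in the hypernym hierarchy is only one simple proxy for "more specific"; the paper combines such semantic relations with a probabilistic model rather than using depth alone.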

1 Introduction

Crowdsourcing [1] aims at leveraging human intelligence to perform tasks that computers currently cannot do well. It has been successfully applied to many applications such as named entity resolution [2], image annotation [3], audio recognition [4], and video annotation [5]. Among them, knowledge acquisition is one of the most representative. In a typical knowledge acquisition task, a data object (e.g., an image) is presented to multiple workers, who are asked to provide a piece of knowledge about the data object in the form of labels. All the answers given by workers are then collected and aggregated with certain algorithms to generate the final result. In practice, workers may provide different levels of knowledge for the same data object due to their distinct cognitive abilities. For instance, given a picture of a thornbill in Fig. 1, workers may provide labels such as bird, hummingbird, and thornbill. Though these labels are all correct in some way, they represent different levels of knowledge about the thornbill. Thus, existing answer aggregation methods designed for tasks with a single correct answer no longer apply, and the problem arises of which answer should be selected as the result of a knowledge acquisition task.
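As a toy illustration (not taken from the paper), the snippet below shows why plain majority voting over labels like those in the Fig. 1 example collapses to the generic answer; the label counts are hypothetical.

```python
# Toy example: plain majority voting favors the most common, least specific label.
from collections import Counter

worker_labels = ["bird", "bird", "bird", "hummingbird", "thornbill"]  # hypothetical answers
majority, count = Counter(worker_labels).most_common(1)[0]
print(majority, count)  # -> 'bird', 3: the generic label wins even though
                        #    more specific, still-correct answers were provided
```

This is the behavior the proposed approach aims to counter by additionally accounting for the semantic relations among answers, worker ability, and task difficulty.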