Approximating Global Optimum for Probabilistic Truth Discovery
- PDF / 4,220,792 Bytes
- 26 Pages / 439.37 x 666.142 pts Page_size
- 65 Downloads / 230 Views
Approximating Global Optimum for Probabilistic Truth Discovery Shi Li1 · Jinhui Xu1 · Minwei Ye1 Received: 18 January 2019 / Accepted: 17 April 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract The problem of truth discovery arises in many areas such as database, data mining, data crowdsourcing and machine learning. It seeks trustworthy information from possibly conflicting data provided by multiple sources. Due to its practical importance, the problem has been studied extensively in recent years. Two competing models were proposed for truth discovery, weight-based model and probabilistic model. While (1 + 𝜖)-approximations have already been obtained for the weightbased model, no quality guaranteed solution has been discovered yet for the probabilistic model. In this paper, we focus on the probabilistic model and formulate it as a geometric optimization problem. Based on a sampling technique and a few other ideas, we achieve the first (1 + 𝜖)-approximation solution. Our techniques can also be used to solve the more general multi-truth discovery problem. We validate our method by conducting experiments on both synthetic and real-world datasets (teaching evaluation data) and comparing its performance to some existing approaches. Our solutions are closer to the truth as well as global optimum based on the experimental result. The general technique we developed has the potential to be used to solve other geometric optimization problems. Keywords Geometric optimization · Data mining · High dimensions · Approximation
This research was supported in part by NSF through Grants CCF-1422324, IIS-1422591, CCF1716400, and IIS-1910492. A preliminary version of this work has appeared in COCOON’18. * Jinhui Xu [email protected] Shi Li [email protected] Minwei Ye [email protected] 1
Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, USA
13
Vol.:(0123456789)
Algorithmica
1 Introduction Truth discovery has received a great deal of attentions in recent years in databases, data crowdsourcing, machine learning and data mining [10, 11, 16, 17, 19]. It emerges from various practical scenarios such as copying detection [5], data fusion [3] and conflicting information resolving on the web [19]. In a typical scenario, the unknown truth for one or multiple objects can be viewed as a vector in a high dimensional space. The information about the truth vector may be aggregated from multiple sources, and could be inaccurate, conflicting or even biased if it comes from subjective evaluations. Our goal is to infer the truth vector from these noisy data. A naive method for this problem is to take the average of the vectors from all sources as the the ground truth (for coordinates correspondent to categorical data, take the majority vote). However, this approach, which inherently treats all sources as equally important, is vulnerable to unreliable and malicious sources. Such sources can provide information that pulls the average away from the truth. A mor
Data Loading...