Approximating Global Optimum for Probabilistic Truth Discovery

PDF / 4,220,792 Bytes
26 Pages / 439.37 x 666.142 pts Page_size
65 Downloads / 272 Views

Approximating Global Optimum for Probabilistic Truth Discovery Shi Li1 · Jinhui Xu1 · Minwei Ye1 Received: 18 January 2019 / Accepted: 17 April 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract The problem of truth discovery arises in many areas such as database, data mining, data crowdsourcing and machine learning. It seeks trustworthy information from possibly conflicting data provided by multiple sources. Due to its practical importance, the problem has been studied extensively in recent years. Two competing models were proposed for truth discovery, weight-based model and probabilistic model. While (1 + 𝜖)-approximations have already been obtained for the weightbased model, no quality guaranteed solution has been discovered yet for the probabilistic model. In this paper, we focus on the probabilistic model and formulate it as a geometric optimization problem. Based on a sampling technique and a few other ideas, we achieve the first (1 + 𝜖)-approximation solution. Our techniques can also be used to solve the more general multi-truth discovery problem. We validate our method by conducting experiments on both synthetic and real-world datasets (teaching evaluation data) and comparing its performance to some existing approaches. Our solutions are closer to the truth as well as global optimum based on the experimental result. The general technique we developed has the potential to be used to solve other geometric optimization problems. Keywords Geometric optimization · Data mining · High dimensions · Approximation

This research was supported in part by NSF through Grants CCF-1422324, IIS-1422591, CCF1716400, and IIS-1910492. A preliminary version of this work has appeared in COCOON’18. * Jinhui Xu [email protected] Shi Li [email protected] Minwei Ye [email protected] 1

Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, USA

13

Vol.:(0123456789)

Algorithmica

1 Introduction Truth discovery has received a great deal of attentions in recent years in databases, data crowdsourcing, machine learning and data mining [10, 11, 16, 17, 19]. It emerges from various practical scenarios such as copying detection [5], data fusion [3] and conflicting information resolving on the web [19]. In a typical scenario, the unknown truth for one or multiple objects can be viewed as a vector in a high dimensional space. The information about the truth vector may be aggregated from multiple sources, and could be inaccurate, conflicting or even biased if it comes from subjective evaluations. Our goal is to infer the truth vector from these noisy data. A naive method for this problem is to take the average of the vectors from all sources as the the ground truth (for coordinates correspondent to categorical data, take the majority vote). However, this approach, which inherently treats all sources as equally important, is vulnerable to unreliable and malicious sources. Such sources can provide information that pulls the average away from the truth. A mor

Data Loading...

Approximating Global Optimum for Probabilistic Truth Discovery

Recommend Documents

An End-to-End Deep Neural Network for Truth Discovery

Police: An Effective Truth Discovery Method in Intelligent Crowd Sensing

Private Truth Discovery in Cloud-Empowered Crowdsensing Systems

Multi-objective global optimum design of collaborative robots

Approximating k-Orthogonal Line Center

Approximating Border Length for DNA Microarray Synthesis

Truth predicates, truth bearers, and their variants

Dirac. Beauty Is Truth, Truth Beauty

A constructive method for approximating trigonometric functions and their integrals

Ground-Truth

Approximating Graph Edit Distance Using GNCCP

Probabilistic Logics and Probabilistic Networks