Bayesian Network Structure Learning with Messy Inputs: The Case of Multiple Incomplete Datasets and Expert Opinions


Abstract. In this paper, we present an approach to build the structure of a Bayesian network from multiple disparate inputs. Specifically, our method accepts as input multiple partially overlapping datasets with missing data, along with expert opinions about the structure of the model, and produces an associated directed acyclic graph representing the graphical layer of a Bayesian network. We provide experimental results comparing our algorithm with an application of Structural Expectation Maximization. We also provide a real-world example motivating the need to combine disparate sources of information even when they are noisy and not fully aligned with one another.
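The end product described above is a directed acyclic graph which, once paired with conditional probability tables, defines a Bayesian network. As a minimal sketch (a generic illustration, not the authors' learning algorithm), the Python below stores a candidate structure as a map from each variable to its parent list. It verifies acyclicity with Kahn's topological-sort algorithm and evaluates the joint probability through the standard factorization P(x_1, ..., x_n) = ∏ P(x_i | pa(x_i)). The three-variable network, its binary variables, every probability value, and the function names `is_acyclic` and `joint_probability` are hypothetical and chosen only for illustration.

```python
# Minimal sketch, not the authors' method: a Bayesian network stored as
# (1) a DAG given by each variable's parent list and
# (2) CPTs giving P(variable = True | parent values) for binary variables.
# The network and all numbers below are hypothetical.

parents = {
    "Rain": [],
    "Sprinkler": [],
    "WetGrass": ["Rain", "Sprinkler"],
}

# Keys of each inner dict are tuples of parent values, in parent-list order.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.4},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.90,
        (False, True): 0.80,
        (False, False): 0.0,
    },
}


def is_acyclic(parents):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed
    in an order where all of its parents were removed before it.
    Assumes every listed parent is also a key of `parents`."""
    remaining = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    ready = [v for v, n in remaining.items() if n == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for c in children[v]:
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)
    return removed == len(parents)


def joint_probability(assignment, parents, cpt):
    """Chain-rule factorization: P(x_1, ..., x_n) = prod_i P(x_i | pa(x_i))."""
    prob = 1.0
    for v, ps in parents.items():
        p_true = cpt[v][tuple(assignment[p] for p in ps)]
        prob *= p_true if assignment[v] else 1.0 - p_true
    return prob


print(is_acyclic(parents))                    # prints True
print(is_acyclic({"A": ["B"], "B": ["A"]}))  # prints False (A -> B -> A)
x = {"Rain": True, "Sprinkler": False, "WetGrass": True}
print(round(joint_probability(x, parents, cpt), 3))  # prints 0.108
```

The acyclicity check matters because any structure proposed from data, from experts, or from both must be a DAG before CPTs can be attached. With a cycle such as A → B → A, the product of conditionals is no longer guaranteed to define a valid joint distribution. In the usage line, the joint probability is simply 0.2 × 0.6 × 0.9.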

Keywords: Bayesian network · Expert opinions · Multiple incomplete datasets

S. Sajja and L.A. Deleris

© Springer International Publishing Switzerland 2015. T. Walsh (Ed.): ADT 2015, LNAI 9346, pp. 123–138, 2015. DOI: 10.1007/978-3-319-23114-3_8

1 Introduction

Decision and Risk Analysis make extensive use of influence diagrams, at the heart of which lies the probabilistic model known as a Bayesian network. Bayesian networks (BNs) [14] are a compact representation of the relationships among random variables. They provide an intuitive graphical representation of the variables' dependence and independence relations, along with an efficient way to perform inference queries. A Bayesian network is composed of two main elements: (i) the structure of the network, captured by a directed acyclic graph that represents the dependence and independence relations among the variables, and (ii) the parameters of the network, in the form of conditional probability tables that give the conditional distribution of each variable for every possible configuration of its parents. Building a Bayesian network typically involves first determining the structure of the network and then estimating the parameters. In this paper we focus on the first step: determining the directed acyclic graph underlying the network.

In the past, it has been customary to assume access to a single clean input (one set of experts or one dataset) from which to determine the structure of the network. This assumption needs to be revisited, as radical technological changes over the past 30 years have modified the context in which Bayesian networks are built. While the availability of information and experts has significantly increased, the quality of this knowledge may not have improved in general. Technology makes it simpler to assemble a distributed team of experts in a few clicks and, through web-based techniques, to elicit information from them in a minimally disruptive way. However, web-based elicitation can also leave experts less engaged and consequently less focused and reliable. Having access to a distributed team of experts also increases the chance of conflicting opinions, while the asynchronous approach to elicitation makes it difficult to resolve conflicts directly. Similarly, Big Data does not necessarily imply better data. In fact, Big Data is mainly a euphemism to talk abou