Simplifying, Regularizing and Strengthening Sum-Product Network Structure Learning

The need for feasible inference in Probabilistic Graphical Models (PGMs) has led to tractable models like Sum-Product Networks (SPNs). Their high expressive power and their ability to provide exact and tractable inference make them very attractive for


Introduction

Probabilistic Graphical Models (PGMs) [13] use a graph-based representation that elicits the conditional independence assumptions among a set of random variables, thus providing a compact encoding of complex joint probability distributions. The most common task one wants to solve with PGMs is inference, a task that becomes intractable for complex networks and is therefore often tackled approximately: computing the exact marginal or conditional probability of a query is a #P-complete problem [27]. However, several recently proposed PGMs keep inference tractable. They include graphs with low treewidth, such as tree-structured graphical models in which each variable has at most one parent [7], their extensions with mixtures [18] or latent variables [6], and Thin Junction Trees [4], which allow controlled treewidths. Being more general than all of these models while preserving exact and tractable inference, Sum-Product Networks (SPNs) [23] provide an appealing model, successfully employed in image reconstruction and recognition [1,9,23], speech recognition [21] and NLP [5] tasks. Similarly to Arithmetic Circuits (ACs) [16], to which they are equivalent on finite domains [26], they compile a high-treewidth network into a deep probabilistic architecture. By layering inner sum and product nodes, they encode the probability density function over the observed variables, which are represented as leaf nodes. SPNs guarantee inference in time linear in their network size [23], and they can become more expressively efficient as their depth increases [17].
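For reference, the tractability claims above can be made concrete with two standard formulas; the notation ($x_r$ for the root variable, $\mathrm{pa}(i)$ for the parent of variable $i$, $\mathrm{ch}(q)$ and $w_{qc}$ for the children and weights of a node $q$, $\mathrm{sc}(q)$ for its scope) is introduced here only for illustration and does not appear in the excerpt. A tree-structured model factorizes as

\[
p(x_1,\dots,x_n) \;=\; p(x_r)\,\prod_{i \neq r} p\big(x_i \mid x_{\mathrm{pa}(i)}\big),
\]

so exact marginals follow from message passing along the tree, while an SPN evaluates every node $q$ bottom-up as

\[
S_q(x) \;=\;
\begin{cases}
\sum_{c \in \mathrm{ch}(q)} w_{qc}\, S_c(x) & \text{if } q \text{ is a sum node},\\
\prod_{c \in \mathrm{ch}(q)} S_c(x) & \text{if } q \text{ is a product node},\\
p_q\big(x_{\mathrm{sc}(q)}\big) & \text{if } q \text{ is a leaf},
\end{cases}
\]

which is exactly the single linear-time pass mentioned above.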

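To illustrate that single bottom-up pass in code, here is a minimal, self-contained sketch of SPN evaluation; the class names and the tiny two-variable mixture are hypothetical examples, not the networks learned in the paper.

# Minimal bottom-up SPN evaluation (illustrative sketch; class names and the
# toy two-variable network are hypothetical, not taken from the paper).

class Leaf:
    """Bernoulli leaf over a single binary variable."""
    def __init__(self, var, p_true):
        self.var, self.p_true = var, p_true
    def value(self, x):
        return self.p_true if x[self.var] == 1 else 1.0 - self.p_true

class Product:
    """Product node: multiplies the values of its children."""
    def __init__(self, children):
        self.children = children
    def value(self, x):
        v = 1.0
        for c in self.children:
            v *= c.value(x)
        return v

class Sum:
    """Sum node: a weighted mixture of its children."""
    def __init__(self, children, weights):
        assert abs(sum(weights) - 1.0) < 1e-9
        self.children, self.weights = children, weights
    def value(self, x):
        return sum(w * c.value(x) for w, c in zip(self.weights, self.children))

# A toy SPN over two binary variables: a mixture of two factorized components.
root = Sum(
    children=[Product([Leaf(0, 0.9), Leaf(1, 0.2)]),
              Product([Leaf(0, 0.1), Leaf(1, 0.7)])],
    weights=[0.6, 0.4])

# One pass visits every node exactly once, hence cost linear in the network size.
print(root.value({0: 1, 1: 0}))

Replacing the Bernoulli leaves with richer tractable distributions, such as the Chow-Liu tree leaves discussed below, does not change this evaluation scheme.
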
Recently, the attention around SPNs has focused on structure learning algorithms as a way to automate the discovery of latent interactions among the observed variables and to avoid the cost of parameter learning [8,10,19,26]. While many of these efforts concentrated on optimizing the likelihood of the learned models, little attention has been devoted to the structural quality of such models, or to understanding how data quality affects the learning process. In this paper we extend and simplify one of the state-of-the-art SPN structure learning algorithms, LearnSPN [10], providing several improvements and insights. We show how to a) learn simpler SPNs, i.e. ones with fewer edges, fewer parameters and more layers, b) stop the building process earlier while preserving goodness of fit, and c) be more robust and resilient when estimating dependencies from data. To accomplish this, we limit the number of children a node can have while building the network, we introduce tractable multivariate distributions, in the form of Chow-Liu trees [7], as leaves of a hybrid architecture without adding complexity to the network, and we enhance the mixture models in an SPN via bootstrap samples, i.e. by applying bagging to the likelihood function estimation.
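As a reminder of what fitting a Chow-Liu tree involves, the sketch below follows the classical recipe [7]: estimate pairwise mutual information from binary data and keep a maximum-weight spanning tree. The function names, the Laplace smoothing constant and the random toy dataset are illustrative assumptions, not the authors' leaf-learning code.

# A minimal Chow-Liu tree fit over binary data (illustrative sketch only; this
# is the classical algorithm [7], not the authors' exact implementation).
import numpy as np

def mutual_information(data, i, j, alpha=1.0):
    """Empirical MI between binary columns i and j, with Laplace smoothing."""
    n = data.shape[0]
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = (np.sum((data[:, i] == a) & (data[:, j] == b)) + alpha) / (n + 4 * alpha)
            p_a = (np.sum(data[:, i] == a) + 2 * alpha) / (n + 4 * alpha)
            p_b = (np.sum(data[:, j] == b) + 2 * alpha) / (n + 4 * alpha)
            mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_edges(data):
    """Edges of a maximum-weight spanning tree over pairwise MI (Prim's algorithm)."""
    d = data.shape[1]
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            mi[i, j] = mi[j, i] = mutual_information(data, i, j)
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        best = max(((i, j) for i in in_tree for j in range(d) if j not in in_tree),
                   key=lambda e: mi[e])
        edges.append(best)
        in_tree.add(best[1])
    return edges  # each edge (parent, child) defines a factor p(x_child | x_parent)

X = np.random.randint(0, 2, size=(100, 4))   # toy binary dataset
print(chow_liu_edges(X))

Used as SPN leaves, such trees remain tractable, since inference on a tree is linear in the number of variables, while capturing pairwise dependencies that fully factorized leaves would miss.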