Bayesian parameter learning with an application
Ali Karimnezhad1 · Fahimeh Moradi1
Received: 13 November 2014 / Accepted: 22 September 2015 / Published online: 13 October 2015
© Sapienza Università di Roma 2015
Abstract  This paper deals with prior uncertainty in the parameter learning procedure in Bayesian networks. In most studies in the literature, parameter learning is based on two well-known criteria, namely the maximum likelihood and the maximum a posteriori. In the presence of prior information, the literature abounds with situations in which a maximum a posteriori estimate is computed as the desired estimate, but in those studies the viewpoint behind its use does not appear to be grounded in a loss function. In this paper, we recall that the maximum a posteriori estimator is the Bayes estimator under the zero-one loss function and, criticizing the zero-one loss, we suggest the general Entropy loss function as a useful loss when overlearning and underlearning need serious attention. We take prior uncertainty into account and extend parameter learning to the case when prior information is polluted. Addressing a real-world problem, we conduct a simulation study of the behavior of the proposed estimates. Finally, in order to assess the effect of changing the hyperparameters of a chosen prior on the learning procedure, we carry out a sensitivity analysis w.r.t. some chosen hyperparameters.

Keywords  Bayesian estimation · Bayesian networks · Contaminated priors · Directed acyclic graph · General Entropy loss function
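For later reference, the quantities discussed in the abstract can be written in standard form (the notation here is generic and may differ slightly from the paper's own). The maximum a posteriori (MAP) estimate is the posterior mode,

\[ \hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, \pi(\theta \mid x), \]

the Bayes estimate under the zero-one loss. The general Entropy (GE) loss,

\[ L(\delta, \theta) \propto (\delta/\theta)^{q} - q \log(\delta/\theta) - 1, \qquad q \neq 0, \]

penalizes overestimation more heavily when $q > 0$ and underestimation more heavily when $q < 0$; its Bayes estimate is $\hat{\theta}_{\mathrm{GE}} = [\,E(\theta^{-q} \mid x)\,]^{-1/q}$, provided this posterior expectation is finite. Polluted prior information is typically modeled by the class of $\varepsilon$-contaminated priors $\pi = (1-\varepsilon)\pi_0 + \varepsilon\, g$, where $\pi_0$ is the elicited prior, $g$ ranges over a class of contaminations, and $\varepsilon \in [0,1]$ reflects the degree of prior uncertainty.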
Ali Karimnezhad: [email protected]
Fahimeh Moradi: [email protected]

1  Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Canada

1 Introduction

Bayesian Networks (BNs) are among the most popular models for representing joint probability distributions over a given set of dependent random variables [18,22]. In the literature, learning the parameters of a BN proceeds in two steps: (1) specifying conditional independence relations, or briefly, learning the structure of the network, and (2) estimating the parameters that specify the joint probability distributions of the network, or briefly, parameter learning [15,16]. (To avoid possible misunderstanding, we emphasize that when we use the word 'learning' for a parameter, we mean that we 'estimate' that parameter.) Usually, a machine is expected to learn both the structure and the parameters of a network, and once the structure of the network has been specified, parameter learning becomes possible. Methods for learning the structure of a BN are usually based on either conditional independence tests [5,28] or scoring metrics [7,8,14]; structure learning methods have been criticized and developed extensively [24,30]. Learning the structure of a BN determines how the joint probability density function (pdf) factorizes and is performed following algorithms available in the literature.
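For concreteness, a BN over random variables $X_1, \dots, X_n$ with directed acyclic graph $G$ encodes the factorization

\[ P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big), \]

where $\mathrm{Pa}(X_i)$ denotes the set of parents of $X_i$ in $G$. Structure learning fixes the parent sets; parameter learning then estimates the conditional distributions $P(X_i \mid \mathrm{Pa}(X_i))$.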
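As a minimal numerical sketch of step (2), the code below compares the MAP estimate with the GE-loss Bayes estimate for a single conditional probability $\theta$ of a binary node, under a conjugate Beta prior and Binomial data. This sketch is not from the paper; the prior, the data, and the values of $q$ are illustrative assumptions, and the closed forms used are the standard Beta-posterior identities.

import numpy as np
from scipy.special import betaln

def map_estimate(a, b, k, n):
    # Posterior is Beta(a + k, b + n - k); its mode (the MAP estimate)
    # is (a_post - 1) / (a_post + b_post - 2), valid when a_post, b_post > 1.
    a_post, b_post = a + k, b + n - k
    return (a_post - 1) / (a_post + b_post - 2)

def ge_bayes_estimate(a, b, k, n, q):
    # Bayes estimate under the general Entropy loss:
    #   theta_hat = E[theta^(-q) | data]^(-1/q),
    # using E[theta^(-q)] = B(a_post - q, b_post) / B(a_post, b_post)
    # for a Beta(a_post, b_post) posterior (requires a_post > q).
    a_post, b_post = a + k, b + n - k
    log_moment = betaln(a_post - q, b_post) - betaln(a_post, b_post)
    return float(np.exp(-log_moment / q))

# Illustrative data: k = 7 successes in n = 20 trials, Beta(2, 2) prior.
a, b, k, n = 2.0, 2.0, 7, 20
print("MAP:      ", map_estimate(a, b, k, n))           # ~0.364
print("GE, q=1:  ", ge_bayes_estimate(a, b, k, n, 1))   # ~0.348
print("GE, q=-1: ", ge_bayes_estimate(a, b, k, n, -1))  # 0.375 (posterior mean)

Note that $q = -1$ recovers the posterior mean, while $q > 0$ pulls the estimate below the MAP, reflecting the heavier penalty the GE loss places on overestimation (overlearning).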