Local Learning Algorithm for Markov Blanket Discovery

Abstract. Learning of the Markov blanket can be regarded as an optimal solution to the feature selection problem. In this paper, we propose a local learning algorithm, called Breadth-First search of MB (BFMB), to induce the Markov blanket (MB) without having to learn a Bayesian network first. The algorithm is (1) easy to understand and proven sound in theory; (2) data efficient, by making full use of the knowledge of the underlying topology of the MB; (3) fast, by relying on fewer data passes and conditional independence tests than other approaches; and (4) scalable to thousands of variables, due to local learning. Empirical results on BFMB, along with the known Iterative Association Markov Blanket (IAMB) and Parents and Children based Markov Boundary (PCMB) algorithms, show that (i) BFMB significantly outperforms IAMB in data efficiency and accuracy of discovery given the same number of instances, and (ii) BFMB inherits all the merits of PCMB but reaches a higher accuracy level using only around 20% and 60% of the number of data passes and conditional tests, respectively, used by PCMB.

Keywords: Markov blanket, local learning, feature selection.

1 Introduction

Classification is a fundamental task in data mining that requires learning a classifier from a data sample. Basically, a classifier is a function that maps instances described by a set of attributes to a class label. Identifying the minimal, or close to minimal, subset of variables that best predicts the target variable of interest is known as feature (or variable) subset selection (FSS). In the past three decades, FSS for classification has received considerable attention, and it is even more critical today in applications such as biomedicine, where high dimensionality but few observations challenge traditional FSS algorithms. A principled solution to the feature selection problem is to determine a subset of attributes that renders all remaining attributes independent of the variable of interest [8,9,16]. Koller and Sahami (KS) [9] first recognized that the Markov blanket (see its definition below) of a given target attribute is the theoretically optimal set of attributes for predicting the target's value, though the Markov blanket itself is not a new concept; it can be traced back to 1988 [11].
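To make the KS criterion concrete, the sketch below (in Python) shows the check it implies: a candidate set qualifies as a Markov blanket of T only if every attribute outside the set is conditionally independent of T given the set. The function name `renders_rest_independent` and the `ci_test` helper are illustrative assumptions, not details from the paper; one possible implementation of `ci_test` is sketched after Definition 1 below.

```python
# Sketch of the Koller-Sahami optimality criterion for feature selection.
# `ci_test(data, target, x, z, alpha)` is a hypothetical conditional
# independence oracle returning True when target is independent of x given z.
def renders_rest_independent(data, target, candidate, ci_test, alpha=0.05):
    """True iff every attribute outside `candidate` is conditionally
    independent of `target` given `candidate` -- the defining property
    of a Markov blanket of `target`."""
    rest = (x for x in data.columns if x != target and x not in candidate)
    return all(ci_test(data, target, x, list(candidate), alpha) for x in rest)
```

Algorithms such as BFMB, IAMB, and PCMB can be read as different search strategies for a set passing this check, differing mainly in how many data passes and conditional independence tests they spend to find it.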

* This work was done during the author's time at SPSS.

A Markov blanket of a target attribute T renders it statistically independent from all the remaining attributes; that is, given the values of the attributes in the Markov blanket, the probability distribution of T is completely determined and knowledge of any other variable(s) becomes superfluous [11].

Definition 1 (Conditional independence). Variables X and T are conditionally independent given the set of variables Z (bold symbols denote sets) iff P(T | X, Z) = P(T | Z), denoted as T ⊥ X | Z.

Similarly, T ⊥̸ X | Z is used to denote conditional dependence.
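For discrete data, Definition 1 is typically operationalized with a statistical test such as the G² (log-likelihood ratio) test. The sketch below is one plausible implementation of such a test; the function name `ci_test`, the significance level `alpha`, and the toy data are assumptions for illustration, not details from the paper.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

def ci_test(data: pd.DataFrame, t: str, x: str, z: list, alpha: float = 0.05) -> bool:
    """Return True when T is independent of X given Z is not rejected
    at level alpha, using a G^2 (log-likelihood ratio) test."""
    # Stratify the sample by each configuration of the conditioning set Z.
    groups = [data] if not z else [g for _, g in data.groupby(z)]
    g2, dof = 0.0, 0
    for g in groups:
        table = pd.crosstab(g[t], g[x]).to_numpy()
        if table.size == 0:
            continue
        expected = np.outer(table.sum(1), table.sum(0)) / table.sum()
        mask = table > 0  # observed zeros contribute nothing to G^2
        g2 += 2.0 * (table[mask] * np.log(table[mask] / expected[mask])).sum()
        dof += (table.shape[0] - 1) * (table.shape[1] - 1)
    if dof == 0:  # too few distinct values to test; treat as independent
        return True
    return chi2.sf(g2, dof) > alpha  # large p-value: independence not rejected

# Toy check: T and X are noisy copies of Z, so they are dependent
# marginally but independent once Z is given.
rng = np.random.default_rng(0)
n = 5000
z = rng.integers(0, 2, n)
df = pd.DataFrame({"Z": z,
                   "T": (z + (rng.random(n) < 0.1)) % 2,
                   "X": (z + (rng.random(n) < 0.1)) % 2})
print(ci_test(df, "T", "X", []))     # expect False (marginal dependence)
print(ci_test(df, "T", "X", ["Z"]))  # expect True  (independent given Z)
```

Such a test is the basic primitive behind the "conditional independence tests" counted in the abstract; the reliability of each test degrades as the conditioning set grows, which is why data-efficient algorithms keep Z small.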