Robust Approach for Estimating Probabilities in Naive-Bayes Classifier
Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India 110 016
[email protected]
Institute for Systems Studies and Analyses, Metcalfe House, Delhi, India 110 054
Abstract. The naive-Bayes classifier is a popular classification technique in machine learning. Improving its accuracy is significant, since it is widely used for classification with numeric attributes. For numeric attributes, the conditional probabilities are either modeled by a continuous probability distribution over the range of the attribute's values or obtained by converting the numeric attribute to a discrete one using discretization. The limitation of the classifier using discretization is that it cannot classify those instances for which the conditional probability of some attribute value is zero for every class. The proposed method resolves this limitation of estimating probabilities in the naive-Bayes classifier and improves classification accuracy for noisy data. The proposed method is efficient and robust in estimating probabilities in the naive-Bayes classifier. It has been tested on a number of databases from the UCI machine learning repository, and comparative results of the existing naive-Bayes classifier and the proposed method are also presented.
1
Introduction
Classification has wide application in pattern recognition. In classification, each instance is described by a vector of attribute values. A classifier is used to predict the class of a test instance using training data, a set of instances with known classes. Decision trees [13], k-nearest neighbor [1], and the naive-Bayes classifier [6,7,8] are among the commonly used methods of classification. The naive-Bayes classifier (NBC) is a simple probabilistic classifier with a strong assumption of attribute independence. Although the independence assumption is generally a poor one and often violated for real data sets, Langley et al. [10] found that NBC outperformed an algorithm for decision-tree induction, and Domingos et al. [4] also found that this limitation has less impact than might be expected. NBC often provides better classification accuracy on real data sets than other classifiers do, and it requires only a small amount of training data. It is also useful for high-dimensional data, since the probability of each attribute is estimated independently; there is no need to reduce the dimension of the data, as is required in some popular classification techniques.

A. Ghosh, R.K. De, and S.K. Pal (Eds.): PReMI 2007, LNCS 4815, pp. 11–16, 2007. © Springer-Verlag Berlin Heidelberg 2007
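To make the independence assumption concrete, the following is a minimal sketch (not the paper's implementation) of an NBC over discrete attributes: class priors and per-attribute conditional frequencies are counted from training data, and prediction multiplies them under the independence assumption. The function names and the toy data are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_nb(instances, labels):
    """Count class frequencies and per-attribute value frequencies per class."""
    class_counts = Counter(labels)
    # cond[(attr_index, value, cls)] = number of training instances of class
    # cls whose attr_index-th attribute equals value
    cond = defaultdict(int)
    for x, c in zip(instances, labels):
        for i, v in enumerate(x):
            cond[(i, v, c)] += 1
    return class_counts, cond

def predict_nb(x, class_counts, cond):
    """Pick the class maximizing P(c) * prod_i P(x_i | c).

    Note: with raw frequency estimates, an attribute value never seen with a
    class makes that class's score zero -- the limitation discussed below.
    """
    n = sum(class_counts.values())
    best_cls, best_score = None, -1.0
    for c, cc in class_counts.items():
        score = cc / n  # prior P(c)
        for i, v in enumerate(x):
            score *= cond[(i, v, c)] / cc  # P(x_i | c), zero if unseen
        if score > best_score:
            best_cls, best_score = c, score
    return best_cls
```

On a toy weather data set, an instance matching the "rain" pattern is assigned the class whose attribute frequencies best support it.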
B. Chandra, M. Gupta, and M.P. Gupta
NBC has a limitation in predicting the class of instances for which the conditional probabilities of each class are zero, i.e., the conditional probability of some attribute value is zero for every class. To rectify this problem, the Laplace-estimate [3] is used to estimate the probability of the class and the M-estimate [3] is used to estimate the conditional probability of an attribute value. The results obtained from these estimates
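As a sketch, the two estimates can be written as follows. The parameterization (equivalent-sample-size m and prior p for the M-estimate, add-one correction for the Laplace-estimate) follows common textbook presentations and is an assumption here, not necessarily the exact form used in [3].

```python
def laplace_estimate(n_c, n_total, n_classes):
    """Laplace-corrected class prior: (n_c + 1) / (N + k) for k classes.

    Never returns zero, even when class c is absent from the training data.
    """
    return (n_c + 1) / (n_total + n_classes)

def m_estimate(n_cv, n_c, p, m=2.0):
    """M-estimate of P(value | class): (n_cv + m * p) / (n_c + m).

    n_cv: count of the attribute value within class c; n_c: size of class c;
    p: prior probability of the value (e.g. 1 / number of distinct values);
    m: equivalent sample size controlling the pull toward the prior.
    """
    return (n_cv + m * p) / (n_c + m)
```

For an unseen value (n_cv = 0) the M-estimate falls back to m * p / (n_c + m) rather than zero, which is what keeps the product of conditionals from vanishing.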