Robust Approach for Estimating Probabilities in Naive-Bayes Classifier
Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India 110 016
[email protected]
Institute for Systems Studies and Analyses, Metcalfe House, Delhi, India 110 054
Abstract. The naive-Bayes classifier is a popular classification technique in machine learning. Improving its accuracy is significant, since it is widely used for classification with numeric attributes. For numeric attributes, the conditional probabilities are either modeled by a continuous probability distribution over the range of the attribute's values or obtained by converting the numeric attribute to a discrete one using discretization. The limitation of the classifier using discretization is that it cannot classify those instances for which the conditional probability of some attribute value is zero for every class. The proposed method resolves this limitation of estimating probabilities in the naive-Bayes classifier and improves classification accuracy for noisy data. The proposed method is efficient and robust in estimating probabilities in the naive-Bayes classifier. It has been tested on a number of databases from the UCI machine learning repository, and comparative results of the existing naive-Bayes classifier and the proposed method are also presented.
1
Introduction
Classification has wide application in pattern recognition. In classification, each instance is described by a vector of attribute values. A classifier is used to predict the class of a test instance using training data, a set of instances with known classes. Decision trees [13], k-nearest neighbor [1], and the naive-Bayes classifier [6,7,8] are among the commonly used methods of classification. The naive-Bayes classifier (NBC) is a simple probabilistic classifier with a strong assumption of attribute independence. Although the independence assumption is generally a poor one and often violated for real data sets, Langley et al. [10] found that NBC outperformed an algorithm for decision-tree induction, and Domingos et al. [4] also found that this limitation has less impact than might be expected. NBC often provides better classification accuracy on real data sets than other classifiers do, and it requires only a small amount of training data. It is also useful for high-dimensional data, since the probability of each attribute is estimated independently; there is no need to reduce the dimension of the data, as is required in some popular classification techniques.

A. Ghosh, R.K. De, and S.K. Pal (Eds.): PReMI 2007, LNCS 4815, pp. 11–16, 2007. © Springer-Verlag Berlin Heidelberg 2007
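To make the independence assumption concrete, the following is a minimal sketch (not the paper's implementation) of an NBC over discrete attributes: class priors and per-attribute conditional frequencies are counted from training data, and prediction multiplies them under the independence assumption. The function names and the toy data are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_nb(instances, labels):
    """Count class frequencies and per-attribute value frequencies per class."""
    class_counts = Counter(labels)
    # cond[(attr_index, value, cls)] = number of training instances of class
    # cls whose attr_index-th attribute equals value
    cond = defaultdict(int)
    for x, c in zip(instances, labels):
        for i, v in enumerate(x):
            cond[(i, v, c)] += 1
    return class_counts, cond

def predict_nb(x, class_counts, cond):
    """Pick the class maximizing P(c) * prod_i P(x_i | c).

    Note: with raw frequency estimates, an attribute value never seen with a
    class makes that class's score zero -- the limitation discussed below.
    """
    n = sum(class_counts.values())
    best_cls, best_score = None, -1.0
    for c, cc in class_counts.items():
        score = cc / n  # prior P(c)
        for i, v in enumerate(x):
            score *= cond[(i, v, c)] / cc  # P(x_i | c), zero if unseen
        if score > best_score:
            best_cls, best_score = c, score
    return best_cls
```

On a toy weather data set, an instance matching the "rain" pattern is assigned the class whose attribute frequencies best support it.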
B. Chandra, M. Gupta, and M.P. Gupta
NBC has a limitation in predicting the class of instances for which the conditional probabilities of each class are zero, i.e., the conditional probability of some attribute value is zero for every class. To rectify this problem, the Laplace-estimate [3] is used to estimate the probability of the class and the M-estimate [3] is used to estimate the conditional probability of an attribute value. The results obtained from these estimates
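As a sketch, the two estimates can be written as follows. The parameterization (equivalent-sample-size m and prior p for the M-estimate, add-one correction for the Laplace-estimate) follows common textbook presentations and is an assumption here, not necessarily the exact form used in [3].

```python
def laplace_estimate(n_c, n_total, n_classes):
    """Laplace-corrected class prior: (n_c + 1) / (N + k) for k classes.

    Never returns zero, even when class c is absent from the training data.
    """
    return (n_c + 1) / (n_total + n_classes)

def m_estimate(n_cv, n_c, p, m=2.0):
    """M-estimate of P(value | class): (n_cv + m * p) / (n_c + m).

    n_cv: count of the attribute value within class c; n_c: size of class c;
    p: prior probability of the value (e.g. 1 / number of distinct values);
    m: equivalent sample size controlling the pull toward the prior.
    """
    return (n_cv + m * p) / (n_c + m)
```

For an unseen value (n_cv = 0) the M-estimate falls back to m * p / (n_c + m) rather than zero, which is what keeps the product of conditionals from vanishing.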