Naive Bayes: applications, variations and vulnerabilities: a review of literature with code snippets for implementation
METHODOLOGIES AND APPLICATION
Indika Wickramasinghe1 · Harsha Kalutarage2
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Naïve Bayes (NB) is a well-known probabilistic classification algorithm. It is a simple yet efficient algorithm with a wide variety of real-world applications, ranging from product recommendations and medical diagnosis to controlling autonomous vehicles. Because real-world data often fail to satisfy the assumptions of NB, several variations of NB have been developed to handle more general data. Each variation suits its own class of applications, and the variations reach different levels of accuracy. This manuscript surveys the latest applications of NB and discusses its variations in different settings. Furthermore, recommendations are made regarding the applicability of NB while exploring the robustness of the algorithm. Finally, the pros and cons of the NB algorithm and some of its vulnerabilities are discussed, with related computing code for implementation.

Keywords Naïve Bayes · Probabilistic classification · Machine learning vulnerabilities · R code snippets
1 Introduction

Recent advances in low-cost computing and the explosion of data have democratized machine learning and data analytics (MLDA), allowing developers to apply these technologies almost everywhere in real-world applications. Data classification plays a key role in MLDA. Low-cost computational power has made the use of some conventional statistical techniques in data classification practical, and statistical approaches to MLDA problems have become major alternatives to well-known MLDA algorithms (John and Langley 1995). Though sophisticated statistical techniques are used in numerous applications, some appealing and effective outcomes have been obtained with much simpler and more basic statistical approaches (George et al. 1995). Most importantly, these alternative techniques to
Communicated by V. Loia.
Indika Wickramasinghe [email protected]
Harsha Kalutarage [email protected]
1 Department of Mathematics, Prairie View A & M University, Prairie View, USA
2 School of Computing, Robert Gordon University, Aberdeen, United Kingdom
conventional machine learning algorithms have, in some applications, outperformed those conventional methods. Broadly, any classification algorithm can be classified as either probabilistic or non-probabilistic. Probabilistic data classification relies on approximating a probability distribution, and such techniques work well because the distributions of the relevant features are often probabilistic in nature. Garg and Roth (2001) address the fundamental question of why the probabilistic approach works well in various real-world applications. Probabilistic classifiers include NB, logistic regression, and multilayer perceptrons; support vector machines and K-nearest neighbors are examples of non-probabilistic classifiers.
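To make the notion of a probabilistic classifier concrete, the following is a minimal from-scratch sketch of Gaussian NB (the paper's own snippets are in R; this Python analogue, with hypothetical function names, is provided only for illustration). Each class-conditional density is modeled as a product of independent univariate Gaussians, and prediction picks the class maximizing the log prior plus the summed log likelihoods.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and variances (hypothetical helper)."""
    stats = {}
    n = len(y)
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        varis = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-9  # small floor for stability
            for col, m in zip(zip(*rows), means)
        ]
        stats[c] = (len(rows) / n, means, varis)
    return stats

def predict_gaussian_nb(stats, x):
    """Return the class with the highest posterior log-probability."""
    best, best_score = None, float("-inf")
    for c, (prior, means, varis) in stats.items():
        score = math.log(prior)  # log prior P(c)
        for v, m, var in zip(x, means, varis):
            # add the log Gaussian density log N(v; m, var) for each feature
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: two well-separated clusters, labels "a" and "b".
X = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.1], [6.0, 9.0], [5.8, 9.2], [6.2, 8.8]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))  # a point near the first cluster
```

The "naive" part is visible in the inner loop: each feature contributes an independent log-likelihood term, which is exactly the conditional-independence assumption discussed throughout this survey.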