14 Ensemble Classification
This chapter is concerned with ensemble classification, i.e. using a set of classifiers to classify unseen data rather than just a single one. The classifiers in the ensemble all predict a classification for each unseen instance and their predictions are then combined using some form of voting.
14.1 Introduction

The idea of ensemble classification is to learn not just one classifier but a set of classifiers, called an ensemble of classifiers, and then to combine their predictions for the classification of unseen instances using some form of voting. This is illustrated in Figure 14.1 below. It is hoped that the ensemble will collectively have a higher level of predictive accuracy than any one of the individual classifiers, but that is not guaranteed.

The term ensemble learning is often used to mean the same as ensemble classification, but the former is a more general technique in which a set of models is learnt that collectively can be applied to solving a problem of potentially any kind, not just classification.

The individual classifiers in an ensemble are known as base classifiers. If the base classifiers are all of the same kind (e.g. decision trees) the ensemble is known as homogeneous; otherwise it is known as heterogeneous.

A simple form of ensemble classification algorithm is:

1. Generate N classifiers for a given dataset.
2. For an unseen instance X:
   a) Compute the predicted classification of X for each of the N classifiers.
   b) Select the classification that is most frequently predicted.

This is a majority voting model, where each time a classifier predicts a particular classification for an unseen instance it counts as one 'vote' for that classification.
With N classifiers in the ensemble there will be a total of N votes and the classification with the most votes wins, i.e. is deemed to be the ensemble's prediction of the correct classification.
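A minimal sketch of this majority voting scheme in Python might look as follows. It assumes each base classifier exposes a predict method that returns a single, hashable class label for an instance; the names ensemble_predict, classifiers and instance are illustrative, not from the text.

```python
from collections import Counter

def ensemble_predict(classifiers, instance):
    """Return the class label predicted by the most base classifiers."""
    # Each call to predict() casts one 'vote' for a class label.
    votes = Counter(clf.predict(instance) for clf in classifiers)
    # most_common(1) gives the (label, count) pair with the most votes;
    # ties are broken by insertion order, i.e. effectively arbitrarily.
    return votes.most_common(1)[0][0]
```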
Figure 14.1 Ensemble Classification

The obvious objection to an ensemble classifier approach is that generating N classifiers takes much longer than generating only one, and this additional effort is only justified if the performance of the ensemble is substantially better than that of a single classifier. There is no guarantee that this will be the case for a given set of test data, and far less so for an individual unseen instance, but intuitively it seems reasonable to believe that N classifiers 'working together' have the potential to give better predictive accuracy than one on its own. In practice this is likely to depend on how the classifiers are generated and how their predictions are combined (by majority voting or otherwise).

In this chapter we will restrict our attention to the homogeneous case, where all the classifiers are of the same kind, say decision trees. There are several ways in which an ensemble can be formed, for example:

– N trees generated using the same tree generation algorithm, with different parameter settings, all using the same training data.
– N trees generated using the same tree generation algorithm, all with different training data and either with the same or with different parameter settings (a sketch of this option follows the list).
– N trees generated using a variety of different tree generation algorithms, either with the same or with different training data.
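As a sketch of the second option, one way to grow N trees from different training data is to train each tree on a random resample of the original dataset. This assumes scikit-learn's DecisionTreeClassifier and NumPy are available, and that X and y are NumPy arrays of instances and labels; the function name and parameters are illustrative, not from the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_homogeneous_ensemble(X, y, N=10, seed=0):
    """Grow N decision trees, each on a random resample of (X, y)."""
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(N):
        # Draw len(X) row indices with replacement, so every tree
        # is trained on a slightly different variant of the data.
        idx = rng.integers(0, len(X), size=len(X))
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble
```

To classify an unseen instance x with such an ensemble, each tree's vote can be obtained as tree.predict([x])[0] and the votes then combined as in the earlier voting sketch.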