A Differential Evolution Approach to Feature Selection and Instance Selection

Abstract. More and more data is being collected due to constant improvements in storage hardware and data collection techniques. Data now arrives faster than data mining techniques can process it. The data collected often contains redundant or irrelevant features/instances that limit classification performance. Feature selection and instance selection are processes that help reduce this problem by eliminating useless data. This paper develops a set of algorithms using Differential Evolution to achieve feature selection, instance selection, and combined feature and instance selection. The reduction of the data, the classification accuracy and the training time are compared with the original data and with existing algorithms. Experiments on ten datasets of varying difficulty show that the newly developed algorithms can successfully reduce the size of the data and maintain or increase the classification performance in most cases. In addition, the computational time is substantially reduced. This work is the first to systematically investigate a series of algorithms for feature and/or instance selection in classification. The findings show that instance selection is a much harder task than feature selection, but with effective methods it can significantly reduce the size of the data and provide great benefit.

Keywords: Differential evolution · Feature selection · Instance selection · Classification

1 Introduction

As hardware technology improves, more and more data is collected, at a rate that machine learning and data mining techniques cannot deal with. Often the data collected contains redundant or irrelevant features and instances [7,9,14,22,25], which may slow down and hinder the learning process in many tasks such as classification, reduce the learning performance, and/or produce unnecessarily complex models. A pre-processing step is often needed to remove some of the irrelevant or even noisy data. This can be achieved by feature selection (FS), which selects only a small subset of informative features; instance selection (IS), which selects only a small subset of representative examples/instances; or combined FS and IS, which removes useless or redundant features and instances [13,17].
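As a concrete illustration (not taken from the paper), FS and IS can both be viewed as binary masks applied to the data matrix: FS masks columns, IS masks rows, and combined FS and IS masks both. The following is a minimal sketch assuming a NumPy array representation; the masks and data are arbitrary placeholders.

import numpy as np

# Illustrative only: FS and IS represented as binary masks over a dataset.
# X has shape (n_instances, n_features); y holds class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100)

feature_mask = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)  # FS: keep 4 of 8 features
instance_mask = rng.random(100) < 0.3                           # IS: keep ~30% of instances

# Combined FS and IS: select rows (instances) and columns (features).
X_reduced = X[instance_mask][:, feature_mask]
y_reduced = y[instance_mask]

print(X.shape, "->", X_reduced.shape)  # e.g. (100, 8) -> (~30, 4)

The search problem is then to find the masks that best trade off data reduction against classification performance, rather than fixing them by hand as above.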

However, FS and/or IS is a challenging problem for two main reasons. The first is the large search space, which grows exponentially with the total number of features and instances. The second is that there are almost always interactions between features, which lead to a complex search space with many local optima; a good fitness function is therefore needed to guide the search towards a good solution. There has been a large amount of work on FS, but much less on IS or on combined FS and IS [13]. Different search techniques have been used for FS, but existing algorithms still suffer from stagnation in local optima.
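To make such a fitness-guided search concrete, the following is a minimal sketch of a standard DE/rand/1/bin loop applied to FS. It is not the paper's algorithm: the 0.5 threshold decoding, the kNN wrapper fitness, and all parameter values (pop_size, gens, F, CR) are illustrative assumptions.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def decode(vector, threshold=0.5):
    """Threshold a continuous DE candidate into a binary feature mask."""
    mask = vector > threshold
    if not mask.any():                  # guard: always keep at least one feature
        mask[np.argmax(vector)] = True
    return mask

def fitness(mask, X, y):
    """Wrapper fitness: cross-validated kNN accuracy on the selected features."""
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def de_feature_selection(X, y, pop_size=20, gens=30, F=0.5, CR=0.9, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n))     # candidates in [0, 1]^n
    fit = np.array([fitness(decode(p), X, y) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            # DE/rand/1: three distinct vectors other than the target i
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i],
                                     size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)
            # Binomial crossover with one guaranteed inherited dimension
            cross = rng.random(n) < CR
            cross[rng.integers(n)] = True
            trial = np.where(cross, mutant, pop[i])
            f_trial = fitness(decode(trial), X, y)
            if f_trial >= fit[i]:       # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = pop[np.argmax(fit)]
    return decode(best), fit.max()

Under the same encoding, IS would evolve a mask over instances rather than features, and combined FS and IS would concatenate both masks into a single candidate vector; the exponential growth of the search space with the number of features and instances is exactly what makes the fitness function's guidance critical.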