MeLiF+: Optimization of Filter Ensemble Algorithm with Parallel Computing
Abstract. Searching for an ensemble of algorithms, that is, for the best combination of algorithms, is a commonly used approach in machine learning. The MeLiF algorithm applies this technique to filter feature selection. In our research, we proposed a parallel version of this algorithm and showed that it not only improves the algorithm's performance significantly, but also improves feature selection quality.

Keywords: Feature selection · Variable selection · Attribute selection · Ensemble learning · Feature filters · Metrics aggregation · MeLiF · Parallel computing
1 Introduction
In the modern world, machine learning has become one of the most promising and most studied areas of science, mainly because of its universal applicability to any data-related problem. One example of such an area is bioinformatics [3,4,6,10], which produces giant amounts of data about the gene expression of different organisms. These data could potentially allow us to determine which DNA pieces are responsible for some visible change in an individual, or for reactions to a particular environmental change. The main problem with such data is its huge number of features and relatively low number of objects. Because of the high-dimensional space, it is very hard to build a model that generalizes such data well. Furthermore, many features in such datasets have nothing in common with the results, so they should be treated as noise. It seems logical in this case to somehow select the most relevant features and to train a classifier on these only. This idea is implemented in the area of machine learning known as feature selection. There are three main classes of feature selection methods: filter methods, based on statistical measures of individual features or feature subsets; wrapper methods, based on subspace search with the classifier result as the optimization measure; and embedded methods, which use the inner properties of a classifier [12].

© IFIP International Federation for Information Processing 2016
Published by Springer International Publishing Switzerland 2016. All Rights Reserved
L. Iliadis and I. Maglogiannis (Eds.): AIAI 2016, IFIP AICT 475, pp. 341–347, 2016.
DOI: 10.1007/978-3-319-44944-9_29
I. Isaev and I. Smetannikov
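As a minimal illustration of the filter approach described above (a toy sketch, not code from the paper), the following scores each feature with a simple statistical measure, here the absolute Pearson correlation with the class labels, and keeps the top-k features. The function names and the synthetic data are our own, chosen only for the example.

```python
import math
import random

def pearson_score(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return abs(num / den)

def select_top_k(features, labels, k):
    """Rank feature columns by their filter score and keep the best k indices."""
    scores = [pearson_score(col, labels) for col in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])[:k]

# Toy data: feature 0 tracks the labels, feature 1 is pure noise.
random.seed(0)
labels = [random.randint(0, 1) for _ in range(100)]
informative = [y + 0.1 * random.gauss(0, 1) for y in labels]
noise = [random.gauss(0, 1) for _ in range(100)]
print(select_top_k([informative, noise], labels, 1))  # → [0]
```

Note that the score is computed once per feature, with no classifier in the loop, which is exactly why filter methods are fast enough to serve as a preprocessing step.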
The main peculiarity of filter methods is their speed. As a result, they are frequently used for preprocessing, and the resulting feature subsets are then passed to a wrapper or embedded method. This is especially important for bioinformatics, where the number of features in a dataset sometimes reaches dozens or hundreds of thousands. These days, many machine learning algorithms use ensembling [1,4,8]. The MeLiF algorithm [13] applies this idea to feature selection: it builds a linear combination of basic filters that selects the most relevant features. A structural property of MeLiF is that it can easily be modified to work in a concurrent or distributed manner. In this research, we implemented a parallel version of MeLiF called MeLiF+ and achieved a significant speed improvement without losing selection quality. The remainder of the paper is organized as follows: MeLiF algorit
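The linear-combination idea behind MeLiF can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' implementation: the coordinate search below does a greedy fixed-step exploration of the weight vector, and `evaluate` is a toy stand-in for the classifier-based quality measure the paper optimizes.

```python
def combine(filter_scores, weights):
    """Aggregate the per-feature scores of several basic filters linearly."""
    n_features = len(filter_scores[0])
    return [sum(w * s[j] for w, s in zip(weights, filter_scores))
            for j in range(n_features)]

def coordinate_search(filter_scores, evaluate, step=0.25, start=None):
    """Greedy coordinate search over filter weights (simplified sketch)."""
    weights = list(start or [1.0] * len(filter_scores))
    best = evaluate(combine(filter_scores, weights))
    improved = True
    while improved:
        improved = False
        for i in range(len(weights)):
            for delta in (step, -step):
                trial = weights[:]
                trial[i] += delta
                quality = evaluate(combine(filter_scores, trial))
                if quality > best:
                    best, weights, improved = quality, trial, True
    return weights, best

# Toy example: two basic filters score 4 features; features 0 and 2 are the
# "truly" relevant ones, and the toy measure rewards putting them in the top 2.
scores_a = [0.9, 0.1, 0.8, 0.2]   # hypothetical filter A (mostly right)
scores_b = [0.1, 0.9, 0.2, 0.8]   # hypothetical filter B (mostly misleading)

def evaluate(combined):
    top2 = sorted(range(4), key=lambda j: -combined[j])[:2]
    return len(set(top2) & {0, 2})  # overlap with the relevant features

weights, quality = coordinate_search([scores_a, scores_b], evaluate)
print(quality)  # → 2
```

Each candidate weight vector requires an independent, relatively expensive evaluation, which is what makes the search amenable to the concurrent evaluation that MeLiF+ exploits.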