MeLiF+: Optimization of Filter Ensemble Algorithm with Parallel Computing
Abstract. Searching for an ensemble of algorithms, that is, for the best combination of algorithms, is a commonly used approach in machine learning. The MeLiF algorithm applies this technique to filter feature selection. In our research, we proposed a parallel version of this algorithm and showed that it not only improves the algorithm's performance significantly, but also improves feature selection quality.

Keywords: Feature selection · Variable selection · Attribute selection · Ensemble learning · Feature filters · Metrics aggregation · MeLiF · Parallel computing
1 Introduction
In the modern world, machine learning has become one of the most promising and most studied areas of science, mainly because of its universal applicability to any data-related problem. One example of such an area is bioinformatics [3,4,6,10], which produces giant amounts of data about the gene expression of different organisms. These data could potentially allow us to determine which DNA pieces are responsible for some visible change in an individual, or for reactions to a particular environmental change. The main problem with such data is its huge number of features and relatively low number of objects. Because of the high-dimensional space, it is very hard to build a model that generalizes such data well. Furthermore, many features in such datasets have nothing in common with the results, so they should be treated as noise. It seems logical in this case to somehow select the most relevant features and to train a classifier on these only. This idea is implemented in the area of machine learning known as feature selection. There are three main classes of feature selection methods: filter methods, based on statistical measures of individual features or feature subsets; wrapper methods, based on subspace search with the classifier result as the optimization measure; and embedded methods, which use the inner properties of a classifier [12].

© IFIP International Federation for Information Processing 2016
Published by Springer International Publishing Switzerland 2016. All Rights Reserved
L. Iliadis and I. Maglogiannis (Eds.): AIAI 2016, IFIP AICT 475, pp. 341–347, 2016.
DOI: 10.1007/978-3-319-44944-9_29
I. Isaev and I. Smetannikov
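As a minimal illustration of the filter approach described above (a toy sketch, not code from the paper), the following scores each feature with a simple statistical measure, here the absolute Pearson correlation with the class labels, and keeps the top-k features. The function names and the synthetic data are our own, chosen only for the example.

```python
import math
import random

def pearson_score(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return abs(num / den)

def select_top_k(features, labels, k):
    """Rank feature columns by their filter score and keep the best k indices."""
    scores = [pearson_score(col, labels) for col in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])[:k]

# Toy data: feature 0 tracks the labels, feature 1 is pure noise.
random.seed(0)
labels = [random.randint(0, 1) for _ in range(100)]
informative = [y + 0.1 * random.gauss(0, 1) for y in labels]
noise = [random.gauss(0, 1) for _ in range(100)]
print(select_top_k([informative, noise], labels, 1))  # → [0]
```

Note that the score is computed once per feature, with no classifier in the loop, which is exactly why filter methods are fast enough to serve as a preprocessing step.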
The main peculiarity of filter methods is their speed. As a result, they are frequently used for preprocessing, and the resulting feature subsets are then passed to a wrapper or embedded method. This is especially important for bioinformatics, where the number of features in a dataset sometimes reaches dozens or hundreds of thousands. These days, many machine learning algorithms use ensembling [1,4,8]. The MeLiF algorithm [13] applies this idea to feature selection: it builds a linear combination of basic filters that selects the most relevant features. A structural property of MeLiF is that it can easily be modified to work in a concurrent or distributed manner. In this research, we implemented a parallel version of MeLiF called MeLiF+ and achieved a significant speed improvement without losing selection quality. The remainder of the paper is organized as follows: MeLiF algorit
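The linear-combination idea behind MeLiF can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' implementation: the coordinate search below does a greedy fixed-step exploration of the weight vector, and `evaluate` is a toy stand-in for the classifier-based quality measure the paper optimizes.

```python
def combine(filter_scores, weights):
    """Aggregate the per-feature scores of several basic filters linearly."""
    n_features = len(filter_scores[0])
    return [sum(w * s[j] for w, s in zip(weights, filter_scores))
            for j in range(n_features)]

def coordinate_search(filter_scores, evaluate, step=0.25, start=None):
    """Greedy coordinate search over filter weights (simplified sketch)."""
    weights = list(start or [1.0] * len(filter_scores))
    best = evaluate(combine(filter_scores, weights))
    improved = True
    while improved:
        improved = False
        for i in range(len(weights)):
            for delta in (step, -step):
                trial = weights[:]
                trial[i] += delta
                quality = evaluate(combine(filter_scores, trial))
                if quality > best:
                    best, weights, improved = quality, trial, True
    return weights, best

# Toy example: two basic filters score 4 features; features 0 and 2 are the
# "truly" relevant ones, and the toy measure rewards putting them in the top 2.
scores_a = [0.9, 0.1, 0.8, 0.2]   # hypothetical filter A (mostly right)
scores_b = [0.1, 0.9, 0.2, 0.8]   # hypothetical filter B (mostly misleading)

def evaluate(combined):
    top2 = sorted(range(4), key=lambda j: -combined[j])[:2]
    return len(set(top2) & {0, 2})  # overlap with the relevant features

weights, quality = coordinate_search([scores_a, scores_b], evaluate)
print(quality)  # → 2
```

Each candidate weight vector requires an independent, relatively expensive evaluation, which is what makes the search amenable to the concurrent evaluation that MeLiF+ exploits.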