A new hybrid stability measure for feature selection


Akshata K. Naik¹ · Venkatanareshbabu Kuppili¹ · Damodar Reddy Edla¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Feature Selection (FS) algorithms are applied in bioinformatics to identify disease-causing genes. The performance of such algorithms is measured in terms of the accuracy of the learning model and the stability of the FS algorithm. Stability evaluates how closely the feature sets obtained from repeated executions replicate one another. Recent research has shown that a stability measure must satisfy a set of properties: it must be fully defined, monotonic, and bounded, have a deterministic maximum stability, and be corrected for chance. Among the existing stability measures, only Nogueira's frequency-based stability measure satisfies all the required properties. However, frequency-based stability measures fail to discriminate between cases in which the overall frequencies of the features are the same. To address this issue, the paper proposes a hybrid similarity-based stability measure that satisfies all of the desirable properties listed above. The proposed measure is the first similarity-based stability measure to satisfy all the required properties, and each of these properties is established mathematically. Further, the paper proposes a combination of the frequency-based and similarity-based measures that preserves the aspects of both approaches. The work also analyzes the stability performance of LASSO and Elastic Net using synthetic and microarray gene expression datasets. Elastic Net shows higher stability and better selection of relevant features than LASSO.

Keywords Feature evaluation and selection · Gene selection · Stability measure · Similarity-based stability · Frequency-based stability
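The limitation of frequency-based measures noted in the abstract can be illustrated with a small sketch. It is not the hybrid measure proposed in the paper. It computes Nogueira's frequency-based stability, and, as an assumed stand-in for a generic similarity-based measure, the mean pairwise Jaccard index. The function names and the two toy selection matrices `case_a` and `case_b` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def nogueira_stability(Z):
    """Frequency-based stability of Nogueira et al.

    Z is a binary selection matrix: one row per run, one column per feature,
    with 1 meaning the feature was selected in that run.
    """
    M, d = len(Z), len(Z[0])
    # Selection frequency of each feature across the M runs.
    freqs = [sum(row[f] for row in Z) / M for f in range(d)]
    # Unbiased sample variance of each feature's selection indicator.
    variances = [M / (M - 1) * p * (1 - p) for p in freqs]
    # Average number of selected features per run.
    k_bar = sum(sum(row) for row in Z) / M
    return 1 - (sum(variances) / d) / ((k_bar / d) * (1 - k_bar / d))


def mean_pairwise_jaccard(Z):
    """Illustrative similarity-based measure: mean Jaccard index over all run pairs.

    Note: undefined if two runs both select no features (division by zero).
    """
    sets = [{f for f, z in enumerate(row) if z} for row in Z]
    sims = [len(a & b) / len(a | b) for a, b in combinations(sets, 2)]
    return sum(sims) / len(sims)


# Two hypothetical outcomes of 4 runs over 4 features.
# In both, every feature is selected in exactly 2 of the 4 runs.
case_a = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
case_b = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]

for name, Z in (("case_a", case_a), ("case_b", case_b)):
    print(name, nogueira_stability(Z), mean_pairwise_jaccard(Z))
```

Because the frequency-based measure depends only on the per-feature selection frequencies and the average subset size, it assigns the same value to both cases. The pairwise Jaccard average differs (1/3 for `case_a`, 2/9 for `case_b`), since `case_a` contains two pairs of identical subsets while `case_b` contains none.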

1 Introduction

High-dimensional data usually leads to overfitting and high computational complexity in machine learning tasks. One example of a high-dimensional dataset in bioinformatics is the microarray gene expression dataset. Dimensionality reduction techniques have gained considerable momentum as a way to deal with such datasets. These methods can be broadly classified into two categories. The first is Feature Selection (FS), where a subset of relevant and non-redundant features is chosen from the original, larger set of features. The second is feature extraction, where the high-dimensional data is mapped to a lower dimension without preserving the original feature set.

Akshata K. Naik
¹ National Institute of Technology Goa, Farmagudi, Ponda, Goa, India

Recently, in bioinformatics applications, FS techniques have been applied to select sets of disease-causing genes, a process termed gene selection [17]. The performance of gene selection algorithms is generally measured in terms of the accuracy of the learning model. It is worth pointing out that two different subsets of genes may give the same accuracy due to the presence of highly corre
