Cost-sensitive feature selection on multi-label data via neighborhood granularity and label enhancement



Cost-sensitive feature selection on multi-label data via neighborhood granularity and label enhancement

Xuandong Long¹ · Wenbin Qian¹,² · Yinglong Wang¹ · Wenhao Shu³

Accepted: 1 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Multi-label feature selection, an efficient and effective pre-processing step in machine learning and data mining, selects a feature subset that contributes more to multi-label classification while improving classifier performance. In real-world applications, an instance may be associated with multiple related labels of different relative importance, and obtaining different features usually incurs different costs, such as money and time. However, most existing work on multi-label feature selection does not consider these two critical issues simultaneously. Therefore, in this paper, we exploit the idea of neighborhood granularity to enhance traditional logical labels into label distributions, excavating the deeper supervised information hidden in multi-label data, and we further consider the effect of test cost under three different distributions. Motivated by these issues, a novel test-cost-sensitive multi-label feature selection algorithm with label enhancement and neighborhood granularity is designed. The proposed algorithm is evaluated on ten publicly available benchmark multi-label datasets with six widely used metrics from two different aspects. Two groups of experimental results demonstrate that the proposed algorithm achieves satisfactory and superior performance over four state-of-the-art comparison algorithms, and that it is effective at improving learning performance while decreasing the total test cost of the selected feature subset.

Keywords Feature selection · Cost-sensitive · Label enhancement · Neighborhood granularity · Multi-label data
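The label-enhancement step described in the abstract, converting logical (binary) labels into label distributions via neighborhood information, can be sketched as follows. This is a generic illustration under assumed conventions (Euclidean distance, min-max-normalized features, a radius parameter here called `delta`), not the paper's exact formulation:

```python
import numpy as np

def neighborhood_label_enhancement(X, Y, delta=0.3):
    """Enhance logical labels into label distributions using neighborhood
    counts (a hypothetical sketch, not the authors' exact formula).

    X     : (n, m) feature matrix, assumed min-max normalized.
    Y     : (n, q) binary label matrix.
    delta : neighborhood radius under the Euclidean distance.
    """
    n, q = Y.shape
    D = np.zeros((n, q))
    for i in range(n):
        # instances within the delta-neighborhood of x_i (including x_i itself)
        dist = np.linalg.norm(X - X[i], axis=1)
        nbrs = dist <= delta
        # importance of label j = fraction of neighbors carrying it,
        # kept only for labels the instance itself is annotated with
        counts = Y[nbrs].mean(axis=0) * Y[i]
        total = counts.sum()
        # normalize to a distribution; fall back to the logical labels
        D[i] = counts / total if total > 0 else Y[i] / max(Y[i].sum(), 1)
    return D  # each labeled instance's row sums to 1
```

The resulting rows are no longer 0/1 vectors: a label shared by many neighbors receives a larger share of the distribution, reflecting its higher relative importance for that instance.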

Corresponding author: Wenbin Qian ([email protected])

¹ School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China
² School of Software, Jiangxi Agricultural University, Nanchang 330045, China
³ School of Information Engineering, East China Jiaotong University, Nanchang 330013, China

1 Introduction

In recent decades, multi-label feature selection has drawn widespread attention owing to the abundance of noisy, redundant, and misleading features in various multi-label classification [1–3] tasks, such as image classification [4–6], text recognition [7–9], and gene function prediction [10–12]. The main purpose of feature selection, also called attribute reduction [13–15], is to select a more informative feature subset, containing relevant and weakly relevant (i.e., non-noisy) features, directly from the original feature space. It can effectively

maintain the original physical meanings for each selected feature and reduce the original dimensionality of the data, while speeding up the learning algorithm and improving the