Cost-sensitive feature selection on multi-label data via neighborhood granularity and label enhancement



Cost-sensitive feature selection on multi-label data via neighborhood granularity and label enhancement

Xuandong Long¹ · Wenbin Qian¹,² · Yinglong Wang¹ · Wenhao Shu³

Accepted: 1 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Multi-label feature selection, an efficient and effective pre-processing step in machine learning and data mining, selects a feature subset that contributes more to multi-label classification while improving classifier performance. In real-world applications, an instance may be associated with multiple related labels of different relative importance, and obtaining different features usually incurs different costs, such as money and time. However, most existing work on multi-label feature selection does not consider these two critical issues simultaneously. Therefore, in this paper, we exploit the idea of neighborhood granularity to enhance traditional logical labels into label distributions, excavating the deeper supervised information hidden in multi-label data, and we further consider the effect of test cost under three different distributions. Motivated by these issues, a novel test-cost-sensitive multi-label feature selection algorithm with label enhancement and neighborhood granularity is designed. The proposed algorithm is evaluated on ten publicly available benchmark multi-label datasets with six widely used metrics from two different aspects. Two groups of experimental results demonstrate that the proposed algorithm achieves satisfactory and superior performance over four state-of-the-art comparison algorithms, and that it is effective at improving learning performance while decreasing the total test cost of the selected feature subset.

Keywords Feature selection · Cost-sensitive · Label enhancement · Neighborhood granularity · Multi-label data
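The label-enhancement step described in the abstract, converting logical (binary) labels into label distributions via neighborhood information, can be sketched as follows. This is a generic illustration under assumed conventions (Euclidean distance, min-max-normalized features, a radius parameter here called `delta`), not the paper's exact formulation:

```python
import numpy as np

def neighborhood_label_enhancement(X, Y, delta=0.3):
    """Enhance logical labels into label distributions using neighborhood
    counts (a hypothetical sketch, not the authors' exact formula).

    X     : (n, m) feature matrix, assumed min-max normalized.
    Y     : (n, q) binary label matrix.
    delta : neighborhood radius under the Euclidean distance.
    """
    n, q = Y.shape
    D = np.zeros((n, q))
    for i in range(n):
        # instances within the delta-neighborhood of x_i (including x_i itself)
        dist = np.linalg.norm(X - X[i], axis=1)
        nbrs = dist <= delta
        # importance of label j = fraction of neighbors carrying it,
        # kept only for labels the instance itself is annotated with
        counts = Y[nbrs].mean(axis=0) * Y[i]
        total = counts.sum()
        # normalize to a distribution; fall back to the logical labels
        D[i] = counts / total if total > 0 else Y[i] / max(Y[i].sum(), 1)
    return D  # each labeled instance's row sums to 1
```

The resulting rows are no longer 0/1 vectors: a label shared by many neighbors receives a larger share of the distribution, reflecting its higher relative importance for that instance.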

Corresponding author: Wenbin Qian ([email protected])

¹ School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China
² School of Software, Jiangxi Agricultural University, Nanchang 330045, China
³ School of Information Engineering, East China Jiaotong University, Nanchang 330013, China

1 Introduction

In recent decades, multi-label feature selection has drawn widespread attention owing to the abundance of noisy, redundant, and misleading features in various multi-label classification [1–3] tasks, such as image classification [4–6], text recognition [7–9], and gene function prediction [10–12]. The main purpose of feature selection, also called attribute reduction [13–15], is to select a more informative feature subset, containing relevant and weakly relevant (i.e., non-noisy) features, directly from the original feature space. It can effectively

maintain the original physical meanings for each selected feature and reduce the original dimensionality of the data, while speeding up the learning algorithm and improving the