Kernelized fuzzy rough sets based online streaming feature selection for large-scale hierarchical classification


Shengxing Bai1,2 · Yaojin Lin1,2 · Yan Lv1,2 · Jinkun Chen3 · Chenxi Wang1

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract In recent years, many online streaming feature selection approaches have focused on flat data, in which all data are taken as a whole. In the era of big data, however, the feature space of the data is unknown in advance and evolves over time, and the label space has a hierarchical structure. To address this problem, an online streaming feature selection framework for large-scale hierarchical classification tasks is proposed. The framework consists of three parts: (1) a new kernelized fuzzy rough model for hierarchical data is constructed using a sibling strategy, (2) important features are selected online based on feature correlation analysis, and (3) redundant features are deleted online based on feature redundancy. Finally, an empirical study on several hierarchical classification data sets shows that the proposed method outperforms other state-of-the-art online streaming feature selection methods.

Keywords Online feature selection · Hierarchical classification · Kernelized fuzzy rough sets · Sibling strategy
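The three parts of the framework can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration and not the authors' method: the dependency score uses the Gaussian-kernel lower approximation commonly found in the kernelized fuzzy rough set literature, inf over negatives of sqrt(1 - k(x, y)^2); the sibling strategy is read as "only samples from sibling classes (same parent, different class) count as negatives"; and the selection and deletion rules are simple threshold tests standing in for the paper's correlation and redundancy criteria. The names `sibling_dependency`, `stream_select`, `delta`, and `min_gain` are hypothetical.

```python
import numpy as np


def sibling_dependency(X, labels, parent, delta=0.5):
    """Mean fuzzy lower-approximation membership of each sample to its class.

    Assumption: negatives are restricted to sibling classes, and the
    Gaussian-kernel form sqrt(1 - k^2) is used for the fuzzy distance.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] == 0:
        return 0.0
    # Pairwise squared distances and Gaussian kernel values.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * delta ** 2))
    dist = np.sqrt(np.clip(1.0 - K ** 2, 0.0, 1.0))
    lab = np.asarray(labels)
    par = np.asarray([parent[c] for c in labels])
    # sib[i, j] is True when sample j belongs to a sibling class of sample i.
    sib = (lab[:, None] != lab[None, :]) & (par[:, None] == par[None, :])
    # A sample with no sibling negatives gets full membership (assumption).
    lower = [dist[i, sib[i]].min() if sib[i].any() else 1.0
             for i in range(X.shape[0])]
    return float(np.mean(lower))


def _dep(feats, labels, parent, delta):
    """Dependency of the labels on a list of (name, column) features."""
    if not feats:
        return 0.0
    X = np.column_stack([col for _, col in feats])
    return sibling_dependency(X, labels, parent, delta)


def stream_select(feature_stream, labels, parent, min_gain=0.05, delta=0.5):
    """Process features one at a time; thresholds are illustrative only."""
    selected = []
    for name, col in feature_stream:
        candidate = selected + [(name, col)]
        gain = (_dep(candidate, labels, parent, delta)
                - _dep(selected, labels, parent, delta))
        if gain <= min_gain:
            continue  # the new feature adds too little: discard it
        selected = candidate
        # Drop earlier features whose removal costs at most min_gain.
        for old_name, _ in candidate[:-1]:
            reduced = [f for f in selected if f[0] != old_name]
            loss = (_dep(selected, labels, parent, delta)
                    - _dep(reduced, labels, parent, delta))
            if loss <= min_gain:
                selected = reduced
    return [name for name, _ in selected]


# Toy hierarchy: cat and dog are siblings, car and bus are siblings.
parent = {"cat": "animal", "dog": "animal", "car": "vehicle", "bus": "vehicle"}
labels = ["cat", "cat", "dog", "dog", "car", "car", "bus", "bus"]
weak = np.array([0.0, 0.0, 0.3, 0.3, 0.0, 0.0, 0.3, 0.3])
strong = np.array([0.0, 0.1, 1.0, 0.9, 0.0, 0.1, 1.0, 0.9])
const = np.zeros(8)
stream = [("weak", weak), ("const", const), ("strong", strong),
          ("dup", strong.copy())]
result = stream_select(stream, labels, parent)
print(result)
```

In this toy stream the weakly separating feature is accepted first, the constant feature is rejected, and the weak feature is dropped once the strongly separating one arrives. The duplicate is rejected because its gain falls below `min_gain`. Recomputing the full kernel matrix at every step keeps the sketch short; it would not scale to large data sets.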

Yaojin Lin
[email protected]

1 School of Computer Science, Minnan Normal University, Zhangzhou, 363000, People’s Republic of China

2 Laboratory of Data Science, Intelligence Application, Minnan Normal University, Zhangzhou, 363000, People’s Republic of China

3 School of Mathematics and Statistics, Minnan Normal University, Zhangzhou, 363000, People’s Republic of China

1 Introduction Hierarchies and taxonomies are popular for organizing large-volume data sets in various application domains [9, 15]. For example, ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Hierarchies have also been used in many other areas, including biological data [9], Wikipedia [24], geographical data [39], and text data [3, 6, 44]. Therefore, large-scale hierarchical classification learning is an important and popular learning paradigm in the machine learning and data mining communities [9, 15]. From the viewpoint of biologists, the discovery of new species is attributed to the detection of new features. Furthermore, these new features are now available in existing species [50]. Therefore, the challenge of hierarchical classification learning is that the full feature space is unknown before learning begins. As we know, the full feature space determines the final label category of the samples. For example, in the diagnosis of lung cancer, doctors gradually obtain the clinical signs of lung cancer patients through clinical testing over a period of time. Further, these patients may need to be diagnosed with small cell lung cancer, which is a subcategory of lung cancer. This phenomenon suggests that it is infeasible to collect all
