A novel feature learning framework for high-dimensional data classification



ORIGINAL ARTICLE

A novel feature learning framework for high‑dimensional data classification

Yanxia Li2 · Yi Chai1,2 · Hongpeng Yin1,2 · Bo Chen2

Received: 19 November 2019 / Accepted: 19 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Feature extraction is an essential component of many classification tasks. Popular feature extraction approaches, especially deep learning-based methods, require large numbers of training samples to achieve satisfactory performance. Dictionary learning-based methods have been used successfully for feature extraction on both small and large datasets; however, when dealing with high-dimensional datasets, the large number of dimensions can mask the discriminative information embedded in the data. To address these issues, a novel feature learning framework for high-dimensional data classification is proposed in this paper. Specifically, to discard the irrelevant parts that derail the dictionary learning process, the dictionary is adaptively learnt in a low-dimensional space parameterized by a transformation matrix. To ensure that the learned features are discriminative for the classifier, the classification results are in turn used to guide the learning of the dictionary and the transformation matrix. Compared with other methods, the proposed method exploits dimension reduction, dictionary learning and classifier learning simultaneously within a single optimization framework, which enables it to extract low-dimensional, discriminative features. Experimental results on several benchmark datasets demonstrate the superior performance of the proposed method for high-dimensional data classification, particularly when the number of training samples is small.

Keywords  High-dimensional data classification · Feature extraction · Dimension reduction · Dictionary learning
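To make the coupling of the three sub-problems concrete, the sketch below shows one way such a joint objective could be handled by alternating minimization. It is an illustrative assumption, not the paper's algorithm: the variable names (projection P, dictionary D, sparse codes A, classifier W), the objective ||PX − DA||² + α||A||₁ + β||Y − WA||², the hyper-parameters and every update heuristic are choices made only for this sketch.

```python
# Minimal illustrative sketch (not the paper's algorithm): alternating minimization of a
# joint objective  ||P X - D A||_F^2 + alpha*||A||_1 + beta*||Y - W A||_F^2,
# where P is the transformation (projection) matrix, D the dictionary learned in the
# low-dimensional space, A the sparse codes and W a linear classifier. All symbols,
# hyper-parameters and update heuristics below are assumptions made for illustration.
import numpy as np


def soft_threshold(Z, t):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)


def joint_feature_learning(X, Y, m, k, alpha=0.1, beta=1.0, n_iter=30, seed=0):
    """X: (d, n) data columns, Y: (c, n) one-hot labels, m: reduced dim, k: atoms."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    c = Y.shape[0]
    P = rng.standard_normal((m, d)) / np.sqrt(d)          # transformation matrix
    D = rng.standard_normal((m, k))
    D /= np.linalg.norm(D, axis=0)                        # unit-norm dictionary atoms
    A = np.zeros((k, n))                                  # sparse codes
    W = np.zeros((c, k))                                  # linear classifier
    XXt_inv = np.linalg.inv(X @ X.T + 1e-6 * np.eye(d))   # cached for the P-update

    for _ in range(n_iter):
        # 1) Sparse coding: a few ISTA steps on the smooth terms plus the l1 penalty.
        L = 2.0 * (np.linalg.norm(D, 2) ** 2 + beta * np.linalg.norm(W, 2) ** 2) + 1e-8
        for _ in range(5):
            grad = 2.0 * D.T @ (D @ A - P @ X) + 2.0 * beta * W.T @ (W @ A - Y)
            A = soft_threshold(A - grad / L, alpha / L)

        # 2) Dictionary update: regularized least squares, then renormalize the atoms.
        D = (P @ X) @ A.T @ np.linalg.inv(A @ A.T + 1e-6 * np.eye(k))
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-8)

        # 3) Classifier update: ridge regression of labels onto the sparse codes, so the
        #    classification result feeds back into the next dictionary/projection updates.
        W = Y @ A.T @ np.linalg.inv(A @ A.T + 1e-6 * np.eye(k))

        # 4) Projection update: least-squares fit of P X to D A, followed by a heuristic
        #    row re-orthonormalization to avoid the trivial P = 0 solution in this sketch.
        P = (D @ A) @ X.T @ XXt_inv
        U, _, Vt = np.linalg.svd(P, full_matrices=False)
        P = U @ Vt

    return P, D, W
```

At test time a sample would be projected with P, sparse-coded against D, and classified with W; the paper's own formulation fixes the constraints and update order in its own way.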

* Hongpeng Yin (corresponding author)
  [email protected]

Yanxia Li
  [email protected]

Yi Chai
  [email protected]

Bo Chen
  [email protected]

1  State Key Laboratory of Power Transmission Equipment & System Security and New Technology, Chongqing 400000, China

2  College of Automation, Chongqing University, Chongqing 400000, China

1 Introduction

High-dimensional data is now ubiquitous in many domains, such as computer vision and bioinformatics [1, 2]. High dimensionality (usually several hundreds or thousands of dimensions) may produce the Hughes phenomenon, which can significantly reduce classification performance. Owing to accuracy considerations, numerous efforts have been made

to produce good feature representations for high-dimensional classification tasks, by selecting [3–5] or extracting [6, 7] features from the original high-dimensional data. Existing feature extraction methods mainly fall into two categories: designing features manually and learning features from data directly [8–10]. In general, designing features requires considerable engineering skill and domain expertise, which may limit their practical applications [11]. Learning features from data can overcome the limitations of hand-crafted features a