Feature selection based on maximal neighborhood discernibility
ORIGINAL ARTICLE
Changzhong Wang1 · Qiang He2 · Mingwen Shao3 · Qinghua Hu4
Received: 16 December 2016 / Accepted: 7 August 2017 © Springer-Verlag GmbH Germany 2017
* Qiang He
1 Department of Mathematics, Bohai University, Jinzhou 121000, People’s Republic of China
2 College of Science, Beijing University of Civil Engineering and Architecture, Beijing 100044, People’s Republic of China
3 College of Computer and Communication Engineering, Chinese University of Petroleum, Qingdao, Shandong 266580, People’s Republic of China
4 School of Computer Science and Technology, Tianjin University, Tianjin 300072, People’s Republic of China
Abstract The neighborhood rough set has been proven to be an effective tool for feature selection. In this model, the positive region of the decision is used to evaluate the classification ability of a subset of candidate features. It is computed by considering only consistent samples. However, classification ability depends not only on consistent samples, but also on the ability to discriminate between samples with different decisions. Hence, the dependency function, which is constructed from the positive region, cannot reflect the actual classification ability of a feature subset. In this paper, we propose a new feature evaluation function for feature selection by using a discernibility matrix. We first introduce the concept of a neighborhood discernibility matrix to characterize the classification ability of a feature subset. We then present the relationship between the distance matrix and the discernibility matrix, and construct a feature evaluation function based on the discernibility matrix. This function is used to measure the significance of a candidate feature. The proposed model not only maintains the maximal dependency function, but can also select the features with the greatest discernibility. The experimental results show that the proposed method can be used to deal with heterogeneous data sets. Compared with some existing algorithms, it is able to find effective feature subsets.
Keywords Feature selection · Neighborhood · Rough sets · Discernibility matrix
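The contrast drawn in the abstract, between a dependency that counts only consistent samples and a measure that also counts how many differently labeled samples are told apart, can be illustrated with a small sketch. The sketch rests on several assumptions that are not taken from the paper: Euclidean distance, a fixed neighborhood radius `delta`, the hypothetical function names `dependency` and `discerned_pairs`, and a toy data set. In particular, `discerned_pairs` is a simplified stand-in for discernibility, not the authors' evaluation function.

```python
from itertools import combinations
from math import dist


def neighborhood(X, i, delta):
    """Indices of all samples within distance delta of sample i (i included)."""
    return [j for j in range(len(X)) if dist(X[i], X[j]) <= delta]


def dependency(X, y, delta):
    """Fraction of consistent samples: those whose whole delta-neighborhood
    carries the same decision label (the neighborhood positive region)."""
    consistent = [
        i for i in range(len(X))
        if all(y[j] == y[i] for j in neighborhood(X, i, delta))
    ]
    return len(consistent) / len(X)


def discerned_pairs(X, y, delta):
    """Number of differently labeled pairs lying more than delta apart.
    Assumption: a simplified proxy, not the paper's evaluation function."""
    return sum(
        1 for i, j in combinations(range(len(X)), 2)
        if y[i] != y[j] and dist(X[i], X[j]) > delta
    )


# Toy data (hypothetical): six samples, two candidate single features.
y = [0, 0, 0, 1, 1, 1]
Xa = [(v,) for v in [0.0, 0.4, 0.8, 0.1, 0.5, 1.2]]    # two separate mixed pairs
Xb = [(v,) for v in [0.0, 0.1, 0.8, 0.05, 0.12, 1.2]]  # one mixed cluster of four
delta = 0.15

# Both features leave the same two samples consistent (dependency 2/6) ...
print(dependency(Xa, y, delta), dependency(Xb, y, delta))
# ... but feature a separates more differently labeled pairs than feature b.
print(discerned_pairs(Xa, y, delta))  # 7
print(discerned_pairs(Xb, y, delta))  # 5
```

In this toy case the dependency cannot rank the two features, while the pair count prefers the first one, which is the kind of distinction the abstract argues the positive region alone misses.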
1 Introduction
With the development of information technology, more and more features are acquired and stored in databases. Some of these features may not be closely related to a given classification task. Irrelevant or redundant features can increase the risk that a classifier over-fits the training data, and can easily lead to poor generalization ability. Feature selection, or attribute reduction, is an important technique for reducing redundant features and has attracted much attention in machine learning and pattern recognition. Feature evaluation is a key issue in feature selection. It has a great impact on optimal feature selection. In general, different feature evaluation functions may lead to different optimal feature subsets. A good evaluation function is always related to high classification performance. Until now, a great number of evaluation functions have be