An information-theoretic graph-based approach for feature selection
Indian Academy of Sciences, Sadhana
Amit Kumar Das¹, Sahil Kumar¹, Samyak Jain¹, Saptarsi Goswami², Amlan Chakrabarti² and Basabi Chakraborty³,*
¹ Department of Computer Science and Engineering, Institute of Engineering and Management, Kolkata, India
² A K Choudhury School of Information Technology, University of Calcutta, Kolkata, India
³ Iwate Prefectural University, Takizawa, Japan
MS received 1 January 2019; revised 18 April 2019; accepted 17 October 2019

Abstract. Feature selection is a critical research problem in data science. The need for feature selection has become more pressing with the advent of high-dimensional data sets, especially text, image and microarray data. In this paper, a graph-theoretic approach with step-by-step visualization is proposed for supervised feature selection. The mutual information criterion is used to evaluate the relevance of each feature with respect to the class. A graph-based representation of the input data set, named the feature information map (FIM), is created in which the vertices representing the less informative features are highlighted. Among the more informative features, inter-feature similarity is measured, and edges are drawn between features with high similarity. Finally, a minimal vertex cover is computed on the connected vertices to identify a subset of features that are likely to have low similarity among one another. Experiments on standard data sets show that the proposed method gives better results than the competing algorithms for most of the data sets. The proposed algorithm also makes a novel contribution by rendering a visualization of features in terms of relevance and redundancy.

Keywords. Graph-based feature selection; supervised learning; mutual information; vertex cover; feature graph.
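The four steps in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Features are assumed to be discrete, and relevance is a plug-in mutual information estimate in bits. Inter-feature similarity is assumed to be the mutual information between the two features, since the abstract does not name the similarity measure. `relevance_threshold` and `similarity_threshold` are hypothetical parameters. The minimal vertex cover is found with a greedy pass followed by pruning. This excerpt does not say how the cover is combined with informative features that have no edges, and the sketch simply keeps both.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed (a, b) of p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def minimal_vertex_cover(edges):
    """Greedy cover (take the highest-degree vertex until every edge is covered),
    then drop vertices that are not needed. The result is minimal (no vertex can be
    removed) but not necessarily minimum. Vertex names must be mutually comparable."""
    remaining = set(edges)
    cover = []
    while remaining:
        degree = Counter(v for edge in remaining for v in edge)
        best = max(degree, key=lambda v: (degree[v], v))  # ties broken by name
        cover.append(best)
        remaining = {e for e in remaining if best not in e}
    # Pruning pass: remove any vertex whose edges are all covered by the others.
    for v in list(cover):
        rest = set(cover) - {v}
        if all(a in rest or b in rest for a, b in edges):
            cover.remove(v)
    return set(cover)


def select_features(columns, target, relevance_threshold, similarity_threshold):
    """Hypothetical FIM-style pipeline over discrete features.

    columns: dict of feature name -> list of discrete values (same length as target).
    """
    # Step 1: relevance of every feature with respect to the class.
    relevance = {f: mutual_information(vals, target) for f, vals in columns.items()}
    # Step 2: split vertices into informative and less informative ones.
    informative = [f for f in columns if relevance[f] >= relevance_threshold]
    less_informative = [f for f in columns if relevance[f] < relevance_threshold]
    # Step 3: edges between informative features with high pairwise similarity
    # (assumption: similarity = mutual information between the two features).
    edges = [
        (f, g)
        for f, g in combinations(informative, 2)
        if mutual_information(columns[f], columns[g]) >= similarity_threshold
    ]
    # Step 4: minimal vertex cover on the vertices that have at least one edge.
    cover = minimal_vertex_cover(edges)
    connected = {v for edge in edges for v in edge}
    isolated = [f for f in informative if f not in connected]
    # Assumption: keep the cover plus informative features that have no edges.
    return {
        "relevance": relevance,
        "less_informative": less_informative,
        "edges": edges,
        "cover": cover,
        "selected": sorted(cover | set(isolated)),
    }


if __name__ == "__main__":
    u = [0, 0, 0, 0, 1, 1, 1, 1]
    v = [0, 0, 1, 1, 0, 0, 1, 1]
    target = [2 * a + b for a, b in zip(u, v)]  # the class encodes both u and v
    columns = {"u": u, "u_copy": list(u), "v": v, "noise": [0, 1] * 4}
    print(select_features(columns, target, 0.5, 0.5)["selected"])  # → ['u_copy', 'v']
```

On this toy data, `u_copy` duplicates `u`, so the two are joined by an edge and only one of them lands in the cover. `v` carries different information about the class and survives as an isolated vertex, while `noise` is flagged as less informative. A minimal cover is used here rather than a minimum one because finding a minimum vertex cover is NP-hard.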
1. Introduction

The twenty-first century has ushered in an era of data science and data-driven thinking [1]. There has been a huge explosion in the data available for analysis in every domain [2]. As a result, the dimensionality of the data sets used in data science has increased rapidly, and feature selection has become extremely critical [3]. Feature selection is an important pre-processing step in machine learning that selects a subset of features from the entire feature set. If the entire feature set is used, especially for data sets with a large number of features, the computational cost of machine learning tasks becomes very high. Depending on whether the value of the target variable is available, feature selection may be supervised or unsupervised. In supervised feature selection, the relative importance of the features can be determined with respect to the target variable. Mutual information (MI) between two random variables measures the a
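For reference, the standard definition of the mutual information between two discrete random variables X and Y, measured in bits, is given below. It is followed by two limiting cases: a feature that exactly reproduces a balanced binary class, and a feature that is independent of the class. The worked cases are illustrative and are not taken from the paper.

```latex
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)} = H(Y) - H(Y \mid X)

% Case 1: X = Y with p(0) = p(1) = 1/2; pairs with x \neq y have p(x,y) = 0 and contribute nothing
I(X;Y) = \tfrac{1}{2}\log_2\frac{1/2}{(1/2)(1/2)} + \tfrac{1}{2}\log_2\frac{1/2}{(1/2)(1/2)}
       = \tfrac{1}{2}(1) + \tfrac{1}{2}(1) = 1 \text{ bit}

% Case 2: X independent of Y, so p(x,y) = p(x)\,p(y) and every logarithm is \log_2 1 = 0
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log_2 1 = 0
```

A relevance score near H(Y) therefore marks a feature that almost determines the class, and a score near zero marks a feature that carries essentially no information about it.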