Dimensionality Reduction by Distance Covariance and Eigen Value Decomposition
Abstract. Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimension. In this paper, we investigate how much dimensionality reduction can be achieved with distance covariance and eigenvalue decomposition. The proposed method starts with data normalization and the calculation of Euclidean distance matrices for each attribute, followed by recentering and an eigenvalue decomposition of the distance covariance matrix. We repeated the experiment with a different normalization technique, applied these reduction techniques to a few public data sets, and compared the results with those of conventional Principal Component Analysis. Using the distance covariance matrix for dimension reduction yields better performance than PCA in terms of the classification efficiency parameters.

Keywords: Dimensionality reduction · Distance covariance
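For concreteness, the pipeline outlined in the abstract can be sketched as follows. The min-max normalization, the elementwise-product form of the distance covariance entries, and the function name are assumptions made for illustration; they are not taken verbatim from the method described in this paper.

```python
import numpy as np

def distance_covariance_reduction(X, k):
    """Illustrative sketch: reduce X (n samples x p attributes) to k dimensions
    via a distance covariance matrix between attributes and its eigenvalue
    decomposition. Normalization and centering choices are assumptions."""
    n, p = X.shape

    # Step 1: normalize each attribute (min-max scaling assumed here;
    # the paper also reports results with a second normalization variant).
    Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)

    # Step 2: pairwise Euclidean distance matrix for each attribute,
    # then double-center it (recentering step).
    centered = []
    for j in range(p):
        D = np.abs(Xn[:, j][:, None] - Xn[:, j][None, :])          # n x n distances
        A = D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()  # double centering
        centered.append(A)

    # Step 3: p x p distance covariance matrix between attributes.
    C = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            C[i, j] = (centered[i] * centered[j]).mean()

    # Step 4: eigenvalue decomposition; project the normalized data onto
    # the eigenvectors of the k largest eigenvalues.
    vals, vecs = np.linalg.eigh(C)
    top = vecs[:, np.argsort(vals)[::-1][:k]]
    return Xn @ top
```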
1 Introduction

The advent of information technology and advances in data collection during the past few decades have led to an information overload, or infobesity, in most fields of scientific research and development. Researchers working in diverse domains handle larger and larger collections of observations on a daily basis [1]. Such datasets, in contrast with smaller ones, pose new and greater challenges in data analysis, because an uncontrollable flood of data can easily overwhelm the analyst. The term curse of dimensionality, or Hughes effect, refers to the various phenomena that arise when analyzing and organizing data in high-dimensional spaces [2]. Processing high-dimensional data incurs higher computational overhead and complexity. Dimensionality reduction is of great value in such situations because it increases efficiency, reduces measurement, storage, and computation costs, and eases interpretation and modelling. It is useful for visualizing data and discovering a compact representation. Reducing the number of dimensions helps to separate the important features from the less important ones and provides additional insight into the nature of the data. Ideally, the reduced representation of the data should have a
dimensionality that corresponds to the intrinsic dimensionality of the data. The intrinsic dimensionality of data is the minimum number of parameters needed to account for its observed properties [3].
2 Related Work

Dimensionality reduction techniques can be broadly classified into linear and non-linear techniques. These methods produce a low-dimensional mapping of the original high-dimensional data that preserves some feature of interest in the data. Pearson introduced Principal Component Analysis (PCA), also known as the Karhunen-Loève transform, in 1901 for dimension reduction [4]. PCA derives new variables that are linear combinations of the original variables and are mutually uncorrelated.
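As a point of reference for the comparison reported later, the following minimal sketch computes principal components via eigenvalue decomposition of the sample covariance matrix; the function name and parameters are illustrative rather than part of the paper.

```python
import numpy as np

def pca_eig(X, k):
    """Project X (n samples x p features) onto the k leading principal
    components obtained by eigenvalue decomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)             # p x p sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # eigenvectors of the k largest eigenvalues
    return Xc @ top                            # uncorrelated component scores
```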