Hash-tree PCA: accelerating PCA with hash-based grouping
Lkhagvadorj Battulga¹ · Sang‑Hyun Lee² · Aziz Nasridinov³ · Kwan‑Hee Yoo³
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract
In data mining and machine learning, one of the most commonly used feature extraction techniques is principal component analysis (PCA). However, it performs poorly on large datasets. In this paper, we propose a new method for accelerating conventional PCA, named hash-tree PCA. It samples objects that are similar to each other without losing the original data distribution. First, it explores similar objects and stores them in hash tables. Afterward, it samples a certain number of objects from each hash table and creates a new dataset with a reduced number of objects. Finally, it executes PCA on the sampled dataset. Experimental results show that our method outperforms the PCA and fast PCA methods.

Keywords Fast PCA · Hash table · Eigenvalue decomposition (EVD) · Data grouping · Feature extraction
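The three steps described above (hash similar objects into tables, sample from each table, run PCA on the reduced set) can be sketched as follows. This is a minimal illustration, not the paper's exact construction: the random-projection sign hash, the bucket count, and the per-bucket sample size are our own assumptions, and the function name is hypothetical.

```python
import numpy as np

def hash_tree_pca_sketch(X, n_components=2, n_buckets=32, per_bucket=10, seed=0):
    """Sketch of a hash-then-sample-then-PCA pipeline (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Step 1: group similar objects via a random-projection sign hash,
    # so nearby points tend to land in the same bucket.
    n_bits = int(np.ceil(np.log2(n_buckets)))
    planes = rng.standard_normal((X.shape[1], n_bits))
    bits = (X @ planes > 0).astype(int)
    keys = bits @ (2 ** np.arange(n_bits))       # one bucket id per object
    # Step 2: sample a few objects from each non-empty bucket, preserving
    # the spread of the original distribution with far fewer objects.
    sampled = []
    for k in np.unique(keys):
        idx = np.flatnonzero(keys == k)
        take = min(per_bucket, idx.size)
        sampled.append(rng.choice(idx, size=take, replace=False))
    Xs = X[np.concatenate(sampled)]
    # Step 3: conventional PCA via EVD of the covariance of the sampled set.
    Xc = Xs - Xs.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order]                        # one principal axis per column
```

Because the covariance step costs time proportional to the number of objects, running the EVD on the sampled subset rather than the full dataset is where the speedup comes from.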
* Correspondence: Kwan‑Hee Yoo, [email protected]

Lkhagvadorj Battulga, [email protected] · Sang‑Hyun Lee, [email protected] · Aziz Nasridinov, [email protected]

¹ Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
² YURA Co., Ltd, Seongnam, Gyeunggi, South Korea
³ Department of Computer Science, Chungbuk National University, Cheongju, South Korea

1 Introduction

Recently, artificial intelligence (AI) has become an essential part of daily life due to its technological development [1]. However, the algorithms behind AI do not scale to high-dimensional datasets. Thus, conventional methods extract the most valuable variables from the dataset before AI algorithms are applied. This preprocessing step is
known as feature extraction or dimension reduction. One of the most commonly used feature extraction or dimension reduction techniques is principal component analysis (PCA). The English mathematician Karl Pearson originated PCA theory in his research [2], and it has since been used in various fields, from statistics to computer science. In computer science, PCA is generally used for data compression, dimension reduction, and image processing.

Current systems and applications collect large amounts of data on a daily basis [3]. Executing PCA on this collected data would consume a massive amount of time on commodity hardware. Therefore, several researchers have proposed methods for accelerating the execution time of PCA on large datasets. Nobuhiro and Kuroki [4] accelerated the execution time of PCA with parallel programming, using a graphics processing unit (GPU) instead of a central processing unit (CPU). Vogt and Tacke [5] achieved PCA execution-time acceleration by using wavelet transformation. In their work, the authors claimed that if the data distribution is preserved with the fewest number of objects, executing PCA on the reduced dataset is faster, and the result is identical to executing conventional PCA on the original