Two directional Laplacian pyramids with application to data imputation



Neta Rabin1 · Dalia Fishelov1

Received: 26 September 2018 / Accepted: 3 April 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
Modeling and analyzing high-dimensional data has become a common task in various fields and applications. Often, it is of interest to learn a function that is defined on the data and then to extend its values to newly arrived data points. The Laplacian pyramids approach invokes kernels of decreasing widths to learn a given dataset, and a function defined over it, in a multi-scale manner. Extension of the function to new values may then be easily performed. In this work, we extend the Laplacian pyramids technique to model the data by considering two-directional connections. In practice, kernels of decreasing widths are constructed on the row space and on the column space of the given dataset, and in each step of the algorithm the data is approximated by considering the connections in both directions. Moreover, the method does not require solving a minimization problem, as other common imputation techniques do, and thus avoids the risk of a non-converging process. The method presented in this paper is general and may be adapted to imputation tasks. The numerical results demonstrate the ability of the algorithm to deal with a large number of missing data values. In addition, in most cases, the proposed method generates lower errors than existing imputation methods applied to benchmark datasets.

Keywords Laplacian pyramids · RNA sequencing data · Two-sided LP scheme · Imputation

Mathematics Subject Classification (2010) 68T30

Communicated by: Pavel Solin

Neta Rabin
[email protected]

Dalia Fishelov
[email protected]

1 Afeka - Tel Aviv Academic College of Engineering, 38 Bnei Efraim St., Tel Aviv, Israel


1 Introduction

Modeling and analyzing high-dimensional data has become a common task in various fields and applications. Kernel-based machine learning methods are capable of generating compact models that capture underlying important features of complex datasets. Typically, given a dataset X of size M × N, a kernel is constructed based on the rows of X. This kernel captures the pairwise distances between the rows of X. In classification algorithms, such as SVM (Support Vector Machines), kernels are used for finding non-linear separations between data classes. In addition, non-linear dimensionality reduction algorithms, such as diffusion maps [6], utilize the spectral decomposition of normalized kernels for embedding high-dimensional data. Recent work [7] proposed dual-geometry approaches that embed the dataset X in a low-dimensional space using non-linear dimensionality reduction techniques applied to the rows and to the columns of X. Another method that utilizes kernels, which is extended in this paper, is Laplacian pyramids [11, 28]. Laplacian pyramids is a multi-scale algorithm for learning functions over s
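To illustrate the kernel construction described above, the following is a minimal sketch of a pairwise Gaussian kernel built on the rows of a dataset X of size M × N. The Gaussian form and the bandwidth parameter sigma are illustrative assumptions here, not the specific scales used in the paper's algorithm:

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Pairwise Gaussian kernel on the rows of an M x N dataset X.

    Entry (i, j) depends on the Euclidean distance between rows i and j,
    so the result is a symmetric M x M matrix.
    """
    sq_norms = (X ** 2).sum(axis=1)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off values
    return np.exp(-d2 / (sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # toy dataset: M = 5 rows, N = 3 columns
K = gaussian_kernel(X, sigma=1.0)
print(K.shape)                    # (5, 5): one entry per pair of rows
print(np.allclose(K, K.T))        # True: the kernel is symmetric
```

A kernel on the column space, as used by the two-directional scheme, would be built the same way on X.T, yielding an N × N matrix; decreasing sigma produces the progressively narrower kernels of the multi-scale construction.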