Matrix factorization of large scale data using multistage matrix factorization


Prasad Bhavana · Vineet Padmanabhan
Accepted: 17 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Matrix factorization (MF) is a resource-intensive task that consumes significant memory and computational effort and does not scale well with the volume of data. When the input matrix and the latent feature matrices are larger than the available memory, whether on a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), loading all the required matrices into CPU/GPU memory may not be possible. Such scenarios call for alternative techniques that not only allow parallelism but also address memory limitations, and that play a crucial role in industrial applications. In this paper we propose a divide-and-conquer technique based on a two-stage factorization process. In the first step, we divide the data set into different groups and factorize each group. In the second step, we use a factorization-based learning model to combine the latent features derived in the first step. Our motivation is to develop a method that achieves both parallelism and scalability and also handles the factorization of incrementally growing data. Our contribution is a novel multi-stage matrix factorization (MsMF) approach. The experimental results demonstrate improvements in RMSE as well as in computational efficiency.

Keywords Multistage matrix factorization · Two-stage matrix factorization · Hierarchical matrix factorization

1 Introduction
In data analysis, a simplified and meaningful representation of the underlying patterns is a top priority. Discovering the structure of the data, the relationships within the data or its attributes, and the latent information contained in the original data matrix has long been an area of interest. Matrix factorization (MF) is a well-known technique for discovering the latent dimensions of observed data. It has received considerable attention in several domains, including recommender systems [1], text mining [2], computer vision [3], medical diagnosis [4], dimensionality reduction [5] and data compression. Most matrix factorization based models assume that the data has been generated from some latent distributions, and the goal is to estimate the underlying factors under some loss measure, which in turn leads to an approximation of the hidden behaviours. Here, the interpretation of the latent factors is application dependent. For example, in the recommender systems domain, the observed entries are the ratings given by users to items, and it is assumed that a rating is generated based on the similarity between the latent features of a user and those of an item. Formally, given a partially observed data matrix $X \in \mathbb{R}^{m \times n}$ and the set of its observed entries, the task of matrix factorization is to learn the latent factor matrices $U \in \mathbb{R}^{m \times k}$ and $V \in \mathbb{R}^{n \times k}$ such that $X \approx U V^{T}$. Here, k is a user defined parameter that signifies the dimension of latent represen
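The formal task above (learning U and V so that X ≈ U V^T on the observed entries only) can be illustrated with a minimal sketch. It is not the MsMF method proposed in this paper. It assumes a squared-error loss with L2 regularization minimized by stochastic gradient descent, and the function names `factorize` and `rmse`, the hyperparameters (`lr`, `reg`, `epochs`) and the toy ratings are all hypothetical choices made for illustration.

```python
import numpy as np

def factorize(rows, cols, vals, m, n, k=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Learn U (m x k) and V (n x k) from observed entries only.

    Assumption: squared-error loss with L2 regularization, optimized by
    plain SGD. The text above does not prescribe this particular loss.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((m, k))
    V = 0.1 * rng.standard_normal((n, k))
    for _ in range(epochs):
        for i, j, x in zip(rows, cols, vals):
            u_i = U[i].copy()              # keep the old row for V's update
            err = x - u_i @ V[j]           # residual on one observed entry
            U[i] += lr * (err * V[j] - reg * u_i)
            V[j] += lr * (err * u_i - reg * V[j])
    return U, V

def rmse(rows, cols, vals, U, V):
    """Root mean squared error over the observed entries."""
    pred = np.sum(U[rows] * V[cols], axis=1)
    return float(np.sqrt(np.mean((vals - pred) ** 2)))

# Toy 4 x 3 rating matrix with 8 observed entries (made-up values).
rows = np.array([0, 0, 1, 1, 2, 2, 3, 3])
cols = np.array([0, 1, 1, 2, 0, 2, 0, 1])
vals = np.array([5.0, 3.0, 4.0, 1.0, 1.0, 5.0, 4.0, 2.0])

U, V = factorize(rows, cols, vals, m=4, n=3, k=2)
print("training RMSE:", rmse(rows, cols, vals, U, V))
```

Unobserved entries never enter the loss, so a prediction for any user and item pair is simply the inner product of the corresponding rows of U and V. Both factor matrices must be held in memory during training, which is the limitation that motivates splitting the data into groups.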
