A matrix factorization approach to graph compression with partial information



ORIGINAL ARTICLE

A matrix factorization approach to graph compression with partial information

Farshad Nourbakhsh · Samuel Rota Bulò · Marcello Pelillo



Received: 21 January 2014 / Accepted: 1 July 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract We address the problem of encoding a graph of order n into a graph of order k < n in a way that minimizes reconstruction error. This encoding is characterized in terms of a particular factorization of the adjacency matrix of the original graph. The factorization is determined as the solution of a discrete optimization problem, which is for convenience relaxed into a continuous, but equivalent, one. Our formulation does not require the full graph to be available: it can factorize the graph also in the presence of partial information. We propose a multiplicative update rule for the optimization task resembling the ones introduced for nonnegative matrix factorization, and convergence properties are proven. Experiments are conducted to assess the effectiveness of the proposed approach.

Keywords Matrix factorization · Graph compression · Stochastic blockmodel
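For orientation, the multiplicative updates mentioned in the abstract resemble the classical Lee–Seung rules for nonnegative matrix factorization. The sketch below shows those standard NMF updates in NumPy, not the paper's own algorithm; the data matrix and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((6, 5))        # nonnegative data matrix (illustrative)
k = 2                         # inner rank of the factorization
W = rng.random((6, k)) + 0.1  # nonnegative left factor
H = rng.random((k, 5)) + 0.1  # nonnegative right factor
eps = 1e-9                    # guard against division by zero

err0 = np.linalg.norm(V - W @ H)
for _ in range(200):
    # Lee–Seung multiplicative updates for min ||V - WH||_F^2;
    # elementwise products keep the factors nonnegative.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
err = np.linalg.norm(V - W @ H)
```

Because each update is a nonnegative rescaling, no projection step is needed, and the Frobenius objective is non-increasing across iterations.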

F. Nourbakhsh · M. Pelillo
DAIS, Università Ca' Foscari Venezia, via Torino 155, 30172 Venezia Mestre, Italy
e-mail: [email protected]

M. Pelillo
e-mail: [email protected]

S. Rota Bulò (corresponding author)
Fondazione Bruno Kessler, via Sommarive 18, 38123 Povo, Trento, Italy
e-mail: [email protected]

1 Introduction

In recent years, matrix factorization approaches have become an important tool in machine learning and have found successful applications in tasks such as information

retrieval, data mining and pattern recognition. Well-known techniques based on matrix factorization are singular value decomposition, principal component analysis, latent semantic analysis, nonnegative matrix factorization and concept factorization. Singular value decomposition (SVD) [14] decomposes without loss of information any data matrix (not necessarily square) into the product of two unitary matrices (left and right factors) and a nonnegative diagonal matrix (central factor). Principal component analysis (PCA) [16] is a statistical tool that determines an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables, known as principal components; the principal components can in fact be recovered through the SVD factorization. Latent semantic analysis (LSA) [6] is another tool, developed in the context of natural language processing, that is based on the SVD factorization; this technique aims at recovering hidden concepts from a typically sparse co-occurrence matrix of, e.g. words and documents in the context of document analysis. Further developments of LSA gave rise to probabilistic latent semantic analysis (pLSA) [12], which is based on a mixture decomposition of the latent class model, and to latent Dirichlet allocation (LDA) [3], which is a more general generative model where observations are explained by latent groups capturing the similarity stru
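The lossless SVD decomposition and its connection to PCA described above can be illustrated with a minimal NumPy sketch; the toy matrix here is an assumption for demonstration only.

```python
import numpy as np

# SVD: any (not necessarily square) data matrix factors as U @ diag(s) @ Vt,
# with orthonormal U, Vt (unitary factors) and nonnegative singular values s.
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0]])
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_rec = U @ np.diag(s) @ Vt   # exact (lossless) reconstruction

# PCA via SVD: center the observations, then the right singular vectors
# of the centered matrix are the principal axes.
Xc = X - X.mean(axis=0)
_, _, Vt_c = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt_c.T          # principal component scores
```

Truncating the middle factor to the top singular values turns the same decomposition into the best low-rank approximation in the Frobenius norm, which is the sense in which SVD underlies compression-style factorizations.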