Variational auto-encoder based Bayesian Poisson tensor factorization for sparse and imbalanced count data

Yuan Jin¹ · Ming Liu² · Yunfeng Li³ · Ruohua Xu³ · Lan Du¹ · Longxiang Gao² · Yong Xiang²

Received: 29 February 2020 / Accepted: 3 November 2020

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

Non-negative tensor factorization models enable predictive analysis of count data. Among them, Bayesian Poisson–Gamma models can derive full posterior distributions of latent factors and are less sensitive to sparse count data. However, current inference methods for these Bayesian models adopt restricted update rules for the posterior parameters. They also fail to share update information in a way that helps them cope with data sparsity. Moreover, these models lack a component that handles imbalance in count data values. In this paper, we propose a novel variational auto-encoder framework, called VAE-BPTF, that addresses the above issues. It uses multi-layer perceptron networks to encode and share complex update information. The encoded information is then reweighted per data instance to penalize common data values before being aggregated to compute the posterior parameters of the latent factors. In evaluations on synthetic data, VAE-BPTF tended to recover the correct number of latent factors and the correct posterior parameter values. It also outperformed existing models in both reconstruction error and latent factor (semantic) coherence on five real-world datasets. Furthermore, a qualitative analysis suggests that the latent factors inferred by VAE-BPTF are meaningful and coherent.

Keywords Non-negative tensor factorization · Variational auto-encoders · Neural networks · Latent variable modelling · Count data

Responsible editor: Sriraam Natarajan.

Corresponding author: Yuan Jin

¹ Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
² School of Information Technology, Deakin University, Melbourne, VIC 3125, Australia
³ Sandstone Pty Ltd, 32-42 Barker St, Kingsford 2032, NSW, Australia

1 Introduction

In this paper, we focus on improving the performance of Bayesian Poisson tensor factorization (BPTF). BPTF imposes Gamma priors on its latent factors. These factors then form the instance-wise rates of a Poisson likelihood over the observed data. BPTF adopts two types of inference framework to compute the posterior shapes and rates of its Gamma latent factors: Gibbs sampling and variational inference. Both rely on the auxiliary variable augmentation technique to facilitate their computation. This technique is based on Poisson–Gamma conjugacy. It exploits the fact that a sum of independent auxiliary Poisson variables with respective rates is itself Poisson distributed, with rate equal to the sum of the auxiliaries' rates. Despite its importance, the augmentation technique increases the computational overhead because of the additional sampling procedures/upd
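To make the augmentation concrete, here is a minimal sketch of the Gibbs-sampling version for a three-mode tensor. It assumes the standard CP (CANDECOMP/PARAFAC) form of BPTF, y_ijk ~ Poisson(Σ_r A_ir B_jr C_kr), with Gamma(a0, b0) (shape, rate) priors on every factor entry. This excerpt does not give the paper's exact parameterization, and this is the baseline scheme, not VAE-BPTF. The function names `sample_bptf`, `allocate` and `gibbs_sweep` are hypothetical. Each observed count is split into R auxiliary Poisson counts by a multinomial draw with probabilities proportional to the component rates (the converse of the additivity property). Given those counts, every factor entry has a closed-form Gamma full conditional, e.g. A_ir | · ~ Gamma(a0 + Σ_jk y_ijkr, b0 + (Σ_j B_jr)(Σ_k C_kr)).

```python
import numpy as np


def sample_bptf(dims, rank, a0=1.0, b0=1.0, rng=None):
    """Draw Gamma factor matrices and a Poisson count tensor (3-mode CP assumption)."""
    rng = np.random.default_rng(rng)
    # NumPy parameterizes Gamma by (shape, scale), so scale = 1 / rate.
    A, B, C = (rng.gamma(a0, 1.0 / b0, size=(d, rank)) for d in dims)
    rates = np.einsum("ir,jr,kr->ijk", A, B, C)  # instance-wise Poisson rates
    return rng.poisson(rates), [A, B, C]


def allocate(Y, A, B, C, rng):
    """Split each count y_ijk into R auxiliary counts y_ijkr via a multinomial draw."""
    R = A.shape[1]
    Y_aux = np.zeros(Y.shape + (R,), dtype=np.int64)
    # Zero counts split into all-zero auxiliaries, so only nonzeros need work.
    for i, j, k in np.argwhere(Y > 0):
        theta = A[i] * B[j] * C[k]  # per-component rates, shape (R,)
        Y_aux[i, j, k] = rng.multinomial(Y[i, j, k], theta / theta.sum())
    return Y_aux


def gibbs_sweep(Y, factors, a0, b0, rng):
    """One blocked Gibbs sweep: auxiliary counts first, then each factor matrix."""
    A, B, C = factors
    Y_aux = allocate(Y, A, B, C, rng)
    # Gamma full conditionals: shape = a0 + summed auxiliary counts,
    # rate = b0 + product of the column sums of the other two factor matrices.
    A = rng.gamma(a0 + Y_aux.sum(axis=(1, 2)), 1.0 / (b0 + B.sum(0) * C.sum(0)))
    B = rng.gamma(a0 + Y_aux.sum(axis=(0, 2)), 1.0 / (b0 + A.sum(0) * C.sum(0)))
    C = rng.gamma(a0 + Y_aux.sum(axis=(0, 1)), 1.0 / (b0 + A.sum(0) * B.sum(0)))
    return [A, B, C], Y_aux


# Usage: fit a rank-3 model to a small synthetic tensor.
rng = np.random.default_rng(0)
Y, _ = sample_bptf((8, 6, 5), rank=3, rng=rng)
factors = [rng.gamma(1.0, 1.0, size=(d, 3)) for d in Y.shape]
for _ in range(20):
    factors, Y_aux = gibbs_sweep(Y, factors, 1.0, 1.0, rng)

# Additivity: the auxiliary counts always sum back to the observed counts.
assert (Y_aux.sum(axis=-1) == Y).all()
```

In this sketch the auxiliary tensor holds R entries per observed cell and must be resampled on every sweep. That is one concrete form of the overhead mentioned above. A sparse implementation would typically store auxiliary counts only for the nonzero entries.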