CoGAPS 3: Bayesian non-negative matrix factorization for single-cell analysis with asynchronous updates and sparse data

  • PDF / 968,948 Bytes
  • 6 Pages / 595.276 x 790.866 pts Page_size
  • 66 Downloads / 131 Views

DOWNLOAD

REPORT


Open Access

SOFTWARE

CoGAPS 3: Bayesian non‑negative matrix factorization for single‑cell analysis with asynchronous updates and sparse data structures Thomas D. Sherman1, Tiger Gao2 and Elana J. Fertig1,3,4* 

*Correspondence: [email protected] 1 Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA Full list of author information is available at the end of the article

Abstract  Background:  Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single cell data. However, these methods have greater computational costs than their gradient-based counterparts. These costs are often prohibitive for analysis of large single-cell datasets. Many such methods can be run in parallel which enables this limitation to be overcome by running on more powerful hardware. However, the constraints imposed by the prior distributions in CoGAPS limit the applicability of parallelization methods to enhance computational efficiency for single-cell analysis. Results:  We developed a new software framework for parallel matrix factorization in Version 3 of the CoGAPS R/Bioconductor package to overcome the computational limitations of Bayesian matrix factorization for single cell data analysis. This parallelization framework provides asynchronous updates for sequential updating steps of the algorithm to enhance computational efficiency. These algorithmic advances were coupled with new software architecture and sparse data structures to reduce the memory overhead for single-cell data. Conclusions:  Altogether our new software enhance the efficiency of the CoGAPS Bayesian matrix factorization algorithm so that it can analyze 1000 times more cells, enabling factorization of large single-cell data sets. Keywords:  Single cell, Matrix factorization, Pattern detection, Unsupervised learning

Background Non-negative matrix factorization (NMF) techniques have emerged as powerful tools to identify the cellular and molecular features that are associated with distinct biological processes from single cell data [3–5, 7, 14, 16]. Bayesian factorization approaches can mitigate local optima and leverage prior distributions to encode biological structure in the features [9, 12]. However, the computational cost of implementing these approaches may be prohibitive for large single cell datasets. Many NMF methods can be run in © The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and