Bayesian gamma-negative binomial modeling of single-cell RNA sequencing data


RESEARCH

Open Access

Siamak Zamani Dadaneh¹, Paul de Figueiredo²,³,⁴, Sing-Hoi Sze⁵, Mingyuan Zhou⁶ and Xiaoning Qian¹,⁷*

From The Sixth International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC 2019)
Niagara Falls, NY, USA. 7 September 2019

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) is a powerful profiling technique at single-cell resolution. Appropriate analysis of scRNA-seq data can characterize molecular heterogeneity and shed light on the underlying cellular processes to better understand development and disease mechanisms. The unique analytic challenge is to appropriately model highly over-dispersed scRNA-seq count data with prevalent dropouts (zero counts), which has made zero-inflated dimensionality reduction techniques popular for scRNA-seq data analyses. Employing zero-inflated distributions, however, may place extra emphasis on zero counts, leading to potential bias when identifying the latent structure of the data.

Results: In this paper, we propose a fully generative hierarchical gamma-negative binomial (hGNB) model of scRNA-seq data, obviating the need for explicitly modeling zero inflation. At the same time, hGNB can naturally account for covariate effects at both the gene and cell levels to identify complex latent representations of scRNA-seq data, without the need for commonly adopted pre-processing steps such as normalization. Efficient Bayesian model inference is derived by exploiting conditional conjugacy via novel data augmentation techniques.

Conclusion: Experimental results on both simulated data and several real-world scRNA-seq datasets suggest that hGNB is a powerful tool for cell cluster discovery as well as cell lineage inference.

Keywords: Single-cell RNA sequencing, Bayesian, Hierarchical modeling
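The abstract's point that a negative binomial (NB) likelihood can absorb heavy over-dispersion and frequent zeros without a separate zero-inflation component can be illustrated for a single gene. The Python sketch below is not the hGNB model or its inference procedure; it only assumes the standard gamma-Poisson construction of the NB distribution, NB(r, p) with mean μ = rp/(1 − p), and uses hypothetical helper names and illustrative values μ = 2 and r = 0.5. With these values the NB assigns probability √0.2 ≈ 0.447 to a zero count, compared with e^(−2) ≈ 0.135 for a Poisson with the same mean, and its variance is μ + μ²/r = 10 rather than 2.

```python
import math
import random


def nb_zero_prob(mu: float, r: float) -> float:
    """P(n = 0) under NB(r, p) with mean mu, where p = mu / (mu + r)."""
    return (r / (r + mu)) ** r


def nb_variance(mu: float, r: float) -> float:
    """Variance of NB(r, p) with mean mu: mu + mu^2 / r, larger than the Poisson's mu."""
    return mu + mu * mu / r


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_gamma_poisson(mu: float, r: float, size: int, seed: int = 0) -> list[int]:
    """Draw lam ~ Gamma(shape=r, scale=mu/r), then n ~ Poisson(lam); marginally n ~ NB."""
    rng = random.Random(seed)
    scale = mu / r  # equals p / (1 - p) when p = mu / (mu + r)
    return [sample_poisson(rng.gammavariate(r, scale), rng) for _ in range(size)]


# Illustrative (hypothetical) parameters for one gene: mean 2, dispersion 0.5.
mu, r = 2.0, 0.5
counts = sample_gamma_poisson(mu, r, size=20_000, seed=1)
zero_frac = sum(c == 0 for c in counts) / len(counts)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)

print(f"NB zero prob (analytic): {nb_zero_prob(mu, r):.3f}")
print(f"Poisson zero prob, same mean: {math.exp(-mu):.3f}")
print(f"empirical zero fraction: {zero_frac:.3f}")
print(f"empirical variance: {var:.2f} (analytic {nb_variance(mu, r):.2f})")
```

Smaller r yields more over-dispersion and more zeros at a fixed mean, and the NB approaches the Poisson as r grows.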

*Correspondence: [email protected]
¹ Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
⁷ TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, Texas, USA
Full list of author information is available at the end of the article

Background

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for unbiased identification of previously uncharacterized molecular heterogeneity at the cellular level [1]. This is in contrast to standard bulk RNA-seq techniques [2], which measure average gene expression levels within a cell population and thus ignore tissue heterogeneity. Consideration of cell-level variability of gene expression is essential for extracting signals from complex heterogeneous tissues [3], and also for understanding dynamic biological processes, such as embryo development [4] and cancer [5]. Many statistical tools developed for scRNA-seq data analysis include a dimensionality reduction step, which makes the data more tractable from both statistical and computational points of view. Moreover, the noise in the data can be