gCAnno: a graph-based single cell type annotation method



METHODOLOGY ARTICLE

Open Access

Xiaofei Yang1,2†, Shenghan Gao2,3†, Tingjie Wang2,3†, Boyu Yang2,3, Ningxin Dang4 and Kai Ye2,3,4,5*

Abstract
Background: Current single cell analysis methods annotate cell types at the cluster level rather than, ideally, at the single cell level. The many interchangeable clustering methods and tunable parameters have a substantial impact on the clustering outcome, often leading to incorrect cluster-level annotation or to repeated runs of the subsequent clustering steps. To address these limitations, methods based on well-annotated reference atlases have been proposed. However, these methods are currently not robust enough to handle datasets with different noise levels or from different platforms.
Results: Here, we present gCAnno, a graph-based Cell type Annotation method. First, gCAnno constructs a cell type-gene bipartite graph and adopts graph embedding to obtain cell type specific genes. Then, naïve Bayes (gCAnno-Bayes) and SVM (gCAnno-SVM) classifiers are built for annotation. We compared the performance of gCAnno with other state-of-the-art methods on multiple single cell datasets, either with various noise levels or from different platforms. The results showed that gCAnno outperforms other state-of-the-art methods with higher accuracy and robustness.
Conclusions: gCAnno is a robust and accurate cell type annotation tool for single cell RNA analysis. The source code of gCAnno is publicly available at https://github.com/xjtu-omics/gCAnno.
Keywords: Graph embedding, Cell type annotation, Single cell RNA analysis
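The two-stage design summarized in Results (bipartite graph plus embedding to pick cell type specific genes, then a per-cell classifier) can be illustrated with a small sketch. It is not the gCAnno implementation: this excerpt does not say how edge weights are defined or which embedding algorithm is used, so the sketch assumes (i) the edge weight between a cell type and a gene is the gene's mean expression in reference cells of that type, (ii) a spectral co-embedding (SVD of the degree-normalized biadjacency matrix) stands in for the graph embedding, (iii) the genes closest to each cell type by cosine similarity are taken as its specific genes, and (iv) a Gaussian naive Bayes model on those genes labels every query cell individually. All function and parameter names are hypothetical, and the SVM variant (gCAnno-SVM) is not shown.

```python
import numpy as np


def type_gene_graph(expr, labels, n_types):
    """Biadjacency matrix W (types x genes) of a cell type-gene bipartite graph.

    Assumed edge weight: mean expression of gene g in reference cells of type t.
    """
    return np.stack([expr[labels == t].mean(axis=0) for t in range(n_types)])


def spectral_coembedding(W, k):
    """Embed type nodes and gene nodes in one k-dimensional space.

    Stand-in embedding: SVD of A = D_t^{-1/2} W D_g^{-1/2}, with the singular
    values split evenly so that X @ Y.T approximates A (exactly when k = rank).
    """
    eps = 1e-12
    A = W / np.sqrt(W.sum(axis=1, keepdims=True) + eps)
    A = A / np.sqrt(W.sum(axis=0, keepdims=True) + eps)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    root = np.sqrt(s[:k])
    return U[:, :k] * root, Vt[:k].T * root


def select_markers(X, Y, n_markers):
    """For each cell type, keep the n_markers genes with highest cosine similarity."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    sim = Xn @ Yn.T  # shape (n_types, n_genes)
    return np.argsort(-sim, axis=1)[:, :n_markers]


class GaussianNaiveBayes:
    """Per-cell classifier over the selected marker genes."""

    def fit(self, expr, labels, n_types, var_floor=1e-3):
        self.mu = np.stack([expr[labels == t].mean(axis=0) for t in range(n_types)])
        # var_floor keeps the Gaussian likelihood finite for constant genes
        self.var = np.stack([expr[labels == t].var(axis=0) for t in range(n_types)]) + var_floor
        self.log_prior = np.log(np.bincount(labels, minlength=n_types) / len(labels))
        return self

    def predict(self, expr):
        # Sum of per-gene Gaussian log-likelihoods for every (cell, type) pair
        diff = expr[:, None, :] - self.mu[None, :, :]
        ll = -0.5 * (np.log(2 * np.pi * self.var)[None] + diff**2 / self.var[None]).sum(axis=2)
        return np.argmax(ll + self.log_prior[None], axis=1)


def annotate(ref_expr, ref_labels, query_expr, n_types, k, n_markers):
    W = type_gene_graph(ref_expr, ref_labels, n_types)
    X, Y = spectral_coembedding(W, k)
    genes = np.unique(select_markers(X, Y, n_markers))
    clf = GaussianNaiveBayes().fit(ref_expr[:, genes], ref_labels, n_types)
    return clf.predict(query_expr[:, genes]), genes


# Toy reference: 3 types, 2 markers each, plus a housekeeping gene (last column)
profiles = np.array([[9, 8, 1, 1, 1, 1, 5],
                     [1, 1, 9, 8, 1, 1, 5],
                     [1, 1, 1, 1, 9, 8, 5]], dtype=float)
ref_expr = np.vstack([p + d for p in profiles for d in (-1.0, 0.0, 1.0)])
ref_labels = np.repeat(np.arange(3), 3)
query_expr = profiles[[2, 0, 1]] + 0.3

pred, genes = annotate(ref_expr, ref_labels, query_expr, n_types=3, k=3, n_markers=2)
print(genes)  # [0 1 2 3 4 5]
print(pred)   # [2 0 1]
```

In this toy example the housekeeping gene, which is expressed equally in all three types, is not picked as a marker, and each query cell receives its own label without any clustering step.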

* Correspondence: [email protected]
† Xiaofei Yang, Shenghan Gao and Tingjie Wang contributed equally to this work.
2 MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
3 School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
Full list of author information is available at the end of the article

Background
Bulk RNA sequencing measures the average gene expression level across a large population of cells, hindering the dissection of heterogeneous cell types [1]. In 2009, single cell RNA sequencing (scRNA-seq) technology was developed to provide valuable insights into cell heterogeneity [2]. In general, accurate cell type annotation of single cell data is a prerequisite for any further investigation of cell heterogeneity [3–6]. The commonly used cell type annotation methods, including Seurat [7], SCANPY [8] and SINCERA [9], adopt a similar procedure: data quality control, read mapping, UMI quantification, expression normalization, clustering, identification of the differentially expressed genes (DEGs) of each cluster, and cell type assignment based on biomarker genes [10]. However, these methods report cluster-level rather than truly single cell-level annotation results, masking subtle differences within each cluster. In addition, different clustering methods and ma