A systematic evaluation of single-cell RNA-sequencing imputation methods


RESEARCH

Open Access

Wenpin Hou, Zhicheng Ji, Hongkai Ji* and Stephanie C. Hicks*

*Correspondence: [email protected]; [email protected]
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, 21205 Baltimore, MD, USA

Abstract

Background: The rapid development of single-cell RNA-sequencing (scRNA-seq) technologies has led to the emergence of many methods for removing systematic technical noise, including imputation methods, which aim to address the increased sparsity observed in single-cell data. Although many imputation methods have been developed, there is no consensus on how these methods compare to each other.

Results: Here, we perform a systematic evaluation of 18 scRNA-seq imputation methods to assess their accuracy and usability. We benchmark these methods in terms of the similarity between imputed cell profiles and bulk samples; whether they recover relevant biological signals or introduce spurious noise in downstream differential expression, unsupervised clustering, and pseudotemporal trajectory analyses; and their computational run time, memory usage, and scalability. Methods are evaluated using data from both cell lines and tissues and from both plate- and droplet-based single-cell platforms.

Conclusions: We found that the majority of scRNA-seq imputation methods outperformed no imputation in recovering gene expression observed in bulk RNA-seq. However, the majority of the methods did not improve performance in downstream analyses compared to no imputation, in particular for clustering and trajectory analysis, and thus should be used with caution. In addition, we found substantial variability in the performance of the methods within each evaluation aspect. Overall, MAGIC, kNN-smoothing, and SAVER were found to outperform the other methods most consistently.

Keywords: Gene expression, Single-cell RNA-sequencing, Imputation, Benchmark
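The sparsity that imputation methods aim to address is the fraction of zero entries in a gene-by-cell count matrix. The following is a minimal sketch of that quantity, not code from the paper: the function name `sparsity` and the toy matrix `toy_counts` are hypothetical and chosen only for illustration.

```python
def sparsity(counts):
    """Return the fraction of zero entries in a genes-by-cells count matrix.

    `counts` is a list of rows (one per gene), each a list of non-negative
    integer counts (one per cell). This layout is an assumption made for
    the example.
    """
    total = sum(len(row) for row in counts)
    if total == 0:
        raise ValueError("count matrix is empty")
    zeros = sum(1 for row in counts for value in row if value == 0)
    return zeros / total


# Hypothetical toy data: 3 genes (rows) x 4 cells (columns) of UMI counts.
toy_counts = [
    [0, 3, 0, 1],
    [0, 0, 0, 2],
    [5, 0, 1, 0],
]

toy_sparsity = sparsity(toy_counts)  # 7 zeros out of 12 entries
print(f"sparsity = {toy_sparsity:.3f}")
```

Real scRNA-seq matrices are far larger and are usually stored in a sparse format; the plain nested list here only keeps the example self-contained.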

Background

Recent advances in high-throughput technologies have made it possible to measure gene expression in individual cells [1–5]. In contrast to bulk RNA-sequencing (RNA-seq), a distinctive feature of data measured using single-cell RNA-sequencing (scRNA-seq) is the increased sparsity, or fraction of observed “zeros,” where a zero means that no unique molecular identifiers (UMIs) or reads map to a given gene in a cell [6–9]. These observed

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Common