Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data

METHODOLOGY ARTICLE

Open Access

Chunxiang Wang1, Xin Gao2* and Juntao Liu1*

*Correspondence: [email protected]; [email protected]
1 School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
2 Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia

Abstract

Background: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods have quite different effects on clustering algorithms. Moreover, no single preprocessing method is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data.

Results: We designed a graph-based algorithm, SC3-e, specifically for identifying the best data preprocessing method for SC3, which is currently the most widely used algorithm for single-cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3.

Conclusion: The SC3-e algorithm is effective in practice at identifying the best data preprocessing method, and therefore substantially enhances the cell-type clustering performance of SC3. It is expected to play a crucial role in studies related to single-cell clustering, such as studies of complex human diseases and the discovery of new cell types.

Keywords: Preprocessing method, Single-cell RNA-seq data, Gene expression data, Single-cell clustering, SC3

Background

Single-cell RNA sequencing (scRNA-seq) has revolutionized traditional transcriptomic studies by extracting the transcriptome information at the resolution of a single cell; therefore, this approach is able to detect heterogeneous information that cannot be obtained by sequencing mixed cells and to reveal the genetic structure and gene expression status of a single cell [1–7]. Moreover, it helps to identify new cell types [8, 9], provides new research ideas and opens up new directions for in-depth research on the occurrence, development mechanisms, diagnosis and treatment of complex diseases [10]. However, scRNA-seq generally results in a large amount of noise, and the capture efficiency is also much lower than that of traditional bulk RNA-seq, generating a very large number of dropouts, which gives rise to new challenges in single-cell data analysis

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, a