I-Impute: a self-consistent method to impute single cell RNA sequencing data
METHODOLOGY
Open Access
Xikang Feng1,2†, Lingxi Chen2†, Zishuai Wang2 and Shuai Cheng Li2,3*
From The 18th Asia Pacific Bioinformatics Conference, Seoul, Korea, 18-20 August 2020
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) is becoming indispensable in the study of cell-specific transcriptomes. However, scRNA-seq techniques capture only a small fraction of the genes because of “dropout” events, which require intensive treatment when analyzing scRNA-seq data. For example, imputation tools have been proposed to estimate dropout events and de-noise the data. The performance of these tools is often evaluated, or fine-tuned, using clustering criteria based on ground-truth cell subgroup labels, which limits their effectiveness when cell subgroup knowledge is lacking. We consider an alternative strategy that requires the imputation to follow a “self-consistency” principle; that is, the imputation process refines its results until no internal inconsistencies or dropouts remain in the data.
Results: We propose “self-consistency” as the main criterion for performing imputation. To demonstrate this principle, we devised I-Impute, a “self-consistent” method for imputing scRNA-seq data. I-Impute optimizes continuous similarities and dropout probabilities through iterative refinement until a self-consistent imputation is reached. On in silico datasets, I-Impute consistently exhibited the highest Pearson correlations across different dropout rates compared with the state-of-the-art methods SAVER and scImpute. Furthermore, we collected three wet-lab datasets (mouse bladder cells, embryonic stem cells, and aortic leukocytes) to evaluate the tools. I-Impute proved effective for cell subpopulation discovery on all three datasets and achieved the highest clustering accuracy compared with SAVER and scImpute.
Conclusions: A strategy based on “self-consistency”, captured through our method I-Impute, gave better imputation results than the state-of-the-art tools. The source code of I-Impute is available at https://github.com/xikanfeng2/I-Impute.
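The abstract describes I-Impute as repeatedly refining cell similarities and dropout probabilities until the imputation becomes self-consistent, i.e. re-imputing its own output no longer changes it. The Python sketch below illustrates only that fixed-point idea under stated assumptions; it is not I-Impute's actual algorithm. The single-pass imputer `impute_step` is hypothetical: it takes the k most similar cells by Pearson correlation of log1p values, treats the share of those neighbours expressing a gene as a stand-in dropout probability, and fills observed zeros with the neighbour mean when that share is at least `p_min`. The parameters `k`, `p_min`, `tol` and `max_iter`, and the synthetic two-group data with about 30% simulated dropout, are illustrative assumptions that loosely mirror the in silico evaluation by Pearson correlation against ground truth.

```python
import numpy as np


def impute_step(x_obs, x_cur, k=3, p_min=0.5):
    """One hypothetical imputation pass (illustrative only, not I-Impute's step).

    x_obs: observed cells x genes matrix; zeros are candidate dropouts.
    x_cur: current imputed estimate, used to recompute similarities.
    """
    k = min(k, x_cur.shape[0] - 1)  # never count a cell as its own neighbour
    # Continuous cell-cell similarity: Pearson correlation of log1p values.
    with np.errstate(divide="ignore", invalid="ignore"):
        sim = np.nan_to_num(np.corrcoef(np.log1p(x_cur)))
    np.fill_diagonal(sim, -np.inf)           # exclude self-similarity
    nbrs = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar cells
    nbr_vals = x_cur[nbrs]                   # shape (cells, k, genes)
    nbr_mean = nbr_vals.mean(axis=1)
    # Hypothetical dropout probability: share of neighbours expressing the gene.
    p_dropout = (nbr_vals > 0).mean(axis=1)
    x_new = x_obs.astype(float)              # copy; observed non-zeros stay as-is
    fill = (x_obs == 0) & (p_dropout >= p_min)
    x_new[fill] = nbr_mean[fill]
    return x_new


def self_consistent_impute(x_obs, k=3, p_min=0.5, tol=1e-6, max_iter=100):
    """Re-apply impute_step to its own output until it stops changing."""
    x_cur = x_obs.astype(float)
    for it in range(1, max_iter + 1):
        x_next = impute_step(x_obs, x_cur, k, p_min)
        delta = np.abs(x_next - x_cur).max()
        x_cur = x_next
        if delta < tol:                      # fixed point: self-consistent result
            return x_cur, it, True
    return x_cur, max_iter, False            # gave up before reaching a fixed point


def pearson(a, b):
    """Pearson correlation between two matrices, flattened entry-wise."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]


# Synthetic ground truth: two cell groups with different gene means.
rng = np.random.default_rng(0)
n_per_group, n_genes = 20, 50
group_means = rng.gamma(2.0, 2.0, size=(2, n_genes))
truth = np.vstack(
    [rng.poisson(group_means[g], size=(n_per_group, n_genes)) for g in range(2)]
).astype(float)
observed = truth * (rng.random(truth.shape) > 0.3)  # ~30% simulated dropout

imputed, n_iter, converged = self_consistent_impute(observed)
print(f"converged={converged} after {n_iter} iteration(s)")
print(f"Pearson r vs truth: observed={pearson(truth, observed):.3f}, "
      f"imputed={pearson(truth, imputed):.3f}")
```

Recomputing the similarities from the current estimate, rather than from the raw counts, is what makes the loop a self-consistency check: the result is accepted only once the imputation agrees with the similarities it induces. The `max_iter` cap is there because a fixed point is not guaranteed for an arbitrary imputer.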
Keywords: scRNA-seq, Imputation, Self-consistency, Cell subpopulation identification
*Correspondence: [email protected]
† Xikang Feng and Lingxi Chen contributed equally to this work.
2 Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
3 Department of Biomedical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
Full list of author information is available at the end of the article
© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licen