CSS: cluster similarity spectrum integration of single-cell genomics data
METHOD
Open Access
Zhisong He1*, Agnieska Brazovskaja2, Sebastian Ebert2, J. Gray Camp3,4* and Barbara Treutlein1,2*
* Correspondence: zhisong.he@bsse.ethz.ch; [email protected]; [email protected]
1 Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
3 Institute of Molecular and Clinical Ophthalmology, Basel, Switzerland
Full list of author information is available at the end of the article
Abstract
It is a major challenge to integrate single-cell sequencing data across experiments, conditions, batches, time points, and other technical considerations. New computational methods are required that can integrate samples while simultaneously preserving biological information. Here, we propose an unsupervised reference-free data representation, cluster similarity spectrum (CSS), where each cell is represented by its similarities to clusters independently identified across samples. We show that CSS can be used to assess cellular heterogeneity and enable reconstruction of differentiation trajectories from cerebral organoid and other single-cell transcriptomic data, and to integrate data across experimental conditions and human individuals.
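As a rough illustration of the representation described above, the following Python sketch builds a similarity matrix in which each cell is described by its similarity to the average profile of every cluster found in every sample. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `cluster_similarity_spectrum` is hypothetical, cluster labels are assumed to have been computed beforehand within each sample, Pearson correlation is assumed as the similarity measure, and any normalization of the similarities is omitted.

```python
import numpy as np


def cluster_similarity_spectrum(expr, sample_ids, cluster_ids):
    """Represent each cell by its similarity to per-sample cluster averages.

    Hypothetical helper for illustration only.
    expr        : (n_cells, n_genes) array of normalized expression values
    sample_ids  : length n_cells, sample of origin of each cell
    cluster_ids : length n_cells, cluster label assigned within each sample
    """
    expr = np.asarray(expr, dtype=float)
    sample_ids = np.asarray(sample_ids)
    cluster_ids = np.asarray(cluster_ids)

    # Center each cell's expression vector once (needed for Pearson correlation).
    cells_c = expr - expr.mean(axis=1, keepdims=True)
    cells_norm = np.linalg.norm(cells_c, axis=1, keepdims=True)

    blocks = []
    for sample in np.unique(sample_ids):
        in_sample = sample_ids == sample
        # Average expression profile of every cluster found in this sample.
        profiles = np.stack([
            expr[in_sample & (cluster_ids == c)].mean(axis=0)
            for c in np.unique(cluster_ids[in_sample])
        ])
        profs_c = profiles - profiles.mean(axis=1, keepdims=True)
        profs_norm = np.linalg.norm(profs_c, axis=1)[None, :]
        # Assumption: Pearson correlation between every cell (from all
        # samples) and each cluster profile of this sample.
        blocks.append((cells_c @ profs_c.T) / (cells_norm * profs_norm))

    # One column per (sample, cluster) pair, concatenated across samples.
    return np.hstack(blocks)


# Toy usage: 6 cells, 5 genes, two samples with two clusters each.
rng = np.random.default_rng(0)
expr = rng.random((6, 5))
sample_ids = ["A", "A", "A", "B", "B", "B"]
cluster_ids = [0, 0, 1, 0, 1, 1]
css = cluster_similarity_spectrum(expr, sample_ids, cluster_ids)
print(css.shape)  # (6, 4)
```

The output has one row per cell and one column per cluster of each sample, so cells from different samples end up in a shared coordinate system without requiring an external reference data set.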
Background
Recent advances in molecular, engineering, and sequencing technologies have enabled the high-throughput measurement of transcriptomes and other genomic features in thousands of single cells in a single experiment [1–4]. Single-cell RNA sequencing (scRNA-seq) greatly enhances our capacity to resolve the heterogeneity of cell types and cell states in biological samples, as well as to understand how systems change during dynamic processes such as development. However, current scRNA-seq technologies only provide molecular snapshots of a limited number of measured samples at a time. Joint analysis of many samples across multiple experiments and conditions is therefore often required. In such a scenario, the biological variation of interest is usually confounded by other factors, including sample sources and experimental batches. This is particularly challenging for developing systems, where cell states at different points along various differentiation trajectories coexist, including mature cell types as well as intermediate states. Several computational integration methods, including but not limited to MNN [5], Seurat [6, 7], Harmony [8], LIGER [9], Scanorama [10], and Reference Similarity Spectrum (RSS) [11, 12], have been developed to address some of these issues. Among them, MNN identifies mutual nearest neighbors between two data sets and derives cell-specific batch-correction vectors for integration. Seurat corrects for batch effects by introducing an anchoring strategy, with anchors between samples

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original au