Lineage Inference and Stem Cell Identity Prediction Using Single-Cell RNA-Sequencing Data

With the advent of several single-cell RNA-sequencing (scRNA-seq) techniques, it has become possible to gain novel insights into fundamental, long-standing questions in biology at unprecedented resolution. Among the various applications of scRNA-s



1 Introduction

During embryonic development as well as adult tissue homeostasis and regeneration, cells move from an immature naïve state to a mature functional state through a series of complex transcriptional changes governed by various cell-intrinsic and cell-extrinsic factors. Quantification of genome-wide transcription in individual cells using scRNA-seq provides a unique opportunity to capture and characterize the distinct cell states occurring during such biological processes. Moreover, since differentiation is not completely synchronous in complex multicellular organisms, single-cell expression data often comprise not only immature and mature cell types but also transient cell types with unique transcriptional signatures representing the intermediate states of cellular differentiation. Therefore, statistical methods and computational tools, applied de novo or in combination with prior biological

Patrick Cahan (ed.), Computational Stem Cell Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1975, https://doi.org/10.1007/978-1-4939-9224-9_13, © Springer Science+Business Media, LLC, part of Springer Nature 2019


Sagar and Dominic Grün

knowledge can be used to predict the identity of stem cells and to infer the lineage tree by aligning the intermediate cell states along a trajectory, thereby tracking the global gene expression changes that occur during differentiation. During the last few years, several methods have been developed for lineage inference and stem cell identification from scRNA-seq data [1–7]. Here we describe a step-by-step in-house workflow covering the relevant aspects of the previously published RaceID3, StemID2, and FateID algorithms, including sample R code for de novo lineage tree construction and stem cell identity prediction, thereby providing tools to facilitate scRNA-seq-driven discoveries.

2 Materials

The analysis workflow described in this protocol is implemented in R. Analyses can be performed either on a private workstation or on a compute server [8].

1. R (https://www.r-project.org/) and RStudio (optional, https://www.rstudio.com/).
2. RaceID3 and StemID2 algorithms (https://github.com/dgrun/RaceID3_StemID2). The repository contains two R files, RaceID3_StemID2_class.R and RaceID3_StemID2_sample.R, as well as an example dataset in .xls format.
3. FateID algorithm, available as an R package through the CRAN package repository.
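Before starting, it can help to confirm that the materials listed above are in place. The following is a minimal sketch, not part of the published workflow: `check_setup` is a hypothetical helper introduced here for illustration, and it assumes the two RaceID3/StemID2 files have been downloaded into the current working directory. It only reports what is present; the install and load commands are left commented out so that nothing is changed without the user's intent.

```r
# Hypothetical helper (not from the RaceID3/StemID2 repository):
# reports which required files and packages are available.
check_setup <- function(dir = ".") {
  # File names as listed in the Materials section
  files <- c("RaceID3_StemID2_class.R", "RaceID3_StemID2_sample.R")
  c(
    setNames(file.exists(file.path(dir, files)), files),
    # TRUE if the FateID package is installed in the current R library
    FateID = requireNamespace("FateID", quietly = TRUE)
  )
}

status <- check_setup()
print(status)

# If FateID is reported as FALSE, install it once from CRAN:
# install.packages("FateID")

# Once the class file is present, load its definitions into the session:
# source("RaceID3_StemID2_class.R")
```

Any entry reported as `FALSE` points to a missing download or package that should be resolved before running the steps in the Methods section.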

3 Methods

This section provides a step-by-step workflow using a previously published dataset in which the 5-day-old progeny of Lgr5+ mouse intestinal cells was sequenced [9]. To identify stem cells in scRNA-seq data, all cell types in the dataset must first be robustly identified. This is done by applying the RaceID3 algorithm, which performs k-medoids clustering followed by an outlier identification step to recover all cell types, including rare ones. Afterward, the StemID2 algorithm is used to construct a lineage tree. The StemID2 algorithm is also used to identify the putative stem/progenitor cell clu