Differential Expression Analysis in Single-Cell Transcriptomics

Differential expression analysis is an important aspect of bulk RNA sequencing (RNAseq). Many tools are available, among which DESeq2 and edgeR are widely used. Since single-cell RNA sequencing (scRNAseq) expression data are zero-inflated, single-c


Valentina Proserpio (ed.), Single Cell Methods: Sequencing and Proteomics, Methods in Molecular Biology, vol. 1979, https://doi.org/10.1007/978-1-4939-9240-9_25, © Springer Science+Business Media, LLC, part of Springer Nature 2019

Luca Alessandrì et al.

1 Introduction

Single-cell sequencing is a powerful technology for studying cell heterogeneity and represents a new frontier for the bioinformatics community. Cell heterogeneity analysis requires the use of clustering methods (t-SNE [1], kernel similarity learning [2], etc.). Once clusters have been detected, however, the genes that play a pivotal role in defining the cell cluster organization still need to be identified. Differential expression analysis could be the key to detecting such genes. Many methods have been developed to identify differential gene expression from single-cell RNA (scRNA)-seq data, and Soneson recently evaluated the overall characteristics of 36 of them [3], testing their efficacy in differential expression analysis between two groups. Soneson has shown that bulk RNA-seq analysis methods do not perform worse than those developed specifically for scRNA-seq [3]. However, a two-group comparison is not the optimal approach for intercluster feature selection, that is, for identifying the main players in the organization of cell subpopulations; a multigroup comparison would be more appropriate. Among the top ten tools for differential expression analysis tested by Soneson, only Limma [4] and edgeR/QLF [5] are able to handle multigroup comparisons. Since edgeR/QLF also appears to outperform the other tools in two-group comparisons [3], in this chapter we focus on edgeR/QLF as a tool for multigroup differential expression analysis.

Another important point to address in bioinformatics analyses is reproducibility. Reproducibility is a key element of modern science: it is the ability to replicate an experiment independently of the location and the operator. A study can therefore be considered reproducible only if all the data used are available and the computational analysis workflow is clearly described. In genomics and transcriptomics data analysis, the availability of the raw data and of the list of tools used might not be enough to guarantee the reproducibility of the results obtained. Indeed, different releases of the same tool might result in subtle reproducibility issues [6]. The Reproducible Bioinformatics Project (RBP) [7] is an open-source project, based on docker images and R packages, that provides reproducible results in the genomics and transcriptomics framework. An implementation of edgeR/QLF is available in RBP, and here we describe its use as a differential expression tool for single cells.
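To make the contrast between two-group and multigroup comparison concrete, the following minimal sketch tests one gene across three cell clusters with a single omnibus statistic. It uses a plain one-way ANOVA F statistic on log-scale values, which is a deliberately simplified stand-in and not the quasi-likelihood F-test implemented in edgeR. The cluster names, the expression values, and the function name `one_way_f` are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of value groups."""
    k = len(groups)                      # number of clusters
    n = sum(len(g) for g in groups)      # total number of cells
    grand_mean = mean(v for g in groups for v in g)
    # Variation of the cluster means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the cells around their own cluster mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log-expression values for two genes in three clusters.
marker_gene = {
    "cluster_A": [0.0, 0.2, 0.1, 0.0],
    "cluster_B": [0.1, 0.0, 0.2, 0.1],
    "cluster_C": [2.1, 1.9, 2.3, 2.0],   # expressed only in cluster C
}
flat_gene = {
    "cluster_A": [1.0, 1.2, 0.9, 1.1],
    "cluster_B": [1.1, 0.9, 1.0, 1.2],
    "cluster_C": [1.0, 1.1, 1.2, 0.9],   # same level in every cluster
}

f_marker = one_way_f(list(marker_gene.values()))
f_flat = one_way_f(list(flat_gene.values()))
print(f"marker gene F = {f_marker:.1f}, flat gene F = {f_flat:.3f}")
```

A single statistic per gene covers all clusters at once, so a gene that distinguishes any one cluster from the others receives a large value without running every pairwise comparison separately.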

2 Materials and Methods

The analysis of transcriptomics data generally requires the use of a Unix operating system. Specifically, the RBP applications require the installation, in a Unix-based environment, of a docker daemon (https://www.docker.com/) and of R (https://cran.r-project.org/). The