Identifying atypically expressed chromosome regions using RNA-Seq data
Vinícius Diniz Mayrink · Flávio B. Gonçalves
Accepted: 30 October 2019 © Springer-Verlag GmbH Germany, part of Springer Nature 2019
Abstract The number of studies dealing with RNA-Seq data analysis has grown rapidly in recent years, making this type of gene expression data a strong competitor to DNA microarrays. This paper proposes a Bayesian model to detect low- and highly expressed chromosome regions using RNA-Seq data. The methodology is based on a recent work designed to detect highly expressed (overexpressed) regions in the context of microarray data. A hidden Markov model is developed by considering a mixture of Gaussian distributions with ordered means, such that the first and last mixture components are supposed to accommodate the under- and overexpressed genes, respectively. The model is flexible enough to deal efficiently with the highly irregular spacing of the data by assuming a hierarchical Markov dependence structure. The analysis of four cancer data sets (breast, lung, ovarian and uterus) is presented. Results indicate that the proposed model is selective in determining the expression status, is robust with respect to prior specifications, and provides tools for a global or local search of under- and overexpressed chromosome regions.
Keywords Bayesian inference · Mixture model · Gibbs sampling · Gene expression · Cancer
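As a rough illustration of the mixture structure described in the abstract, the following Python sketch simulates expression scores along a chromosome from three Gaussian components with ordered means, where the hidden component labels follow a simple first-order Markov chain. It is only a toy simulation under stated assumptions, not the model proposed in the paper: the function name `simulate_expression`, the parameter values, and the single `stay_prob` transition parameter are all hypothetical, and the sketch ignores both the irregular spacing of genes and the hierarchical dependence structure, and performs no Bayesian inference.

```python
import numpy as np


def simulate_expression(n_genes=500, means=(-2.0, 0.0, 2.0),
                        sds=(0.5, 1.0, 0.5), stay_prob=0.9, seed=0):
    """Simulate scores from a Gaussian mixture with Markov-dependent labels.

    All names and default values are illustrative assumptions.
    State 0 plays the role of "underexpressed", the last state "overexpressed".
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)

    # Ordered means make the first and last components identifiable
    # as the low and high expression groups.
    if not np.all(np.diff(means) > 0):
        raise ValueError("component means must be strictly increasing")

    k = len(means)
    # Assumed transition matrix: stay in the current state with probability
    # stay_prob, otherwise move to one of the other states uniformly.
    trans = np.full((k, k), (1.0 - stay_prob) / (k - 1))
    np.fill_diagonal(trans, stay_prob)

    # Hidden states evolve gene by gene along the chromosome.
    states = np.empty(n_genes, dtype=int)
    states[0] = rng.integers(k)
    for i in range(1, n_genes):
        states[i] = rng.choice(k, p=trans[states[i - 1]])

    # Each observed score is drawn from the Gaussian of its hidden state.
    scores = rng.normal(means[states], sds[states])
    return states, scores


states, scores = simulate_expression()
labels = np.array(["under", "typical", "over"])[states]
```

Because neighbouring genes tend to share a state when `stay_prob` is high, the simulated labels form contiguous runs, which is the kind of regional pattern a Markov mixture is meant to capture.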
Vinícius Diniz Mayrink: Departamento de Estatística, ICEx, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
1 Introduction Methods based on next-generation sequencing (NGS) to study the genome have flourished in recent years, helping to understand changes in the transcriptome and leading to the discovery of new mutations and fusion genes. The identification of genes driving cancer progression is, for example, the focus of extensive research using this technology (Maher et al. 2009; Berger et al. 2010; Han et al. 2011). In this paper, we are particularly interested in the analysis of RNA-Seq (RNA sequencing) data quantifying the amount of RNA in a biological sample at a given moment in time (Wang et al. 2009; Oshlack et al. 2010; Chu and Corey 2012; Conesa et al. 2016). RNA-Seq data can be used in different types of investigations including, for example, modifications in gene expression over time (Nueda et al. 2014; Van-De-Wiel et al. 2013) and (more often) a differential expression analysis comparing distinct conditions, groups or tissue types (Anders and Huber 2010; Bullard et al. 2010; Robinson et al. 2010; McCarthy et al. 2012; Soneson and Delorenzi 2013; Zhang et al. 2015; Papastamoulis and Rattray 2018).
The work developed here is heavily based on the recent study reported in Mayrink and Gonçalves (2017), where the authors propose a Bayesian Markov mixture model to detect overexpressed regions on the chromosomes using microarray data. The model basically consists of