Data-Driven Statistical Approaches for Omics Data Analysis
Abstract With the rapid development of high-throughput technology, the volume of omics data for biological systems has increased exponentially. A challenging problem for biologists is how to extract useful information from high-dimensional or ultrahigh-dimensional omics data. In this chapter, we introduce some recent progress in omics data analysis, paying special attention to the related data-driven statistical approaches. In particular, we introduce weighted gene co-expression network analysis, genome-wide association studies, general linear models, and the hidden Markov random field model.
9.1 Backgrounds

9.1.1 Various High-Throughput Sequencing Technologies

With the rapid development of high-throughput technology, the volume of omics data for biological systems has increased exponentially [1–17]. The human genome sequence was completed in draft form in 2001 [18, 19]. Shortly thereafter, the genome sequences of several model organisms were determined [20–22]. These feats were accomplished with Sanger DNA sequencing, which was limited in throughput and high in cost. Commercially available high-throughput sequencing (HTS) platforms (Figs. 9.1 and 9.2) that improve on traditional Sanger sequencing include the following:

(1) The Illumina Genome Analyzer II, released by Illumina/Solexa in 2006. Illumina has since produced a suite of sequencers (MiSeq, NextSeq 500, and the HiSeq series) optimized for a variety of throughputs and turnaround times. In early 2014, Illumina introduced the NextSeq 500 as well as the HiSeq X Ten. The NextSeq 500 is designed as a fast benchtop sequencer for individual labs, while the HiSeq X Ten is a population-scale whole-genome sequencing (WGS) system.

(2) The benchtop Ion PGM sequencer, in which Life Technologies commercialized Ion Torrent’s semiconductor sequencing technology in 2010. Its template preparation and sequencing steps are conceptually similar to those of the Roche/454 pyrosequencing platform [23].

(3) Single-molecule real-time (SMRT) sequencing

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020. J. Lü, P. Wang, Modeling and Analysis of Bio-molecular Networks, https://doi.org/10.1007/978-981-15-9144-0_9
[Figure: machine output (Mb), on a logarithmic axis from 100 to 10,000,000, of high-throughput sequencing platforms, with reference levels marked for a 30x human genome and a 100x human exome. Platforms shown: 454 GS-20, Solexa/Illumina sequence analyzer, Illumina GAII and GAIIx, ABI SOLiD, ABI SOLiD 3, ABI SOLiD 5500xl and 5500xl W, Helicos Heliscope, Polonator G.007, Intelligent Bio-Systems MAX-Seq, Complete Genomics, Roche/454 GS Junior and GS FLX+, Ion Torrent Ion PGM and Ion Proton, Pacific Bioscience RSII, Illumina MiSeq, NextSeq 500, Hi-Seq 2000, HiSeq 2500, HiSeq 3000, and HiSeq X Ten, and Oxford Nanopore MinION.]