DNA sequencing: the key to unveiling genome
Suhui Chen & Xuehui Huang
Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
Received April 13, 2020; accepted April 24, 2020; published online May 20, 2020
Citation:
Chen, S., and Huang, X. (2020). DNA sequencing: the key to unveiling genome. Sci China Life Sci 63, https://doi.org/10.1007/s11427-020-1709-6
The genome comprises the total genetic material of an organism (DNA, or RNA in some viruses) and encodes the information needed for all life activities. Besides the DNA in the cell nucleus, mitochondrial DNA and chloroplast DNA are also important components of the genome. High-throughput sequencing has produced a tremendous amount of genomic data. Currently, 1,704 archaeal, 26,075 bacterial, 16,837 viral, and 4,688 eukaryotic genomes have been sequenced and submitted to the GenBank database (https://www.ncbi.nlm.nih.gov/genome). These abundant sequences have greatly accelerated basic research in areas such as gene function, genomic diversity and structure, and even the origin and evolution of life. This review summarizes current knowledge of genome structure and genomic evolution, as well as advanced sequencing technologies.

Complexity and diversity of the genome. Genomes act as information storage systems, comparable to electronic storage systems, that record variations among species. Typically, the number of genes an organism requires to function increases with its complexity. Nasuia deltocephalinicola, an endosymbiotic bacterium, has only 137 coding genes and has the smallest genome currently known (Bennett and Moran, 2013). In contrast, gene numbers in mammals can reach or exceed 25,000. In prokaryotes and small eukaryotes, genome size correlates positively with gene number (Hou and Lin, 2009). However, the ratio of genome size to gene number is not necessarily constant in eukaryotes, which is known as the C-value paradox. For instance, dinoflagellates are a large group of marine algae, and their nuclear genomes vary from 1 to 270 Gb. The main cause of variation in genome size is the proportion of repetitive sequences (Ren et al., 2018).

Repetitive sequences range in size from several bases (simple sequence repeats, e.g., “ATATATAT”) to millions of bases (large transposable elements), and account for over half of the human genome. Repeated sequences are categorized into moderately repetitive sequences and highly repetitive sequences based on copy number. Interspersed repeats (SINEs and LINEs) and partly tandem repeats (microsatellites and minisatellites) are two groups of moderately repetitive sequences. Alu repeat elements, the most abundant human SINEs, compose approximately 11% of the human genome (Batzer and Deininger, 2002). Alu elements transpose within the human genome and are responsible for chromosome rearrangements during evolution. Satellite sequences are highly repeated and are significant DNA components of heterochromatin. These repeat
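
As an illustration of the simple sequence repeats mentioned above (e.g., “ATATATAT”), the following minimal Python sketch locates tandem runs of a short motif in a DNA string. It is not a method from this review: the function name `find_simple_repeats` and its thresholds (motifs of 1 to 6 bases, at least 3 copies) are assumptions chosen for illustration, and dedicated repeat-annotation tools handle imperfect repeats and large genomes far more carefully.

```python
import re

def find_simple_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem run of a short motif.

    Hypothetical helper for illustration only; the thresholds are assumptions.
    """
    # A lazily matched motif of min_unit..max_unit bases, followed by at least
    # (min_copies - 1) further exact copies of that same motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

if __name__ == "__main__":
    # The example motif from the text, embedded in arbitrary flanking bases.
    print(find_simple_repeats("GGCATATATATCC"))
```

Because the motif is matched lazily, the shortest repeating unit is reported (here “AT” rather than “ATAT”); only perfect, uninterrupted repeats are found.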