From one linear genome to a graph-based pan-genome: a new era for genomics


Yucheng Liu 1,2 & Zhixi Tian 1,2*

1 State Key Laboratory of Plant Cell and Chromosome Engineering, Center for Genome Editing, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China; 2 University of Chinese Academy of Sciences, Beijing 100049, China

Received July 31, 2020; accepted August 25, 2020; published online September 7, 2020

Citation: Liu, Y., and Tian, Z. (2020). From one linear genome to a graph-based pan-genome: a new era for genomics. Sci China Life Sci 63, https://doi.org/10.1007/s11427-020-1808-0

The seeds for genomics were sown with the development of DNA sequencing (Sanger et al., 1977), cultivated with each advance in molecular biology, and have since grown into one of the most important aspects of the life sciences. With the sequencing of the first draft human genome (Venter et al., 2001), the era of post-genomics began. Once one genome has been sequenced, it can be used as the reference for that species, and sequences from individuals can be mapped onto it to compare genetic variation among different lines. This allows further population studies, such as whole-genome genotyping, molecular evolution, and identification of trait-conferring loci. Construction of a reference genome has become a prerequisite for deeper functional analyses. However, an increasing number of genomics and genetics studies have shown that a single reference genome cannot fully represent the entire genetic variation of a species. Instead, we must consider to what extent a reference genome is representative (Ballouz et al., 2019; Yang et al., 2019). In this insight, we discuss the trends and challenges of constructing reference genomes (Figure 1).

From one genome to a pan-genome. In genomic studies, we always find a significant reduction in overall nucleotide diversity in the genomes of domesticated species compared with their wild relatives. However, this reduction in diversity coexists with more divergent phenotypes for some traits in the cultivated species. Moreover, larger structural variants (SVs), which are rarely detectable by mapping short reads against one reference genome, play important roles in conferring variation in some agronomic traits (Golicz et al., 2016a; Sherman and Salzberg, 2020). The use of just one or a few reference genomes in functional genomic studies may underestimate genetic divergence and miss some key variants (Danilevicz et al., 2020). Therefore, there is a need to move towards pan-genome construction (Ameur, 2019; Golicz et al., 2016a; Sherman and Salzberg, 2020).

The prefix pan- comes from Greek and means “all” or “everything”. A pan-genome tries to encompass all sequenced genomes of a species to better represent the diverse regions within the genome. The first pan-genome was constructed by sequencing the genomes of 8 Streptococcus agalactiae strains (Tettelin et al., 2005). Through comparative genomic analyses, the pan-genome was del