Estimate of the sequenced proportion of the global prokaryotic genome



RESEARCH

Open Access

Estimate of the sequenced proportion of the global prokaryotic genome

Zheng Zhang1,2*†, Jianing Wang1†, Jinlan Wang3, Jingjing Wang1 and Yuezhong Li1*

Abstract

Background: Sequencing prokaryotic genomes has revolutionized our understanding of the many roles played by microorganisms. However, the proportions of cells and taxa of bacteria or archaea on earth that have a sequenced genome remain unknown. This study aimed to explore this basic question using large-scale alignment between the sequences released by the Earth Microbiome Project and 155,810 prokaryotic genomes from public databases.

Results: Our results showed that the median proportions of genome-sequenced cells and taxa (at 100% identity in the 16S-V4 region) in different biomes reached 38.1% (16.4–86.3%) and 18.8% (9.1–52.6%), respectively. The sequenced proportions of the prokaryotic genomes in biomes were significantly negatively correlated with the alpha diversity indices, and the proportions sequenced in host-associated biomes were significantly higher than those in free-living biomes. Due to a set of cosmopolitan OTUs that are found in multiple samples and preferentially sequenced, only 2.1% of the global prokaryotic taxa are represented by sequenced genomes. Most of the biomes were occupied by a few predominant taxa with a high relative abundance and much higher genome-sequenced proportions than the numerous rare taxa.

Conclusions: These results reveal the current state of prokaryotic genome sequencing for earth biomes, provide a basis for more reasonable and efficient exploration of prokaryotic genomes, and promote our understanding of microbial ecological functions.

Keywords: Microbiome, Genome sequencing, Prokaryotic biome, Earth microbiome project, Predominant taxa
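The distinction between the cell proportion and the taxon proportion can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that read counts per OTU stand in for cell abundance, that an OTU counts as "sequenced" when it has already been matched to a genome, and all names and numbers (`sequenced_proportions`, `otu_counts`, the toy sample) are hypothetical.

```python
def sequenced_proportions(otu_counts, sequenced_otus):
    """Return (cell_proportion, taxon_proportion) for one sample.

    otu_counts: dict mapping OTU id -> read count (assumed proxy for cells).
    sequenced_otus: set of OTU ids assumed to match a sequenced genome.
    """
    # Only OTUs actually observed in the sample are considered.
    present = {otu: n for otu, n in otu_counts.items() if n > 0}
    if not present:
        raise ValueError("sample has no observed OTUs")

    total_reads = sum(present.values())
    matched = [otu for otu in present if otu in sequenced_otus]

    # Abundance-weighted share: fraction of reads from matched OTUs.
    cell_prop = sum(present[otu] for otu in matched) / total_reads
    # Unweighted share: fraction of distinct OTUs that are matched.
    taxon_prop = len(matched) / len(present)
    return cell_prop, taxon_prop


# Toy sample (made-up numbers): two abundant OTUs have genomes, three do not.
sample = {"otu_a": 600, "otu_b": 250, "otu_c": 100, "otu_d": 30, "otu_e": 20}
sequenced = {"otu_a", "otu_c"}

cell_prop, taxon_prop = sequenced_proportions(sample, sequenced)
print(f"cells: {cell_prop:.1%}, taxa: {taxon_prop:.1%}")  # cells: 70.0%, taxa: 40.0%
```

In this toy case the cell proportion exceeds the taxon proportion because the matched OTUs are among the most abundant ones, which mirrors the pattern the abstract reports for predominant versus rare taxa.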

Background

Prokaryotes are generally assumed to be the oldest existing form of life on earth and the primary engines of global biogeochemical processes; they are found in almost all ecosystems [1, 2]. Genome sequencing provides a blueprint for the evolutionary and functional diversities of prokaryotes and improves our understanding of how they interact with one another, their hosts, and their surroundings [3–5]. However, what proportion of the bacterial or archaeal cells and taxa on earth have a sequenced genome? This basic and seemingly simple question has never been answered.

* Correspondence: [email protected]; [email protected]
† Zheng Zhang and Jianing Wang contributed equally to this work.
1 State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
Full list of author information is available at the end of the article

Since the first bacterial genome was completely sequenced in 1995, more than 200,000 bacterial and archaeal complete or draft genomes have been uploaded to public databases as a result of the development of sequencing technology and the decrease in costs [6, 7]. Meanwhile, due to improvements in sequencing throughput and computational techniques, cultivation-independent recovery of genomes from me