Intrinsic laws of k-mer spectra of genome sequences and evolution mechanism of genomes
RESEARCH ARTICLE
Open Access
Zhenhua Yang1,2†, Hong Li1*†, Yun Jia3, Yan Zheng4, Hu Meng5, Tonglaga Bao1, Xiaolong Li1 and Liaofu Luo1
Abstract

Background: The k-mer spectra of DNA sequences contain important information about sequence composition and sequence evolution. We aim to reveal the evolution rules of genome sequences by studying their k-mer spectra.

Results: We analyzed the intrinsic laws of the k-mer spectra of 920 genome sequences ranging from primates to prokaryotes. We found two types of evolution selection modes in genome sequences, which we named CG Independent Selection and TA Independent Selection. There is a mutual inhibition relationship between the CG and TA independent selections. The intensity of the CG and TA independent selections correlates closely with genome evolution and with the G + C content of genome sequences, and the living habits of species are closely related to the independent selection modes adopted by their genomes. Consequently, we proposed an evolution mechanism of genomes in which genome evolution is determined by the intensities of the CG and TA independent selections and by the mutual inhibition between them. Based on this mechanism, we also speculated on the evolution modes of prokaryotes in mild and extreme environments during the anaerobic age, on the evolution of prokaryotes from anaerobic to aerobic environments on Earth, and on the origins of different eukaryotes.

Conclusion: There are two independent selection modes in genome sequences. The evolution of genome sequences is determined by these two independent selection modes and the mutual inhibition between them.

Keywords: Genome sequence, K-mer spectra, Independent selection law, Evolution mechanism of genomes, Evolution modes of prokaryotes

Background

The frequency of k-mers (k = 1, 2, 3, …) in nucleotide sequences is nonrandom.
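The nonrandomness of k-mer frequencies can be made concrete with a small sketch. Assuming an uppercase DNA string over A, C, G and T, the hypothetical helpers below (`kmer_spectrum`, `gc_content`, `obs_exp_ratio`) count the overlapping k-mer spectrum, compute the G + C content, and compare an observed k-mer count with the count expected if bases occurred independently at their observed frequencies. This zero-order baseline is chosen here only for illustration; it is not the authors' method for measuring CG or TA independent selection. A ratio far from 1, such as the well-known depletion of CG dinucleotides in many vertebrate genomes, is one simple signature of nonrandom k-mer usage.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(seq: str, k: int) -> dict[str, int]:
    """Count overlapping k-mers and report all 4**k ACGT words (zeros included).

    Windows containing other symbols (e.g. N) are counted by the Counter
    but are not reported, because only ACGT words are enumerated.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {"".join(w): counts.get("".join(w), 0) for w in product("ACGT", repeat=k)}


def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def obs_exp_ratio(seq: str, word: str) -> float:
    """Observed overlapping count of `word` divided by its expected count
    under an independent-bases (zero-order) model.

    Assumption for illustration: seq contains only A, C, G and T.
    Returns NaN if the expected count is zero.
    """
    seq, word = seq.upper(), word.upper()
    k = len(word)
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return float("nan")
    # Single-base frequencies estimated from the sequence itself.
    freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    # Expected count = number of windows x product of base frequencies.
    expected = float(n_windows)
    for base in word:
        expected *= freq[base]
    observed = sum(1 for i in range(n_windows) if seq[i:i + k] == word)
    return observed / expected if expected > 0 else float("nan")


if __name__ == "__main__":
    demo = "ATGCGCGTATATACGCGATTTA"  # toy sequence, not real genome data
    print("G + C content:", round(gc_content(demo), 3))
    for word in ("CG", "TA"):
        print(word, "obs/exp:", round(obs_exp_ratio(demo, word), 3))
```

Enumerating all 4^k words (rather than only the observed ones) keeps spectra of different sequences aligned on the same index, which is convenient when comparing genomes; for large k a sparse `Counter` alone would be more memory-efficient.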
This nonrandom characteristic has been widely used to predict and identify functional regions, such as promoter regions [1–3, 5], enhancers [4], CpG island sequences [5, 6], conserved non-coding sequences [7] and transcriptional start sites [8].

*Correspondence: [email protected]
† Zhenhua Yang and Hong Li contributed equally to this work
1 Laboratory of Theoretical Biophysics, School of Physical Science & Technology, Inner Mongolia University, Hohhot 010021, China
Full list of author information is available at the end of the article

The motif characters of k-mers have been used to analyze interaction signals between nucleotide elements and proteins, for example in recognizing hypersensitive binding sites of enzymes [9], probe design [10], drug design [11] and nucleosome positioning [12, 13]. Differences in k-mer usage have been exploited for sequence alignment [14] in chromosome assembly [15, 16], for genome dictionary construction [17, 18], for metagenomic classification [19, 20], and so on. Although advances have been made in information mining