Analysis of codon usage patterns in citrus based on coding sequence data



RESEARCH

Open Access

Zenan Shen1,2†, Zhimeng Gan3†, Fa Zhang1,2, Xinyao Yi4, Jinzhi Zhang3 and Xiaohua Wan1,2*

From 15th International Symposium on Bioinformatics Research and Applications (ISBRA ’19), Barcelona, Spain. 3–6 June 2019

Abstract

Background: Codon usage is an important determinant of gene expression levels that can help us understand the codon biology, evolution and mRNA translation of species. Most previous codon usage studies have focused on single-species analysis, and few have compared species within the same genus. In this study, we propose a multispecies codon usage analysis workflow to reveal the genetic features and correlation in citrus.

Results: Our codon usage analysis workflow was based on the GC content, GC plot, and relative synonymous codon usage (RSCU) value of each codon in 8 citrus species. This approach allows the codon usage bias of different citrus species to be compared. Next, we performed cluster analysis and obtained an overview of the relationships in citrus. However, traditional methods cannot analyze the correlation quantitatively. To further estimate the correlation among the citrus species, we used the codon frequency profile to construct a feature vector for each species. The Pearson correlation coefficient was then used to quantify the distance among the citrus species. This result was consistent with the cluster analysis.

Conclusions: Our findings showed that the citrus species are conserved at the genetic level and demonstrated the existing genetic evolutionary relationship in citrus. This work provides new insights into codon biology and the evolution of citrus and other plant species.

Keywords: Citrus, Codon usage, GC biology, Evolution, Correlation
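The quantitative part of the workflow summarized above (GC content, a codon frequency profile per species, and a Pearson correlation between profiles) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names gc_content, codon_profile and pearson are hypothetical, the two toy sequences are invented for illustration and are not citrus data, and the abstract does not state how the correlation is converted into a distance.

```python
from itertools import product
from math import sqrt

# All 64 codons in a fixed order, so every profile uses the same coordinates.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return sum(seq.count(base) for base in "GC") / len(seq)


def codon_profile(cds_list):
    """Relative frequency of each codon over a list of in-frame coding sequences."""
    counts = dict.fromkeys(CODONS, 0)
    for cds in cds_list:
        cds = cds.upper()
        # Ignore a trailing partial codon, if any.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in counts:  # skip codons with ambiguous bases such as N
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid codons found")
    return [counts[c] / total for c in CODONS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy coding sequences, invented for illustration only (not citrus data).
species_a = ["ATGGCTGCC"]
species_b = ["ATGGCAGCG"]
r = pearson(codon_profile(species_a), codon_profile(species_b))
```

In practice each species would contribute thousands of coding sequences, and the pairwise coefficients would fill a species-by-species matrix that can be compared with the clustering result.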

Background

The genetic code is degenerate. There are 64 codons, 61 of which encode amino acids and 3 of which are stop codons, but only 20 amino acids are translated. As a result of this degeneracy, many amino acids are encoded by two to six different codons, which are called synonymous codons [1]. The genomes of different organisms are often biased towards using one of the synonymous codons over the others, a phenomenon termed codon usage bias. These differences in the usage of synonymous codons have been an important factor in the evolution of proteome diversity, and preferences for particular synonymous codons exist widely within genomes owing to mutation, natural selection, and random drift [2–4]. Thus, a comprehensive understanding of codon usage bias can help us explore the evolution of those proteins that have structural differences conserved at the sequence level [5–8].
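The degeneracy described above can be checked directly against the standard genetic code, and codon usage bias is commonly quantified with the relative synonymous codon usage (RSCU) value: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following Python sketch assumes the standard genetic code and this common RSCU definition; the function name rscu and the example sequence are hypothetical, and the authors' exact computation may differ.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group the sense codons into synonymous families, one per amino acid.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)

n_stop = sum(1 for aa in CODON_TABLE.values() if aa == "*")
n_sense = len(CODON_TABLE) - n_stop
family_sizes = Counter(len(family) for family in FAMILIES.values())


def rscu(cds):
    """RSCU of every sense codon in one in-frame coding sequence."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    # Count sense codons only; stop codons and ambiguous codons are skipped.
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # Observed count divided by the mean count within the family.
            values[c] = counts[c] * len(family) / total if total else 0.0
    return values


# Hypothetical toy sequence of four alanine codons: GCT, GCT, GCC, GCA.
example = rscu("GCTGCTGCCGCA")
```

An RSCU value above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often. In the toy sequence, GCT accounts for two of the four alanine codons, so its RSCU is 2.0, while the unused GCG has an RSCU of 0.0.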

*Correspondence: [email protected]
† Zenan Shen and Zhimeng Gan contributed equally to this work.
1 High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
2 University of Chinese Academy of Science