Predicting and clustering plant CLE genes with a new method developed specifically for short amino acid sequences


METHODOLOGY ARTICLE

Open Access

Zhe Zhang1,2, Lei Liu1,2, Melis Kucukoglu3,4, Dongdong Tian1,2, Robert M. Larkin1,2, Xueping Shi1,2* and Bo Zheng1,2*

Abstract

Background: The CLV3/ESR-RELATED (CLE) gene family encodes small secreted peptides (SSPs) and plays vital roles in plant growth and development by promoting cell-to-cell communication. The prediction and classification of CLE genes is challenging because of their low sequence similarity.

Results: We developed a machine learning-aided method for predicting CLE genes by using a CLE motif-specific residual score matrix and a novel clustering method based on the Euclidean distance of 12 amino acid residues from the CLE motif in a site-weight dependent manner. In total, 2156 CLE candidates, including 627 novel candidates, were predicted from 69 plant species. The results from our CLE motif-based clustering are consistent with previous reports using the entire pre-propeptide. Characterization of CLE candidates provided systematic statistics on protein lengths, signal peptides, relative motif positions, amino acid compositions of different parts of the CLE precursor proteins, and decisive factors of CLE prediction. The approach taken here provides information on the evolution of the CLE gene family and provides evidence that the CLE and IDA/IDL genes share a common ancestor.

Conclusions: Our new approach is applicable to SSPs or other proteins with short conserved domains and hence provides a useful tool for gene prediction, classification and evolutionary analysis.

Keywords: Peptide hormone, CLE, Machine learning, Euclidean distance, Gene prediction, Gene clustering, Evolution
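The clustering idea summarized in the Results, a Euclidean distance over the 12 residues of the CLE motif with a weight per site, can be illustrated with a minimal sketch. It assumes a one-hot encoding of residues and hypothetical site weights. The function names, the two example motif strings, and the weight values are illustrative assumptions, not the encoding, score matrix, or weights used in the article.

```python
import math

# Assumption: the 20 standard amino acids, one-hot encoded. The article's
# own residue encoding is not reproduced here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MOTIF_LENGTH = 12  # the abstract describes 12 residues from the CLE motif


def one_hot(residue):
    """Return a 20-dimensional one-hot vector for a single residue."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def weighted_motif_distance(motif_a, motif_b, site_weights):
    """Site-weighted Euclidean distance between two 12-residue motifs.

    Each site contributes weight * squared distance between the encoded
    residues; the square root of the sum is returned.
    """
    if not (len(motif_a) == len(motif_b) == len(site_weights) == MOTIF_LENGTH):
        raise ValueError("motifs and weights must all have length 12")
    total = 0.0
    for res_a, res_b, weight in zip(motif_a, motif_b, site_weights):
        vec_a, vec_b = one_hot(res_a), one_hot(res_b)
        total += weight * sum((x - y) ** 2 for x, y in zip(vec_a, vec_b))
    return math.sqrt(total)


# Illustrative inputs only (hypothetical weights, example motif strings).
motif_a = "RTVPSGPDPLHH"
motif_b = "HEVPSGPNPISN"
uniform_weights = [1.0] * MOTIF_LENGTH

distance = weighted_motif_distance(motif_a, motif_b, uniform_weights)
```

With one-hot vectors, every mismatched site adds twice its weight to the squared distance, so raising the weight of a site makes disagreement at that site count for more when the resulting distances are passed to a clustering routine.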

* Correspondence: [email protected]; [email protected]
1 Key Laboratory of Horticultural Plant Biology of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
Full list of author information is available at the end of the article

Background

Small secreted peptides (SSPs) play vital roles in cell-to-cell communication during plant growth and development [1–4]. The best understood plant SSPs are encoded by the CLAVATA3 (CLV3)/EMBRYO SURROUNDING REGION (ESR)-RELATED (CLE) gene family [5, 6]. CLE peptides have been widely identified in bryophytes, pteridophytes, gymnosperms and angiosperms [7]. A typical CLE protein contains an N-terminal signal peptide, a non-conserved variable region in the middle, a C-terminal conserved motif (CLE motif) and, in some instances, a short C-terminal tail downstream of the CLE motif. CLE motifs are usually composed of 12 to 13 amino acid residues. Exogenous peptides containing the CLE motif can mimic the phenotypes of transgenic plants that overexpress CLE genes [8–10]. The conserved CLE domains contain hydroxyproline and arabinosylated hydroxyproline residues [11–13]. Interestingly, the influence of these post-translational modifications varies in different species. For instance, post-translational modificatio