iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features


RESEARCH ARTICLE

Open Access

Kai-Yao Huang1,2†, Fang-Yu Hung3†, Hui-Ju Kao1, Hui-Hsuan Lau2,3,4* and Shun-Long Weng2,3,5*

*Correspondence: [email protected]; [email protected]
† Joint first authorship: Kai-Yao Huang and Fang-Yu Hung
2 Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan
Full list of author information is available at the end of the article

Abstract

Background: Protein phosphoglycerylation, the addition of 1,3-bisphosphoglyceric acid (1,3-BPG) to a lysine residue of a protein to form 3-phosphoglyceryllysine, is a reversible and non-enzymatic post-translational modification (PTM) that plays a regulatory role in glucose metabolism and the glycolytic process. As the number of experimentally verified phosphoglycerylated sites has increased significantly, statistical or machine learning methods are imperative for investigating the characteristics of phosphoglycerylation sites. Currently, research into phosphoglycerylation is very limited, and only a few resources are available for the computational identification of phosphoglycerylation sites.

Result: We present a bioinformatics investigation of phosphoglycerylation sites based on sequence-based features. The TwoSampleLogo analysis reveals that the regions surrounding the phosphoglycerylation sites contain a relatively high proportion of positively charged amino acids, especially in the upstream flanking region. Additionally, according to the PTM-Logo results, non-polar and aliphatic amino acids are more abundant around phosphoglycerylated lysines, which may play a functional role in discriminating between phosphoglycerylation and non-phosphoglycerylation sites. Several types of features were adopted to build the prediction model on the training dataset, including amino acid composition, amino acid pair composition, positional weighted matrix and position-specific scoring matrix. Further, to improve predictive power, the top features ranked by F-score were taken as the final combination for classification, and predictive models were trained using decision tree (DT), random forest (RF) and support vector machine (SVM) classifiers. Evaluation by five-fold cross-validation showed that the selected features were the most effective in discriminating between phosphoglycerylated and non-phosphoglycerylated sites.
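As a rough illustration of two steps named above, the sketch below computes amino acid composition for a sequence window and ranks features by an F-score. It assumes the F-score is the Fisher-style score commonly paired with SVMs (between-class separation divided by the sum of within-class sample variances); the abstract does not state the formula, so this is an assumption. The peptide windows, the helper names `aac` and `f_score`, and the 9-residue window length are hypothetical and are not taken from the iDPGK dataset.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(window):
    """Amino acid composition: fraction of each of the 20 residues in a window."""
    counts = Counter(ch for ch in window if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]

def f_score(pos, neg):
    """Per-feature F-score (assumed Fisher-style definition, see lead-in).

    pos and neg are lists of equal-length feature vectors, each with at
    least two samples so the sample variances are defined.
    """
    n_pos, n_neg = len(pos), len(neg)
    scores = []
    for i in range(len(pos[0])):
        p = [row[i] for row in pos]
        n = [row[i] for row in neg]
        mean_p = sum(p) / n_pos
        mean_n = sum(n) / n_neg
        mean_all = (sum(p) + sum(n)) / (n_pos + n_neg)
        # Between-class separation of the two class means from the overall mean.
        between = (mean_p - mean_all) ** 2 + (mean_n - mean_all) ** 2
        # Sum of the within-class sample variances.
        within = (sum((v - mean_p) ** 2 for v in p) / (n_pos - 1)
                  + sum((v - mean_n) ** 2 for v in n) / (n_neg - 1))
        if within > 0:
            scores.append(between / within)
        else:
            # Constant within both classes: perfectly separating or uninformative.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

# Hypothetical 9-residue windows centred on a lysine (K); illustration only.
positives = ["RKAAKLVGR", "KRLAKAVAG", "RRGAKLLAV"]
negatives = ["DESTKDEQS", "EDSSKTDEN", "SDETKEQSD"]

scores = f_score([aac(w) for w in positives], [aac(w) for w in negatives])
ranked = sorted(zip(AMINO_ACIDS, scores), key=lambda t: t[1], reverse=True)
print(ranked[:5])  # residues whose composition best separates the toy classes
```

In a full pipeline the top-ranked features would then be passed to a classifier and assessed by cross-validation; that part is omitted here.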
Conclusion: The SVM model trained with the selected sequence-based features performed well, with a sensitivity of 77.5%, a specificity of 73.6%, an accuracy of 74.9%, and a Matthews correlation coefficient of 0.49. Furthermore, the model performed consistently on the independent testing set, yielding a sensitivity of 75.7% and a specificity of 64.9%. Finally, the model has been implemented as a web-based system, iDPGK, which is freely available at http://mer.hc.mmh.org.tw/iDPGK/.

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International L
