PLoc-Euk: An Ensemble Classifier for Prediction of Eukaryotic Protein Sub-cellular Localization


Abstract The sub-cellular localization of a protein is important information because it plays a crucial role in the protein's function. Prediction of protein sub-cellular localization has therefore become a promising and challenging problem in bioinformatics. Recently, a number of computational methods have been proposed based on amino acid composition, functional domains, or sorting signals, but they lack contextual information about the protein sequence. In this paper, an ensemble classifier, PLoc-Euk, is proposed to predict the sub-cellular location of eukaryotic proteins using multiple physico-chemical properties of amino acids along with their composition. PLoc-Euk aims to predict protein sub-cellular localization in eukaryotes across five locations: Cell Wall, Cytoplasm, Extracellular, Mitochondrion, and Nucleus. The classifier is applied to the dataset extracted from http://www.bioinfo.tsinghua.edu.cn/∼guotao/data/ and achieves 73.37% overall accuracy.

Keywords Sub-cellular localization ⋅ Physico-chemical properties of amino acid ⋅ Ensemble classifier
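
As a rough illustration of the kind of input the abstract describes (amino acid composition combined with a physico-chemical property), the following minimal sketch builds a 21-dimensional feature vector from a protein sequence. It is not the PLoc-Euk feature set: the choice of the Kyte-Doolittle hydropathy index as the example property and the function names `composition`, `mean_hydropathy`, and `feature_vector` are assumptions made only for this example.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the composition vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. Assumption: used here only as one
# example physico-chemical property; the paper's property set may differ.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(sequence):
    """Return the standard residues of a sequence, ignoring other symbols."""
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def composition(sequence):
    """Fraction of each standard amino acid, in AMINO_ACIDS order."""
    residues = _standard_residues(sequence)
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def mean_hydropathy(sequence):
    """Average Kyte-Doolittle hydropathy over the standard residues."""
    residues = _standard_residues(sequence)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def feature_vector(sequence):
    """Hypothetical 21-dimensional vector: composition plus one property."""
    return composition(sequence) + [mean_hydropathy(sequence)]
```

A vector like this could be passed to any standard classifier. An ensemble would typically train several classifiers on different feature subsets and combine their votes, but how PLoc-Euk does this is not specified in the text above.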

R. Mitra
Rate Integration Software Technologies Pvt. Ltd., 213 A, A.J.C. Bose Road, Kolkata 20, India

P. Chatterjee (✉)
Department of Computer Science & Engineering, Netaji Subhash Engineering College, Garia 152, Kolkata, India

S. Basu (✉) ⋅ M. Kundu ⋅ M. Nasipuri
Department of Computer Science & Engineering, Jadavpur University, Kolkata 700032, India

© Springer Nature Singapore Pte Ltd. 2017
S.C. Satapathy et al. (eds.), Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications, Advances in Intelligent Systems and Computing 516, DOI 10.1007/978-981-10-3156-4_12


1 Introduction

With the deluge of gene products in the post-genomic age, the gap between newly found protein sequences and knowledge of their cellular location is growing larger. To use these newly found protein sequences for drug discovery, an effective method is needed to bridge this gap. In practice, proteins may simultaneously exist at, or move between, two or more sub-cellular locations, which makes protein localization a very challenging problem in bioinformatics. Protein sub-cellular localization can be determined by various biochemical experiments such as cell fractionation, electron microscopy, and fluorescence microscopy. These experimental approaches are accurate but time-consuming and expensive, which necessitates computational techniques for predicting protein sub-cellular localization; such predictions are also useful for protein function prediction. A number of in-silico sub-cellular localization methods have been proposed. Most of the prediction methods can be classified into categories based on the recognition of protein N-terminal sorting signals, amino acid composition, functi