Machine Learning Configurations for Enhanced Human Protein Function Prediction Accuracy



Abstract Molecular class prediction of a protein is highly relevant to research in disease detection and drug discovery. Numerous approaches have been applied to increase the accuracy of the Human Protein Function (HPF) prediction task, but it remains highly challenging owing to the wide and versatile nature of this domain. This research focuses on the sequence-derived features (SDF) approach to HPF prediction, critically analyzed with the WEKA data analysis tool. New SDFs were identified and included in the training dataset drawn from the Human Protein Reference Database, which was enlarged in both the number of sequences and the related features used to derive relations with the various protein classes. A range of machine learning approaches was analyzed for prediction effectiveness, and a comprehensive comparison was carried out to achieve higher classification accuracy. The machine learning approach was also analyzed for its limitations when applied to a broad-spectrum data domain, and remedies for those limitations were explored by changing the configuration of the data sets and prediction classes.

Keywords Bagging · Bayes Net · C5 · Decision tree · HPF · IBK · J48 · Logistic approach · PART · Random forest · SDF · Weka

1 Introduction Protein classification is a vast domain with an enormous amount of data available for research and analysis, yet knowledge about how to interpret that data correctly remains very limited. Machine learning (ML), on the other hand, provides promising answers in research areas that are not clearly defined. It is thus a powerful tool for exploring ways to enhance the current understanding of proteins.

A. Singh · S. Sharma (✉) · G. Singh · R. Singh, Guru Nanak Dev University, Amritsar, India

© Springer Nature Singapore Pte Ltd. 2019. A. K. Luhach et al. (eds.), Smart Computational Strategies: Theoretical and Practical Aspects, https://doi.org/10.1007/978-981-13-6295-8_4


The decision tree [1, 2] based prediction approach of machine learning is clear and reliable for protein classification. Being a white-box approach, it makes explicit the sequence of computations involved at every stage. This advantage enables its use by computational experts even without much knowledge of the domain concerned. Similarly, a domain expert is able to examine the route followed by a computation expert, so the gap between technical knowledge and domain expertise is narrowed. Nodes and edges indicate the various utilities at the different stages of computation in a decision tree [3]. A decision tree neatly depicts the required results, or the outputs of the various possible outcomes. It clearly defines the problem structure and its interpretations in a hierarchical way, which is much easier to comprehend, as the model has a unique ability to consider different initial parameters and reach a goal [4, 5]. However, recent advancements suggest that the prediction of Protein-Function is a domain