A Platform for Peptidase Detection Based on Text Mining Techniques and Support Vector Machines


Daniel Correia, Carlos Pereira, Paula Veríssimo, and António Dourado

Abstract This paper presents a web platform for peptidase detection and motif search in the Merops database. The peptidase detection methodology combines text mining techniques with Support Vector Machines (SVMs). Preliminary results using two types of SVM, C-Support Vector Classification (C-SVC) and One-class SVM, show the feasibility of the methodology. Although C-SVC obtained the best results, the One-class SVM can be an alternative when only positive examples are available for training.

Keywords Protein classification • Text mining • Support vector machines • Web platform

D. Correia (*)
Department of Informatics Engineering and Systems, Coimbra Institute of Engineering, Portugal

C. Pereira
Department of Informatics Engineering and Systems, Coimbra Institute of Engineering, Portugal
Department of Informatics Engineering, University of Coimbra, Portugal

P. Veríssimo
Department of Biochemistry and Center of Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal

A. Dourado
Department of Informatics Engineering, University of Coimbra, Portugal

A. Madureira et al. (eds.), Computational Intelligence and Decision Making: Trends and Applications, Intelligent Systems, Control and Automation: Science and Engineering 61, DOI 10.1007/978-94-007-4722-7_42, © Springer Science+Business Media Dordrecht 2013

42.1 Introduction

Peptidases are a class of enzymes that catalyze the breakdown of proteins into smaller molecules. They are involved in several processes crucial for the correct functioning of organisms, so their detection and characterization are central to a better understanding of their role in biological systems. One of the main objectives of this work is to offer the scientific community a web platform for peptidase detection. The methodology uses text mining techniques and two types of SVM algorithm, C-SVC and One-class SVM. The use of text mining techniques for protein classification has been suggested in [1]; this work, however, combines n-gram counts with state-of-the-art Support Vector Machines (SVMs). To investigate how well peptidases can be detected, a dataset formed by sequences of different classes from Merops was used.

The remainder of this paper is organized as follows. The next section presents the methodologies used for the platform development, together with the processes for extracting and representing the characteristics of the sequences. Section 42.3 briefly describes the two types of Support Vector Machines used. Section 42.4 presents the experiments and the results. Section 42.5 presents the functionalities offered by the platform. The last section presents the con