A Genetic Programming Method for the Identification of Signal Peptides and Prediction of Their Cleavage Sites

David Lennartsson
Saida Medical AB, Stena Center 1A, SE-412 92 Göteborg, Sweden
Email: [email protected]

Peter Nordin
Department of Physical Resource Theory, Chalmers University of Technology, SE-412 96 Göteborg, Sweden
Email: [email protected]

Received 28 February 2003; Revised 31 July 2003

A novel approach to signal peptide identification is presented. We use an evolutionary algorithm to automatically evolve classification programs, so-called programmatic motifs. The variant of evolutionary algorithm used is genetic programming, in which a population of candidate solutions, each a complete computer program, is evolved on training examples consisting of signal peptide sequences. The method is compared with previous work based on artificial neural networks (ANNs), and several advantages over ANNs are noted: a programmatic motif can perform computational tasks beyond those of feedforward neural networks, and it has other advantages such as readability. The best evolved motif was analyzed and shown to detect the h-region of the signal peptide. A powerful parallel computer cluster was used for the experiment.

Keywords and phrases: signal peptides, genetic programming, bioinformatics, programmatic motif, artificial neural networks, cleavage site.
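The abstract describes genetic programming only at a high level. To make the idea of evolving classification programs concrete, here is a minimal sketch of a linear genetic programming loop. It is not the authors' implementation. Every specific choice is a hypothetical assumption made for illustration: the register-machine representation, the instruction set (`OPS`), the use of the Kyte-Doolittle hydropathy scale as the per-residue input, the 25-residue N-terminal prefix, the population parameters, and the toy data, which are synthetic sequences and not real signal peptides. Each candidate program runs once per residue and keeps state in its registers. This is one way a programmatic motif can process a sequence of arbitrary length, whereas a feedforward network sees a fixed-size input.

```python
import random

# Kyte-Doolittle hydropathy values; positive = hydrophobic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}
OPS = ("add", "sub", "mul", "max", "min", "decay")  # hypothetical instruction set
N_REGS = 3                            # r0 holds the program's output
SRC_X, SRC_ONE = N_REGS, N_REGS + 1   # extra sources: residue hydropathy, constant 1.0


def random_instr(rng):
    """One instruction: (operation, destination register, source index)."""
    return (rng.choice(OPS), rng.randrange(N_REGS), rng.randrange(N_REGS + 2))


def run(program, seq, prefix=25):
    """Execute the whole program once per residue of the N-terminal prefix."""
    regs = [0.0] * N_REGS
    for aa in seq[:prefix]:
        x = KD.get(aa, 0.0)
        for op, d, s in program:
            v = x if s == SRC_X else 1.0 if s == SRC_ONE else regs[s]
            if op == "add":
                regs[d] += v
            elif op == "sub":
                regs[d] -= v
            elif op == "mul":
                regs[d] *= v
            elif op == "max":
                regs[d] = max(regs[d], v)
            elif op == "min":
                regs[d] = min(regs[d], v)
            else:  # "decay": leaky accumulator that can track a local stretch
                regs[d] = 0.8 * regs[d] + v
            regs[d] = max(-1e6, min(1e6, regs[d]))  # clamp to keep values finite
    return regs[0]


def fitness(program, data):
    """Fraction classified correctly; output > 0 means 'signal peptide'."""
    return sum((run(program, seq) > 0) == bool(label) for seq, label in data) / len(data)


def make_toy_data(rng, n=30):
    """Synthetic sequences for illustration only (not real signal peptides)."""
    hydrophobic, other = "LIVFAM", "DEKRNQSTGP"
    data = []
    for i in range(n):
        if i % 2 == 0:  # toy positive: charged start, hydrophobic core, polar tail
            seq = ("M" + "".join(rng.choice("KR") for _ in range(3))
                   + "".join(rng.choice(hydrophobic) for _ in range(12))
                   + "".join(rng.choice(other) for _ in range(20)))
            data.append((seq, 1))
        else:  # toy negative: mostly polar residues
            seq = "M" + "".join(rng.choice(other * 3 + hydrophobic) for _ in range(35))
            data.append((seq, 0))
    return data


def tournament(scored, rng, k=3):
    """Return the fittest program among k random (fitness, program) pairs."""
    return max(rng.sample(scored, k), key=lambda fp: fp[0])[1]


def evolve(data, seed=0, pop_size=40, generations=15, max_len=6):
    rng = random.Random(seed)
    pop = [[random_instr(rng) for _ in range(rng.randint(1, max_len))]
           for _ in range(pop_size)]
    history = []  # best training fitness per generation
    for _ in range(generations):
        scored = sorted(((fitness(p, data), p) for p in pop),
                        key=lambda fp: fp[0], reverse=True)
        history.append(scored[0][0])
        new_pop = [scored[0][1], scored[1][1]]  # elitism: keep the two best
        while len(new_pop) < pop_size:
            a, b = tournament(scored, rng), tournament(scored, rng)
            # one-point crossover on the instruction lists
            child = (a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):])[:max_len]
            child = child or [random_instr(rng)]
            if rng.random() < 0.3:  # point mutation
                child[rng.randrange(len(child))] = random_instr(rng)
            new_pop.append(child)
        pop = new_pop
    best_fit, best = max(((fitness(p, data), p) for p in pop), key=lambda fp: fp[0])
    return best, best_fit, history


if __name__ == "__main__":
    data = make_toy_data(random.Random(1))
    best, acc, history = evolve(data)
    print(f"best training accuracy: {acc:.2f}")
    print(best)
```

Elitism guarantees that the best training fitness never decreases from one generation to the next. The evolved instruction list can be printed and read directly, which loosely mirrors the readability advantage over ANNs that the abstract mentions.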

1. INTRODUCTION

The huge and growing amount of unanalyzed data produced by genetic research creates a demand for automatic methods for classifying proteins and protein properties. Automated screening of proteins for properties of interest would accelerate the search for new drug candidates. Classification rules for processing amino acid sequences can be obtained either by human design or by a mechanical process, the latter often through machine-learning algorithms.

A signal peptide is a short stretch of amino acid residues at the N-terminal end of some peptide chains. Signal peptides are commonly described as the address tags of the cell, since they direct proteins into the secretory pathway, the mechanism that moves proteins across cell membranes. These proteins are synthesized by ribosomes in the cytoplasm, but the newly produced peptide does not fold into a protein at this stage. Instead, its first part, the signal peptide, attaches to a translocon in the membrane. This binding opens a channel, and the peptide begins to pass through the translocon channel. After transport through the membrane, the signal peptide is cleaved from the protein's peptide and the channel closes. The protein's peptide is now free to fold into an active, or mature, protein.

The existence of a signaling mechanism in the cell was first postulated by Günther Blobel in 1971. After a series of experiments, he came to the correct conclusion that the signal, or address tag, was coded with amino acids as part o