Predicting Torsion Angles in Amino Acid Protein Sequences Based on a Bayesian Classification Procedure on Markov Chains

I. V. Sergienko, B. A. Biletskyy, and A. M. Gupal

UDC 519.68

Abstract. Torsion angles formed by the Cα atoms of four neighboring residues are predicted using a Bayesian classification procedure on nonstationary Markov chains. The predicted sequence of torsion angles is used to construct a three-dimensional protein structure on the Z^3 lattice.

Keywords: torsion angle, Bayesian procedure, Markov chain, protein secondary structure, transition probabilities.

Introduction. Joint efforts of scientists all over the world have made it possible to decode the genomes of the human, the chimpanzee, the chicken, the Tetraodon fish, some other animals, some plants, and more than a thousand bacteria. Given the nucleotide sequence of a gene, the amino acid sequence of the protein can be determined uniquely, since each nucleotide triplet (codon) codes for a definite one of the 20 amino acids. Once the sequence of amino acids has been translated from an RNA molecule, the protein starts folding into a spatial configuration. It is the spatial configuration of a protein that determines its functionality, since proteins in living organisms interact as three-dimensional objects in space. Therefore, the “sequence–structure–functionality” doctrine is adhered to in studying proteins and their functions [1]. This means that the functionality of a protein is specified by its spatial structure, and the spatial structure is defined by its amino acid sequence.

There are four levels in protein structure:
· primary: a linear sequence of amino acid residues in a protein molecule;
· secondary: local regular structures (alpha helices and beta sheets) formed on the linear sequence;
· tertiary: the spatial arrangement of the elements of the secondary structure (alpha helices and beta sheets);
· quaternary: a protein complex formed of single proteins.

The protein structure at each level has a decisive influence on how the structure is formed at the next level; i.e., the primary structure determines the secondary one, the secondary structure determines the tertiary one, etc. (Fig. 1).

The primary protein structure, i.e., its amino acid sequence, can easily be determined experimentally. Determining the higher-order structures is much more difficult, since it requires X-ray diffraction analysis or NMR spectroscopy, which are expensive methods.
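The unique mapping from a nucleotide sequence to an amino acid sequence mentioned above can be illustrated with a minimal Python sketch. It is not part of the procedure proposed in this paper. It assumes the standard genetic code, a coding sequence written in the DNA alphabet (T in place of U) and read from its first base, and one-letter amino acid codes; the names BASES, AMINO, CODON_TABLE, and translate are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Assumption: standard genetic code, codons enumerated in T, C, A, G order
# for each of the three positions; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to exactly one amino acid (or stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a coding sequence codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":  # stop codon: translation ends here
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTGGTAA"))
```

The table makes the direction of the mapping explicit: every codon determines one amino acid, while several codons may share the same amino acid (for example, both TTA and CTG give leucine), so the amino acid sequence follows uniquely from the gene but not the other way round.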
The high cost of the experimental determination of protein structure motivates the development of mathematical methods for predicting it. Finding the spatial structure of a protein from its amino acid sequence is one of the major unsolved problems in computational biology and bioinformatics. In [2, 3], a Bayesian classification procedure on Markov chains was applied to determine protein secondary structure. In the present paper, a similar procedure is applied to predict tertiary protein structure.

Problem Statement. The input of the pattern recognition problem is a prim