Covariance-Model-Based RNA Gene Finding: Using Dynamic Programming versus Evolutionary Computing

This chapter compares the traditional dynamic programming RNA gene finding methodology with an alternative evolutionary computation approach. Both methods take a set of estimated covariance model parameters for a non-coding RNA family as given. The differe


S.F. Smith: Covariance-Model-Based RNA Gene Finding: Using Dynamic Programming versus Evolutionary Computing, Studies in Computational Intelligence (SCI) 94, 183–208 (2008). © Springer-Verlag Berlin Heidelberg 2008. www.springerlink.com

1 Introduction

The initial focus of interpreting the output of sequencing projects such as the Human Genome Project [1] has been on annotating those portions of the genome sequences that code for proteins. More recently, it has been recognized that many significant regulatory and catalytic functions can be attributed to RNA transcripts that are never translated into protein products [2]. The genes for these functional RNA (fRNA) or non-coding RNA (ncRNA) molecules require an entirely different search approach than protein-coding genes.

Protein-coding genes are usually detected by gene finding algorithms that generically search for putative gene locations and then later classify these genes into families. As an example, putative protein-coding genes could be identified using the GENESCAN program [3]. Classification of these putative protein-coding genes could then be done using profile hidden Markov models (HMMs) [4] to yield families of proteins (or protein domains) such as those in Pfam [5]. It is not necessary to scan entire genomes with an HMM, since the gene finding algorithm has already narrowed the search to a small subset of the genome containing the possible protein-coding gene locations.

Unlike protein-coding genes, RNA genes are not associated with promoter regions and open reading frames. As a result, direct search for RNA genes using only generic characteristics has not been successful [6]. Instead, RNA gene finding and gene family classification are carried out together, using a model of a gene family for database search over entire genomes. This has the disadvantage that RNA genes belonging to entirely novel families will not be found, but it is the only currently available method that works. It also means that the amount of genetic information that needs to be processed by the combined gene finder and classifier is much larger than for protein classifiers.

Functional RNA is made of single-stranded RNA with intramolecular base pairing. Whereas protein-coding RNA transcripts (mRNA) are primarily information carriers, functional RNA often depends on its three-dimensional shape for the performance of its task. This results in conservation of three-dimensional structure, but not necessarily of primary sequence. The three-dimensional shape of an RNA molecule is almost entirely determined by the intramolecular base pairing pattern of the molecule's nucleotides. There are many examples of RNA families with very little primary sequence homology, but very well conserved secondary structure (see pp. 264–265 in [7]). It is very difficult to find RNA genes without taking conservation of secondary structure into account. Most homology search algorithms such as BLAST [8], Fasta [9], Smith-Waterman [10], and profile HMMs only model primary sequence and are therefore not well suited for RNA gene
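
The point about conserved secondary structure without conserved primary sequence can be made concrete with a small sketch. The two hairpin sequences, the dot-bracket structure string, and the helper names below are hypothetical illustrations, not data or code from this chapter; the sketch assumes Watson-Crick pairs plus the G-U wobble pair as the allowed base pairs.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(structure):
    """Return (i, j) index pairs from a dot-bracket string such as '((..))'."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def fits_structure(seq, structure):
    """True if every paired position in the structure holds an allowed base pair."""
    return all((seq[i], seq[j]) in PAIRS for i, j in paired_positions(structure))


def identity(a, b):
    """Fraction of aligned positions with identical nucleotides (equal lengths)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical hairpin: a four-base-pair stem closing a four-nucleotide loop.
structure = "((((....))))"
seq_a = "GGCAUUCGUGCC"
seq_b = "CAGUUUCGACUG"  # every stem position differs, but pairing is preserved

print(fits_structure(seq_a, structure), fits_structure(seq_b, structure))
print(round(identity(seq_a, seq_b), 2))
```

Both sequences are compatible with the same stem, although they agree only in the loop. A method that scores primary sequence alone sees two weakly related strings, whereas a model that scores paired positions jointly can recognize the shared structure.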
