Map-invariant spectral analysis for the identification of DNA periodicities


RESEARCH

Open Access

Ahmad Rushdi1*, Jamal Tuqan2 and Thomas Strohmer3

Abstract

Many signal-processing-based methods for finding hidden periodicities in DNA sequences have primarily focused on assigning numerical values to the symbolic DNA sequence and then applying spectral analysis tools such as the short-time discrete Fourier transform (ST-DFT) to locate these repeats. The key results pertaining to this approach are, however, obtained using a very specific symbolic-to-numerical map, namely the so-called Voss representation. An important research problem is therefore to quantify the sensitivity of these results to the choice of the symbolic-to-numerical map. In this article, we present a novel algebraic approach to the periodicity detection problem that provides a natural framework for studying the role of the symbolic-to-numerical map in finding these repeats. More specifically, we derive a new matrix-based expression of the DNA spectrum that comprises most of the widely used mappings in the literature as special cases, shows that the DNA spectrum is in fact invariant under all these mappings, and yields a necessary and sufficient condition for the invariance of the DNA spectrum to the symbolic-to-numerical map. Furthermore, the new algebraic framework decomposes the periodicity detection problem into several fundamental building blocks that are totally independent of each other. Sophisticated digital filters and/or alternate fast data transforms such as the discrete cosine and sine transforms can therefore always be incorporated in the periodicity detection scheme regardless of the choice of the symbolic-to-numerical map. Although the newly proposed framework is matrix based, identification of these periodicities can be achieved at a low computational cost.

*Correspondence: [email protected]
1 Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA; now with Cisco Systems, Inc., San Jose, CA 95134, USA
Full list of author information is available at the end of the article

1 Introduction

Many researchers have noted that the occurrence of repetitive structures in a DNA sequence is symptomatic of biological phenomena. Specific applications of this observation include identification of diseases [1], DNA forensics [2], and detection of pathogen exposure [3]. Some of these structures are simple repetitions of short DNA segments such as exons [4], tandem repeats [5], dispersed repeats [6], and unstable triplet repeats in the noncoding regions [7], while others form more elaborate patterns such as palindromes [8] and the period-3 component [9-13], a strong periodic characteristic found primarily in genes and pseudogenes [14]. Methods that detect these DNA periodicities are either probabilistic or deterministic. Most of the deterministic techniques rely on spectral analysis of the DNA sequence using the short-time discrete Fourier transform (ST-DFT) [15-17]. The main idea is as follows: given a