Graphical Representation and Similarity Analysis of DNA Sequences Based on Trigonometric Functions

Guo-Sen Xie1,2 · Xiao-Bo Jin3 · Chunlei Yang1,2 · Jiexin Pu1,2 · Zhongxi Mo4

Received: 15 September 2017 / Accepted: 6 April 2018 © Springer Science+Business Media B.V., part of Springer Nature 2018

Abstract In this paper, we propose two four-base related 2D curves of DNA primary sequences (termed F-B curves) and their corresponding single-base related 2D curves (termed A-related, G-related, T-related and C-related curves). The curves are constructed by assigning each of the four bases to one of four different sinusoidal (or tangent) functions. Connecting all the resulting points across the four sinusoidal (tangent) functions gives the F-B curves; connecting the points on each of the four functions separately gives the single-base related 2D curves. All of the proposed 2D curves are strictly non-degenerate. An 8-component characteristic vector, built from the normalized geometrical centers of the proposed curves, is then used to compare the similarity of DNA sequences from different species. As examples, we examine the similarity among the coding sequences of the first exon of the beta-globin gene from eleven species, among the cDNA sequences of the beta-globin gene from eight species, and among the whole mitochondrial genomes of 18 eutherian mammals. The experimental results demonstrate the effectiveness of the proposed method.

Keywords Sinusoidal function · Tangent function · Similarity · DNA sequences
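The abstract only outlines the construction, so the Python sketch below illustrates one possible reading of it under explicitly hypothetical choices. The four functions f_A(x) = sin(x) + 3, f_G(x) = sin(x) + 1, f_T(x) = sin(x) − 1 and f_C(x) = sin(x) − 3, the unit step along x per base, the reading of the 8-component vector as the (x, y) geometric centers of the four single-base curves (with x divided by sequence length), and the use of Euclidean distance as the similarity measure are all our assumptions, not the authors' definitions. The helper names (`fb_curve`, `single_base_curve`, `characteristic_vector`, `distance`) are likewise hypothetical.

```python
import math

# HYPOTHETICAL vertical offsets: each base gets its own shifted sine curve.
# These values are illustrative assumptions, not the paper's definitions.
OFFSETS = {"A": 3.0, "G": 1.0, "T": -1.0, "C": -3.0}


def base_function(base: str, x: float) -> float:
    """Assumed sinusoidal function for a base: sin(x) + offset (KeyError for non-ACGT)."""
    return math.sin(x) + OFFSETS[base]


def fb_curve(seq: str) -> list[tuple[float, float]]:
    """Four-base (F-B) curve: the i-th base (1-based) becomes the point (i, f_base(i))."""
    return [(float(i), base_function(b, i)) for i, b in enumerate(seq.upper(), start=1)]


def single_base_curve(seq: str, base: str) -> list[tuple[float, float]]:
    """Single-base curve: keep only the F-B points whose base equals `base`."""
    return [p for p, b in zip(fb_curve(seq), seq.upper()) if b == base]


def is_non_degenerate(curve: list[tuple[float, float]]) -> bool:
    """True if x strictly increases, so the polyline cannot loop back or overlap itself."""
    return all(x1 < x2 for (x1, _), (x2, _) in zip(curve, curve[1:]))


def characteristic_vector(seq: str) -> list[float]:
    """Assumed 8-component vector: normalized (x, y) center of each single-base curve."""
    n = len(seq)
    vec: list[float] = []
    for base in "AGTC":
        pts = single_base_curve(seq, base)
        if not pts:
            vec.extend([0.0, 0.0])  # assumption: an absent base contributes a zero center
            continue
        cx = sum(x for x, _ in pts) / len(pts) / n  # x center normalized by length
        cy = sum(y for _, y in pts) / len(pts)      # y center left unnormalized
        vec.extend([cx, cy])
    return vec


def distance(s1: str, s2: str) -> float:
    """Euclidean distance between characteristic vectors (smaller = more similar)."""
    return math.dist(characteristic_vector(s1), characteristic_vector(s2))


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not real gene data.
    s1, s2, s3 = "ATGGTGCACCTG", "ATGGTGCATCTG", "CCCGGGAAATTT"
    print(is_non_degenerate(fb_curve(s1)))
    print(distance(s1, s2), distance(s1, s3))
```

Because each base advances x by exactly one unit in this sketch, every F-B and single-base curve is the graph of a function of x, which is one simple way to see why a construction of this kind has no loops or overlapping segments. A tangent-based variant would swap `math.sin` for `math.tan`; how the paper handles the very large values of tan near its poles is not described in this section.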

* Guo-Sen Xie

1 Information Engineering College, Henan University of Science and Technology, Luoyang 471023, China

2 Henan Joint International Research Laboratory of Image Processing and Intelligent Detection, Henan University of Science and Technology, Luoyang 471023, China

3 School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China

4 School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China

1 Introduction

The main idea of the graphical representation of DNA sequences is to convert a DNA sequence consisting of four bases (A, G, T, C) into a curve. Graphical representations help in visualizing, recognizing, and distinguishing different DNA sequences. They are also a convenient tool for studying the properties of DNA sequences, for example, for similarity analysis based on the curves and for counting the occurrences of each of the four bases (A, G, T, C) in a sequence. Graphical representations of DNA sequences can be classified into three categories: (1) 2D graphical representations, (2) 3D graphical representations, and (3) graphical representations of more than three dimensions. In this paper, we focus on the first category, 2D graphical representations. Graphical representation of DNA sequences was first proposed by Hamori and Ruskin (1983). Gates (1986), Nandy (1994), Leong and