The Relation Between k-Circularity and Circularity of Codes
Elena Fimmel¹ · Christian J. Michel² · François Pirot²,³ · Jean-Sébastien Sereni² · Martin Starman¹,² · Lutz Strüngmann¹
Received: 4 February 2020 / Accepted: 24 June 2020 / Published online: 4 August 2020
© Society for Mathematical Biology 2020
Abstract
A code X is k-circular if any concatenation of at most k words from X, when read on a circle, admits exactly one partition into words from X. It is circular if it is k-circular for every integer k. While it is not a priori clear from the definition, there exists, for every pair (n, ℓ), an integer k such that every k-circular ℓ-letter code over an alphabet of cardinality n is circular, and we determine the least such integer k for all values of n and ℓ. The k-circular codes may represent an important evolutionary step between the circular codes, such as the comma-free codes, and the genetic code.
Keywords: Circular code · k-circular code · Genetic code · Code evolution
Corresponding author: Christian J. Michel
¹ Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany
² Theoretical Bioinformatics, ICube, C.N.R.S., University of Strasbourg, 300 Boulevard Sébastien Brant, 67400 Illkirch, France
³ LORIA (Orpailleur), C.N.R.S., University of Lorraine, INRIA, Campus scientifique, 54506 Vandœuvre-lès-Nancy Cedex, France
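As a concrete illustration of the definition of k-circularity stated in the abstract, the following is a minimal brute-force sketch, not the authors' method. It assumes a block code (all words of the same length ℓ) and reads "exactly one partition on a circle" as "no nonzero shift of the reading frame decomposes the circular word into code words". The function name `is_k_circular` and the example codes are hypothetical and chosen only for illustration.

```python
from itertools import product


def is_k_circular(code, k):
    """Brute-force test of k-circularity for a block code (illustrative sketch).

    Assumption: every word in `code` has the same length ell, so a partition
    of a circular word into code words is determined by its frame shift.
    """
    words = set(code)
    if not words:
        return True  # the empty code is trivially k-circular
    lengths = {len(w) for w in words}
    if len(lengths) != 1:
        raise ValueError("this sketch only handles block codes")
    ell = lengths.pop()

    # Try every concatenation of m <= k code words.
    for m in range(1, k + 1):
        for seq in product(sorted(words), repeat=m):
            w = "".join(seq)
            # A shifted frame (shift 1 .. ell-1) must never yield a
            # decomposition into code words.
            for shift in range(1, ell):
                rotated = w[shift:] + w[:shift]
                blocks = (rotated[i:i + ell] for i in range(0, len(rotated), ell))
                if all(b in words for b in blocks):
                    return False
    return True


# Hypothetical 2-letter code whose letters form the cycle A -> C -> G -> A.
example = {"AC", "CG", "GA"}
print(is_k_circular(example, 2))  # True
print(is_k_circular(example, 3))  # False: ACGACG can also be read as CG·AC·GA
```

The search enumerates all |X|^m concatenations for each m up to k, so it is only practical for very small codes and small k.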
1 Introduction
The discovery of the DNA structure by Watson and Crick (1953) spurred a new branch of mathematical biology that overlaps the theory of block codes, i.e. codes consisting of words of a fixed length over some finite alphabet. A relevant concept in this biomathematical context is that of comma-freeness, meaning that code words can be separated without using extra symbols (“commas”). This concept was later weakened to that of a circular code, meaning that the reading frame can always be retrieved in any word written on a circle.
DNA is a sequence of nucleotides over the 4-letter alphabet {A, C, G, T}, where A stands for adenine, C for cytosine, G for guanine and T for thymine, organized in an antiparallel and complementary double helix. A (protein-coding) gene is a DNA sequence that is read during the translation process in words of three letters, called trinucleotides or codons. The genetic code maps the 64 possible codons to the 20 amino acids constituting the proteins, with three codons serving as stop codons.
Soon after this discovery, scientists believed that the redundancy in the codon-to-amino-acid assignment must be some kind of code used by nature for error detection. Crick et al. (1957) proposed that in such a code, no codon can be obtained by concatenating a nonempty suffix and a nonempty prefix of codons in the code. It follows that a frame