Structure of the space of taboo-free sequences

  • PDF / 744,692 Bytes
  • 29 Pages / 439.37 x 666.142 pts Page_size
  • 34 Downloads / 162 Views

DOWNLOAD

REPORT


Mathematical Biology

Structure of the space of taboo-free sequences Cassius Manuel1

· Arndt von Haeseler1,2

Received: 29 October 2019 / Revised: 19 August 2020 / Published online: 17 September 2020 © The Author(s) 2020

Abstract Models of sequence evolution typically assume that all sequences are possible. However, restriction enzymes that cut DNA at specific recognition sites provide an example where carrying a recognition site can be lethal. Motivated by this observation, we studied the set of strings over a finite alphabet with taboos, that is, with prohibited substrings. The taboo-set is referred to as T and any allowed string as a taboo-free string. We consider the so-called Hamming graph Γn (T), whose vertices are taboo-free strings of length n and whose edges connect two taboo-free strings if their Hamming distance equals one. Any (random) walk on this graph describes the evolution of a DNA sequence that avoids taboos. We describe the construction of the vertex set of Γn (T). Then we state conditions under which Γn (T) and its suffix subgraphs are connected. Moreover, we provide an algorithm that determines if all these graphs are connected for an arbitrary T. As an application of the algorithm, we show that about 87% of bacteria listed in REBASE have a taboo-set that induces connected taboo-free Hamming graphs, because they have less than four type II restriction enzymes. On the other hand, four properly chosen taboos are enough to disconnect one suffix subgraph, and consequently connectivity of taboo-free Hamming graphs could change depending on the composition of restriction sites. Keywords Bacteriophage DNA evolution · Endonuclease-dependent evolution · Restriction-enzyme dependent evolution · Restriction–modification system · Hamming graph with taboos · Connectivity of Hamming graphs Mathematics Subject Classification 05C40 · 92D15

B

Cassius Manuel [email protected] Arndt von Haeseler [email protected]

1

Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Dr. Bohr Gasse 9, 1030 Vienna, Austria

2

Faculty of Computer Science, University of Vienna, Währinger Str. 29, 1090 Vienna, Austria

123

1030

C. Manuel, A. von Haeseler

1 Introduction In bacteria, restriction enzymes cleave foreign DNA to stop its propagation. To do so, a double-stranded cut is induced by a so-called recognition site, a DNA sequence of length 4–8 base pairs (Alberts et al. 2004). As part of their restriction–modification (R–M) system, bacteria can escape the lethal effect of their own restriction enzymes by modifying recognition sites in their own DNA (Kommireddy and Nagaraja 2013). Nevertheless, Gelfand and Koonin (1997) and Rocha et al. (2001) found a significant avoidance of recognition sites in bacterial DNA, and Rusinov et al. (2015) showed that this avoidance was characteristic of type II R–M systems. Also in bacteriophages, the avoidance of the recognition sites is evolutionary advantageous (Rocha et al. 2001), mainly for non-tempera