Models and algorithms for genome rearrangement with positional constraints

Swenson et al. Algorithms Mol Biol (2016) 11:13 DOI 10.1186/s13015-016-0065-9

Open Access

RESEARCH

Krister M. Swenson 1,2*†, Pijus Simonaitis 3† and Mathieu Blanchette 4†

Abstract

Background: Traditionally, the merit of a rearrangement scenario between two gene orders has been measured based on a parsimony criterion alone; two scenarios with the same number of rearrangements are considered equally good. In this paper, we acknowledge that each rearrangement has a certain likelihood of occurring based on biological constraints, e.g. physical proximity of the DNA segments implicated or repetitive sequences.

Results: We propose optimization problems with the objective of maximizing overall likelihood, by weighting the rearrangements. We study a binary weight function suitable for representing sets of genome positions that are most likely to have swapped adjacencies. We give a polynomial-time algorithm for the problem of finding a minimum-weight double cut and join scenario among all minimum-length scenarios. In the process we solve an optimization problem on colored noncrossing partitions, which is a generalization of the Maximum Independent Set problem on circle graphs.

Conclusions: We introduce a model for weighting genome rearrangements and show that under simple yet reasonable conditions, a fundamental distance can be computed in polynomial time. This is achieved by solving a generalization of the Maximum Independent Set problem on circle graphs. Several variants of the problem are also mentioned.

Keywords: Double cut and join (DCJ), Weighted genome rearrangement, Noncrossing partitions, Chromatin conformation, Hi-C

Background

A huge body of work exists on modeling the evolution of whole chromosomes [1]. The main difference between such models is the set of rearrangements that they allow. The moves of interest are usually inversion, transposition, translocation, chromosome fission and fusion, deletion, insertion, and duplication. Almost all versions of the problem are NP-hard if content-modifying operations such as duplication, loss, and insertion are allowed [2, 3].
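The abstract states that the colored noncrossing partition problem generalizes Maximum Independent Set (MIS) on circle graphs. For orientation, here is a minimal Python sketch of that classical, uncolored special case only; it is not the authors' algorithm. It assumes each chord is given as a pair of endpoints and that the 2n endpoints are numbered 0 to 2n − 1 in order around the circle; the function name `max_noncrossing_chords` is hypothetical. Two chords are adjacent in a circle graph exactly when they cross, so an independent set is a set of pairwise noncrossing (disjoint or nested) chords, which a standard interval dynamic program finds.

```python
from typing import List, Tuple


def max_noncrossing_chords(chords: List[Tuple[int, int]]) -> int:
    """Size of a maximum independent set of a circle graph.

    Assumption: the 2n chord endpoints are the distinct integers
    0 .. 2n-1, numbered in order around the circle. Two chords are
    adjacent in the circle graph iff they cross, so an independent set
    is a set of pairwise noncrossing (disjoint or nested) chords.
    """
    m = 2 * len(chords)
    if m == 0:
        return 0
    partner = [-1] * m  # partner[p] = other endpoint of the chord at p
    for a, b in chords:
        partner[a], partner[b] = b, a
    # best[i][j] = most noncrossing chords with both endpoints in [i, j];
    # entries for empty or one-point intervals stay 0.
    best = [[0] * (m + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(i + 1, m):
            value = best[i + 1][j]  # option 1: leave endpoint i unused
            k = partner[i]
            if i < k <= j:
                # option 2: keep chord (i, k); the other chosen chords must
                # lie strictly inside (i, k) or entirely inside (k, j].
                value = max(value, 1 + best[i + 1][k - 1] + best[k + 1][j])
            best[i][j] = value
    return best[0][m - 1]
```

Each of the O(n²) intervals is filled in constant time, so the sketch runs in O(n²) time and memory for n chords. For instance, the chords (0, 4), (1, 5) and (2, 3) give 2: (0, 4) and (1, 5) cross, while (2, 3) is nested inside both.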
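The minimum-length DCJ scenarios over which the weighted problem is optimized all have length equal to the DCJ distance. As a hedged illustration, assuming genomes made only of circular chromosomes, that distance is N − C, where N is the number of genes and C the number of cycles in the adjacency graph; linear chromosomes require the odd-path correction N − (C + I/2), which this sketch omits. The encoding (signed integers for oriented genes, extremities as (gene, "t"/"h") pairs) and the names `adjacency_map` and `dcj_distance_circular` are hypothetical, and the code computes only the unweighted distance, not the paper's minimum-weight scenario.

```python
from typing import Dict, List, Tuple

Extremity = Tuple[int, str]  # (gene, "t") = tail, (gene, "h") = head


def adjacency_map(chromosomes: List[List[int]]) -> Dict[Extremity, Extremity]:
    """Map each gene extremity to the extremity it is adjacent to.

    Each chromosome is a list of signed gene ids read as a circular
    chromosome (assumption); a negative id means the gene is reversed.
    """
    partner: Dict[Extremity, Extremity] = {}
    for chrom in chromosomes:
        for idx, gene in enumerate(chrom):
            nxt = chrom[(idx + 1) % len(chrom)]  # wrap around: circular
            # The right end of `gene` is joined to the left end of `nxt`.
            right = (abs(gene), "h" if gene > 0 else "t")
            left = (abs(nxt), "t" if nxt > 0 else "h")
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance_circular(genome_a: List[List[int]],
                          genome_b: List[List[int]]) -> int:
    """DCJ distance N - C for two circular genomes with equal gene content."""
    pa, pb = adjacency_map(genome_a), adjacency_map(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must have equal gene content")
    n_genes = len(pa) // 2  # two extremities per gene
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1  # start of a new cycle of the adjacency graph
        x = start
        while x not in seen:
            seen.add(x)
            y = pa[x]  # follow an adjacency of genome A ...
            seen.add(y)
            x = pb[y]  # ... then an adjacency of genome B
    return n_genes - cycles
```

For example, comparing [[1, 2, 3]] with [[1, -2, 3]] (one inversion of gene 2) yields distance 1, and fusing the circular chromosomes [1] and [2] into [1, 2] also yields 1; identical genomes give 0 because every adjacency forms its own cycle.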
Fortunately, a model that considers genomes with equal content (i.e., no duplications or insertions/deletions) is quite pertinent, particularly in eukaryotes, since syntenic blocks of genes can be assigned between genomes so that each block occurs exactly once in each genome. For two genomes with equal content, double cut and join (DCJ) has been the model of choice since it elegantly includes inversion, translocation, chromosome circularization and linearization, as well as chromosome fission and fusion [4, 5]. One of the most important problems in comparative genomics is the inference of ancestral ge

*Correspondence: [email protected]
† Krister M. Swenson, Pijus Simonaitis and Mathieu Blanchette contributed equally to this work
2 Institut de Biologie Computationnelle (IBC), Montpellier, France
Full list of author information is available at the end of the article