On the family-free DCJ distance and similarity

PDF / 625,106 Bytes
10 Pages / 595 x 794 pts Page_size
5 Downloads / 285 Views

RESEARCH

Open Access

On the family-free DCJ distance and similarity Fábio V Martinez1,2 , Pedro Feijão2 , Marília DV Braga3 and Jens Stoye2*

Abstract Structural variation in genomes can be revealed by many (dis)similarity measures. Rearrangement operations, such as the so called double-cut-and-join (DCJ), are large-scale mutations that can create complex changes and produce such variations in genomes. A basic task in comparative genomics is to find the rearrangement distance between two given genomes, i.e., the minimum number of rearragement operations that transform one given genome into another one. In a family-based setting, genes are grouped into gene families and efficient algorithms have already been presented to compute the DCJ distance between two given genomes. In this work we propose the problem of computing the DCJ distance of two given genomes without prior gene family assignment, directly using the pairwise similarities between genes. We prove that this new family-free DCJ distance problem is APX-hard and provide an integer linear program to its solution. We also study a family-free DCJ similarity and prove that its computation is NP-hard. Keywords: Genome rearrangement, DCJ, Family-free genome comparison

Background Genomes are subject to mutations or rearrangements in the course of evolution. Typical large-scale rearrangements change the number of chromosomes and/or the positions and orientations of genes. Examples of such rearrangements are inversions, translocations, fusions and fissions. A classical problem in comparative genomics is to compute the rearrangement distance, that is, the minimum number of rearrangements required to transform a given genome into another given genome [1]. In order to study this problem, one usually adopts a high-level view of genomes, in which only “relevant” fragments of the DNA (e.g., genes) are taken into consideration. Furthermore, a pre-processing of the data is required, so that we can compare the content of the genomes. One popular method, adopted for more than 20 years, is to group the genes in both genomes into gene families, so that two genes in the same family are said to be equivalent. This setting is said to be family-based. Without gene duplications, that is, with the additional restriction that each family occurs exactly once in each genome, many polynomial models have been proposed to compute the genomic distance [2-5]. However, when gene *Correspondence: [email protected] 2 Technische Fakultät and CeBiTec, Universität Bielefeld, Universitätsstr. 25, 33615 Bielefeld, Germany Full list of author information is available at the end of the article

duplications are allowed, the problem is more intrincate and all approaches proposed so far are NP-hard, see for instance [6-10]. It is not always possible to classify each gene unambiguously into a single gene family. Due to this fact, an alternative to the family-based setting was proposed recently and consists in studying the rearrangement distance without prior family assignment. Instead of fami

Data Loading...

On the family-free DCJ distance and similarity

Recommend Documents

Distance and Similarity Measures

Distance and Similarity Measures

Quantum Algorithms for Similarity Measurement Based on Euclidean Distance

An overview of distance and similarity functions for structured data

Efficient processing of graph similarity queries with edit distance constraints

Distance and similarity measures for multiple-attribute decision making with dual hesitant fuzzy sets

On the Similarity Between Neural Network and Evolutionary Algorithm

The Similarity Between Dissimilarities

The Similarity View

Structural similarity and similarity in thermal properties of the polymorphs: melting and crystallization from the melt

Similarity ranking technique exploiting the structure of similarity relationships

On the Average Taxicab Distance Function and Its Applications