Edit Distance Under Block Operations

  • PDF / 204,670 Bytes
  • 3 Pages / 547.087 x 737.008 pts Page_size
  • 54 Downloads / 189 Views

DOWNLOAD

REPORT


E

E

Block edit distance

fined as the minimum number of single character edits, block moves, as well as block copies and block uncopies to transform one of the strings into the other. Copying of a block s[j, k] to position h transforms S = s1 s2 : : : s n into S 0 = s1 : : : s j s j+1 : : : s k : : : s h1 s j : : : s k s h : : : s n . A block uncopy is the inverse of a block copy: it deletes a block s[j, k] provided there exists s[ j0 ; k 0 ] = s[ j; k] which does not overlap with s[j, k] and transforms S into S 0 = s1 : : : s j1 s k+1 : : : s n . Throughout this discussion all edit operations have unit cost and they may overlap; i. e. a character can be edited on multiple times.

Problem Definition

Key Results

Given two strings S = s1 s2 : : : s n and R = r1 r2 : : : r m (wlog let n  m) over an alphabet  = f1 ; 2 ; : : : ` g, the standard edit distance between S and R, denoted ED(S, R) is the minimum number of single character edits, specifically insertions, deletions and replacements, to transform S into R (equivalently R into S). If the input strings S and R are permutations of the alphabet  (so that jSj = jRj = jj) then an analogous permutation edit distance between S and R, denoted PED(S, R) can be defined as the minimum number of single character moves, to transform S into R (or vice versa). A generalization of the standard edit distance is edit distance with moves, which, for input strings S and R is denoted EDM(S, R), and is defined as the minimum number of character edits and substring (block) moves to transform one of the strings into the other. A move of block s[j, k] to position h transforms S = s1 s2 : : : s n into S 0 = s1 : : : s j1 s k+1 s k+2 : : : s h1 s j : : : s k s h : : : s n [4]. If the input strings S and R are permutations of the alphabet  (so that jSj = jRj = jj) then EDM(S, R) is also called as the transposition distance and is denoted TED(S, R) [1]. Perhaps the most general form of the standard edit distance that involves edit operations on blocks/substrings is the block edit distance, denoted BED(S, R). It is de-

There are exact and approximate solutions to computing the edit distances described above with varying performance guarantees. As can be expected, the best available running times as well as the approximation factors for computing these edit distances vary considerably with the edit operations allowed.

Edit Distance Under Block Operations 2000; Cormode, Paterson, Sahinalp, Vishkin 2000; Muthukrishnan, Sahinalp S. CENK SAHINALP Lab for Computational Biology, Simon Fraser University, Burnaby, BC, USA Keywords and Synonyms

Exact Computation of the Standard and Permutation Edit Distance The fastest algorithms for exactly computing the standard edit distance have been available for more than 25 years. Theorem 1 (Levenshtein [9]) The standard edit distance ED(S, R) can be computed exactly in time O(n  m) via dynamic programming. Theorem 2 (Masek-Paterson [11]) The standard edit distance ED(S, R) can be computed exactly in time O(n + n  m/log2j j n) via the “four-Rus