A Formal Investigation of Diff3

The diff3 algorithm is widely considered the gold standard for merging uncoordinated changes to list-structured data such as text files. Surprisingly, its fundamental properties have never been studied in depth.

PDF / 436,609 Bytes
12 Pages / 430 x 660 pts Page_size
40 Downloads / 150 Views

DOWNLOAD

REPORT

University of Pennsylvania 2 Yahoo

Abstract. The diff3 algorithm is widely considered the gold standard for merging uncoordinated changes to list-structured data such as text ﬁles. Surprisingly, its fundamental properties have never been studied in depth. We oﬀer a simple, abstract presentation of the diff3 algorithm and investigate its behavior. Despite abundant anecdotal evidence that people ﬁnd diff3’s behavior intuitive and predictable in practice, characterizing its good properties turns out to be rather delicate: a number of seemingly natural intuitions are incorrect in general. Our main result is a careful analysis of the intuition that edits to “well-separated” regions of the same document are guaranteed never to conﬂict.

1

Introduction

Users often want to edit a local copy of a replicated data structure, postponing the moment when their changes become visible to others until sometime later— when a set of changes has been ﬁnished and tested, when an oﬄine laptop is reconnected to the network, etc. In general, when multiple users can edit at the same time, this reconciliation process requires a tool—a synchronizer — that can propagate non-conﬂicting changes between diﬀerent copies of the data, while recognizing and ﬂagging conﬂicts. Source code management systems, longdistance collaborative editing environments, and ﬁle synchronizers are examples. Operation-based synchronizers work by keeping track of the complete sequences of operations that have been applied to each replica and, during reconciliation, attempting to synthesize a single uniﬁed view of the data structure’s edit history. By contrast, a state-based synchronizer sees only the current versions of the replicas to be reconciled, together with an archive of the last state they had in common (perhaps saved away at the end of the last synchronization). A crucial problem faced by a state-based synchronizer is how to align the information in the current replicas and the archive, so that it can tell where changes have been made. This can be accomplished in a variety of ways, depending on the nature of the data being synchronized. Where the data is rigidly structured or where keys are available (e.g., in personal information management applications such as address books), the proper alignment is generally clear. For more ﬂexibly structured data, such as semistructured databases, ﬁle systems, and text documents, it is less clear how to reliably choose alignments that users consider natural. The issue is particularly vexing for pure textual (or, more generally, list-structured) data, which oﬀers no predetermined points of reference for alignment—the structures are presented to the synchronizer as ﬂat sequences of uninterpreted atoms (characters, V. Arvind and S. Prasad (Eds.): FSTTCS 2007, LNCS 4855, pp. 485–496, 2007. c Springer-Verlag Berlin Heidelberg 2007

486

S. Khanna, K. Kunal, and B.C. Pierce

words, or lines of text)—and for which common edits include arbitrary insertions, deletions, and rearrangements of existing material. The best known t

Data Loading...

A Formal Investigation of Diff3

Recommend Documents

Cognitive Reasoning A Formal Approach

Swarm Robotics: A Formal Approach

Formal Verification of a Certified Policy Language

Critique of Formal-Legalism

Formal Aspects of Symmetry

Systems of Formal Logic

Formal Groups

Formal Logic

Formal Construction

Formal Matrices

Formal Channels

A Formal Context for Symmetric Dependencies