Reconstruction of a Word from a Finite Set of its Subwords under the unit Shift Hypothesis. I. Reconstruction without fo

  • PDF / 127,282 Bytes
  • 9 Pages / 594 x 792 pts Page_size
  • 49 Downloads / 144 Views

DOWNLOAD

REPORT


RECONSTRUCTION OF A WORD FROM A FINITE SET OF ITS SUBWORDS UNDER THE UNIT SHIFT HYPOTHESIS. I. RECONSTRUCTION WITHOUT FORBIDDEN WORDS1 Y. G. Smetanina and M. V. Ulyanov b

UDC 519.16, 519.17

Abstract. The problem of reconstruction of a word from a set of its subwords is considered. It is assumed that the set is generated by unit shifts of a fixed window along an unknown word. For the problem without constrains on the unknown word, a method of reconstruction is proposed based on the search for Euler paths or Euler cycles in a de Bruijn multidigraph. The search is based on symbolic multiplication of adjacency matrices with special operations of multiplication and addition of edge names. The method makes it possible to find reconstructed words and the number of reconstructions. Keywords: reconstruction of words, graph, Euler path, enumeration of paths, reconstruction without forbidden words, de Bruijn graph, symbolic matrix multiplication. INTRODUCTION. APPLICATION DOMAINS This paper considers the problem whose problematic pertains to combinatorics on words, which is a new modern subdiscipline of discrete mathematics. Combinatorics on words is a term appeared approximately thirty years ago for uniting lines of investigation related by common approaches but considered in different areas of mathematics from number theory to theoretical informatics. The main objective of investigations in this area is the study of words as independent objects from the viewpoint of their internal structure. Combinatorics on words covers combinatorial methods of analysis of sets of words used in the theory of formal languages and finite-state machines [1], theory of groups [2], chaos theory [3], fractal analysis [4], symbolic dynamics [5], and analysis of time series, bioinformatics [6] (in which combinatorial methods were applied by G. Gamow [7] as long ago as 1957), linguistics, etc. Progress in this area is described in regularly appeared books published under the pseudonym M. Lothaire by a group of mathematicians [8–10]. The objects being considered in combinatorics on words are words over arbitrary alphabets, and the subject of investigation is the study of combinatorial properties of different sets of finite and infinite words. In applied real-world problems, information on words often turns out to be incomplete; for example, such a situation is inevitable in analyzing infinite time series measured during finite time intervals. The present article considers one of statements of the problem of reconstruction of words from their known successive fragments, i.e., subwords. Problems of reconstruction of words are closely related, for example, to problems of coding [11] and pattern recognition [12], and they also arise in some other domains. Information coding and pattern recognition are, in a sense, limiting cases of the problem of reconstruction of words since the code construction problem may 1

This work was supported by the Russian Foundation for Basic Research under grant No. 13-07-00516. a

Dorodnicyn Computing Center of RAS and M