On conflict free DNA codes

  • PDF / 810,842 Bytes
  • 29 Pages / 439.642 x 666.49 pts Page_size
  • 24 Downloads / 224 Views

DOWNLOAD

REPORT


On conflict free DNA codes Krishna Gopal Benerjee1 · Sourav Deb1 · Manish K. Gupta1 Received: 28 October 2019 / Accepted: 24 September 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract DNA storage has emerged as an important area of research. The reliability of a DNA storage system depends on designing those DNA strings (called DNA codes) that are sufficiently dissimilar. In this work, we introduce DNA codes that satisfy the newly introduced constraint, a generalization of the non-homopolymers constraint. In particular, each codeword of the DNA code has the specific property that any two consecutive sub-strings of the DNA codeword will not be the same. This is apart from the usual constraints such as Hamming, reverse, reverse-complement and GC-content. We believe that the new constraints proposed in this paper will provide significant achievements in reducing the errors, during reading and writing data into the synthetic DNA strings. We also present a construction (based on a variant of stochastic local search algorithm) to determine the size of the DNA codes with a constraint that each DNA codeword is free from secondary structures in addition to the usual constraint. This further improves the lower bounds from the existing literature, in some specific cases. A recursive isometric map between binary vectors and DNA strings is also proposed. By applying this map over the well known binary codes, we obtain classes of DNA codes with all of the above constraints, including the property that the constructed DNA codewords are free from the hairpin like secondary structures. Keywords DNA codes · Homopolymers · Conflict free DNA strings · Hamming constraint · Reverse constraint · Reverse-complement constraint · GC-content constraint · Hairpin like secondary structures Mathematics Subject Classification (2010) 68P30 · 94B60 · 94B65

The preliminary version of the paper is available at https://arxiv.org/abs/1902.04419  Manish K. Gupta

[email protected] Krishna Gopal Benerjee [email protected] Sourav Deb sourav [email protected] 1

Laboratory of Natural Information Processing, Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, Gujarat, India

Cryptography and Communications

1 Introduction The exponentially increasing demand in data storage forces to look into every possible option and DNA (DeoxyriboNucleic Acid) data storage has come out to be one of the most promising natural data storage for this purpose [11]. After the first striking implementation of large-scale archival DNA-based storage architecture by Church et al. [5] in 2012, followed by encoding scheme to DNA proposed by Goldman et al. [8] in 2013, researchers have taken great interests on the construction of DNA-based information storage systems [18, 20] because of its high storage density and longevity [5, 8, 36]. DNA consists of four types of bases or nucleotides (nts) called adenine (A), cytosine (C), guanine (G) and thymine (T ) where, the Watson-Cricks complementary bases for A and C