DNA Data Storage in Perl

pISSN 1226-8372 eISSN 1976-3816

RESEARCH PAPER

Ui Jin Lee, Seulki Hwang, Kyoon Eon Kim, and Moonil Kim

Received: 17 January 2020 / Revised: 26 February 2020 / Accepted: 28 February 2020 © The Korean Society for Biotechnology and Bioengineering and Springer 2020

Abstract
Here we report a simple and flexible method for DNA data storage based on a Perl script. For this approach, the text of the preamble of the "Universal Declaration of Human Rights", consisting of 2,046 words, was encoded into the corresponding 8,148 base pairs of DNA using Perl-based encoding with a hash table. The encoded DNA sequences were then chemically synthesized for storage. The information DNA consisted of 22 chemically synthesized fragments of 400 nucleotides each, which were inserted into a cloning vector and amplified as plasmid DNA. The nucleotide integrity of the data-carrying DNA sequences was preserved under accelerated aging conditions. In addition, an erroneous nucleotide in the information DNA was corrected by overlap extension PCR. The stored DNA was read by sequencing, and the resulting sequences were decoded back into the original document. Our results indicate that textual data can be stored in DNA using a simple and flexible Perl script run from the command line.

Keywords: DNA, DNA data storage, Perl, overlap extension PCR

Ui Jin Lee†, Seulki Hwang†, Moonil Kim*
Bionanotechnology Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea
Tel: +82-42-879-8447; Fax: +82-42-879-8594

Ui Jin Lee, Kyoon Eon Kim
Department of Biochemistry, College of Natural Sciences, Chungnam National University, Daejeon 34134, Korea

Seulki Hwang, Moonil Kim
Department of Nanobiotechnology, KRIBB School of Biotechnology, Korea University of Science and Technology (UST), Daejeon 34113, Korea

† These authors contributed equally to this work.
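The abstract describes encoding text into DNA through a Perl hash table. The sketch below illustrates the same lookup-table idea in Python under assumptions of our own, since the authors' actual table is not given in this section: each byte of the text maps to a 4-nucleotide word (2 bits per base, with the hypothetical assignment A=00, C=01, G=10, T=11), and the encoded sequence is cut into 400-nt fragments to mirror the fragment length reported above. The names `ENCODE_TABLE`, `DECODE_TABLE`, `encode_text`, `decode_dna`, and `split_fragments` are hypothetical.

```python
from itertools import product

# Hypothetical base assignment (an assumption, not the paper's table):
# 2 bits per nucleotide, A=0, C=1, G=2, T=3.
BASES = "ACGT"

# Hash table: each byte value 0..255 maps to a unique 4-nt word.
# product() yields 4-mers in lexicographic order over "ACGT", so index i
# is i written in base 4 (most significant digit first) with A=0 ... T=3.
ENCODE_TABLE = {i: "".join(word) for i, word in enumerate(product(BASES, repeat=4))}
# Reverse table used for decoding.
DECODE_TABLE = {word: i for i, word in ENCODE_TABLE.items()}


def encode_text(text: str) -> str:
    """Encode UTF-8 text as a DNA string, 4 nucleotides per byte."""
    return "".join(ENCODE_TABLE[b] for b in text.encode("utf-8"))


def decode_dna(dna: str) -> str:
    """Decode a DNA string produced by encode_text back into text."""
    if len(dna) % 4:
        raise ValueError("DNA length must be a multiple of 4")
    data = bytes(DECODE_TABLE[dna[i:i + 4]] for i in range(0, len(dna), 4))
    return data.decode("utf-8")


def split_fragments(dna: str, size: int = 400) -> list[str]:
    """Cut the encoded sequence into fixed-length fragments for synthesis.

    The last fragment may be shorter than `size`.
    """
    return [dna[i:i + size] for i in range(0, len(dna), size)]


if __name__ == "__main__":
    text = "Whereas recognition of the inherent dignity"
    dna = encode_text(text)
    fragments = split_fragments(dna)
    print(len(text), len(dna), len(fragments))
    # Round trip: reassemble the fragments and decode.
    assert decode_dna("".join(fragments)) == text
```

Because the table is a plain dictionary in each direction, decoding is simply the inverse lookup. This sketch deliberately omits error detection and any fragment ordering information.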

1. Introduction

The demand for data storage is rapidly increasing, and there is significant interest in exploring alternative storage media [1,2]. In this context, much attention has been focused on DNA data storage, which can store large amounts of data in a small amount of DNA. This biological medium uses four types of bases: adenine (A), guanine (G), cytosine (C), and thymine (T). The order of these bases encodes the genetic information that makes each species unique. Because these biological instructions are passed from parents to offspring, a DNA molecule must be copied to produce two identical molecules, a process called replication. DNA must also be stable enough to preserve its genetic message over long periods. These structural and functional characteristics allow data to be encoded in, and decoded from, DNA. Various encoding methods combine DNA bases to represent the original data [3,4]. A DNA-based data storage system involves encoding data into DNA sequences, synthe