SMURF-seq: efficient copy number profiling on long-read sequencers

METHOD

Open Access

Rishvanth K. Prabakar1, Liya Xu2, James Hicks2 and Andrew D. Smith1*

Abstract

We present SMURF-seq, a protocol to efficiently sequence short DNA molecules on a long-read sequencer by randomly ligating them to form long molecules. Applying SMURF-seq using the Oxford Nanopore MinION yields up to 30 fragments per read, providing an average of 6.2 and up to 7.5 million mappable fragments per run, increasing information throughput for read-counting applications. We apply SMURF-seq on the MinION to generate copy number profiles. A comparison with profiles from Illumina sequencing reveals that SMURF-seq attains similar accuracy. More broadly, SMURF-seq expands the utility of long-read sequencers for read-counting applications.

Keywords: Long-read sequencing, Nanopore sequencing, Copy number variation, Read-counting applications

*Correspondence: [email protected]
1 Quantitative and Computational Biology Section, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles 90089, USA
Full list of author information is available at the end of the article

Background

In the last decade, massively parallel high-throughput short-read sequencing has revolutionized the efficiency and breadth of applications for DNA sequencing [1]. These high-throughput sequencing methods produce millions to billions of short reads in a single run and have led to the development of many applications that depend on “read-counting” to measure the abundance of specific sequences in a sample. Examples include RNA-seq, ChIP-seq, and whole genome copy number profiling.

Recently, long-read technologies have been developed that are filling the gap left by short-read sequencers in applications such as genome assembly [2, 3], which benefit from connecting more distant sequences within a contiguous molecule. Among these, the MinION instrument, from Oxford Nanopore Technologies, is highly portable and inexpensive and has shown its unique value for analysis outside of central sequencing facilities [4]. Long-read sequencers such as the MinION typically produce vastly fewer reads from a sequencing run and are therefore less efficient in applications that use sequenced reads purely as a means to count molecules. However, these technologies have the enormous advantage of operating in near real-time, with a turnaround time that can be measured in hours for some applications, rather than days or weeks.

Copy number variation (CNV) has been used successfully to understand a variety of diseases [5], notably cancers, which exhibit both extreme variation and recurrent trends that can be used for diagnostics and personalized approaches to treatment. For example, the amplification and loss of certain genes, such as RB1 deletion and MYCN amplification in retinoblastoma, can be prognostic or even predictive for treatment [6]. High-throughput short-read sequencing has been extremely effective in copy number profiling of cancers [7], including pro