IonCRAM: a reference-based compression tool for ion torrent sequence files
(2020) 21:397
SOFTWARE
Open Access
Moustafa Shokrof1 and Mohamed Abouelhoda2,3,4*
* Correspondence: mabouelhoda@yahoo.com
2 King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
3 Saudi Human Genome Program, King Abdulaziz City for Science and Technology (KACST), Riyadh, Saudi Arabia
Full list of author information is available at the end of the article
Abstract

Background: Ion Torrent is one of the major next generation sequencing (NGS) technologies and is frequently used in medical research and diagnosis. The built-in software of the Ion Torrent sequencing machines delivers the sequencing results in the BAM format. In addition to the usual SAM/BAM fields, the Ion Torrent BAM file includes technology-specific flow signal data. The flow signals occupy a large portion of the BAM file (about 75% for the human genome). Compressing SAM/BAM into the CRAM format significantly reduces the space needed to store NGS results. However, the tools for generating CRAM files are not designed to handle the flow signals. This missing feature motivated us to develop a new program that improves the compression of Ion Torrent files for long-term archiving.

Results: In this paper, we present IonCRAM, the first reference-based compression tool for compressing Ion Torrent BAM files for long-term archiving. For the BAM files, IonCRAM achieved a space saving of about 43%. This space saving exceeds that achieved with the CRAM format by about 8–9%.

Conclusions: Reducing the space consumption of NGS data reduces the cost of storage and data transfer. Developing efficient compression software for clinical NGS data is therefore of more than computational interest, as it ultimately contributes to reducing the overall cost of the clinical test. The space saving achieved by our tool is a practical step in this direction. The tool is open source and available at Code Ocean, GitHub, and http://ioncram.saudigenomeproject.com.
Background

Ion Torrent is one of the most widely used Next Generation Sequencing (NGS) technologies, with a market share of 20% (Research and Market Report 2016). The technology is particularly popular in the medical domain because it is fast and cost effective. It is mainly used for clinical gene panels and whole exome sequencing. Gene panels read the sequences of selected genes to screen for variations related to inherited disorders [1–5] and cancer [6, 7]. Whole exome sequencing covers the whole set of genes and is mostly used to identify novel mutations and genes [8–12]. Ion Torrent is not favored for whole genome sequencing because its limited throughput would lead to insufficient depth for clinical use.

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the o