Sequencing barcode construction and identification methods based on block error-correction codes



Weigang Chen, Lixia Wang, Mingzhe Han, Changcai Han & Bingzhi Li

School of Microelectronics, Tianjin University, Tianjin 300072, China; School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China

Received December 23, 2019; accepted February 11, 2020; published online April 14, 2020

Multiplexed sequencing relies on sample-specific labels, called barcodes, to tag DNA fragments belonging to different samples and to demultiplex the sequencer output. However, barcodes are often corrupted by insertion, deletion and substitution errors introduced during sequencing, which can lead to sample misassignment. In this paper, we propose a barcode construction method that combines a block error-correction code with a predetermined pseudorandom sequence to generate the base sequences used to label different samples. Furthermore, to identify corrupted barcodes and assign reads to their respective samples, we present a soft-decision identification method consisting of inner decoding and outer decoding. The inner decoder builds a hidden Markov model (HMM) of base insertions and deletions using the pseudorandom sequence, and adapts the forward-backward (FB) algorithm to output soft information for each bit of the block code. The outer decoder performs soft-decision decoding with this soft information to correct multiple errors in the barcodes. Simulation results show that the proposed methods are highly robust to high rates of insertions, deletions and substitutions in the barcodes. In addition, compared with the inner decoding algorithm for watermark-based barcodes, the proposed inner decoding algorithm can greatly reduce decoding complexity.

Keywords: DNA sequencing barcode, insertion/deletion errors, hidden Markov model (HMM), forward-backward algorithm

Citation:

Chen, W., Wang, L., Han, M., Han, C., and Li, B. (2020). Sequencing barcode construction and identification methods based on block error-correction codes. Sci China Life Sci 63, https://doi.org/10.1007/s11427-019-1651-3
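The two-stage identification described in the abstract can be illustrated with a small sketch. Everything concrete below is an assumption made for illustration, not the authors' construction: a (7,4) Hamming code stands in for the block code; each code bit b is mapped onto a base by the hypothetical rule "pseudorandom base w becomes w + 2b (mod 4)"; the insertion/deletion channel is a simplified Davey-MacKay-style model that allows at most one insertion per transmitted base; and the outer decoder is an exhaustive soft-decision (correlation) search over all 16 codewords. The names `make_barcode`, `bit_llrs` and `identify`, the channel probabilities, and the sequence `ACGCATG` are all hypothetical.

```python
import itertools
import math

BASES = "ACGT"
# Hypothetical channel parameters (illustrative only, not from the paper).
P_DEL, P_INS, P_SUB = 0.02, 0.02, 0.02
P_TX = 1.0 - P_DEL - P_INS

# Generator matrix of a (7,4) Hamming code, used as a stand-in block code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(msg):
    """Multiply a 4-bit message by G over GF(2)."""
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]

def candidates(w):
    """Hypothetical mapping: code bit b turns pseudorandom base w into base w + 2b (mod 4)."""
    k = BASES.index(w)
    return [BASES[k], BASES[(k + 2) % 4]]

def make_barcode(bits, pseudo):
    return "".join(candidates(w)[b] for w, b in zip(pseudo, bits))

def emit(y, x):
    """Substitution model: base kept with prob 1 - P_SUB, else one of the other three."""
    return 1.0 - P_SUB if y == x else P_SUB / 3.0

def branch(read, jp, j, x):
    """Prob. that one sent base x yields read[jp:j]: deletion, copy, or one insertion + copy."""
    n = j - jp
    if n == 0:
        return P_DEL
    if n == 1:
        return P_TX * emit(read[j - 1], x)
    return P_INS * 0.25 * emit(read[j - 1], x)  # n == 2: read[j - 2] is a uniform random insert

def bit_llrs(read, pseudo):
    """Inner decoder sketch: forward-backward over (sent bases, read bases consumed).
    Returns log P(b=0 | read) / P(b=1 | read) for each code bit."""
    N, M = len(pseudo), len(read)
    cand = [candidates(w) for w in pseudo]

    def trans(i, jp, j, b=None):
        if b is None:  # bit unknown: average over both equiprobable candidate bases
            return 0.5 * (branch(read, jp, j, cand[i][0]) + branch(read, jp, j, cand[i][1]))
        return branch(read, jp, j, cand[i][b])

    def nexts(jp):  # a sent base consumes 0, 1 or 2 read bases
        return range(jp, min(jp + 2, M) + 1)

    alpha = [[0.0] * (M + 1) for _ in range(N + 1)]
    beta = [[0.0] * (M + 1) for _ in range(N + 1)]
    alpha[0][0] = 1.0
    for i in range(N):
        for jp in range(M + 1):
            for j in nexts(jp):
                alpha[i + 1][j] += alpha[i][jp] * trans(i, jp, j)
    beta[N][M] = 1.0  # all sent bases must account for the whole read
    for i in range(N - 1, -1, -1):
        for jp in range(M + 1):
            beta[i][jp] = sum(trans(i, jp, j) * beta[i + 1][j] for j in nexts(jp))
    llrs = []
    for i in range(N):
        p = [sum(alpha[i][jp] * trans(i, jp, j, b) * beta[i + 1][j]
                 for jp in range(M + 1) for j in nexts(jp)) for b in (0, 1)]
        llrs.append(math.log(p[0] / p[1]))
    return llrs

CODEBOOK = {m: encode(m) for m in itertools.product((0, 1), repeat=4)}

def identify(read, pseudo):
    """Outer decoder sketch: pick the message whose codeword best correlates with the LLRs."""
    llrs = bit_llrs(read, pseudo)
    return max(CODEBOOK, key=lambda m: sum((1 - 2 * c) * l for c, l in zip(CODEBOOK[m], llrs)))

pseudo = "ACGCATG"                # hypothetical pseudorandom sequence
barcode = make_barcode(encode((1, 0, 1, 1)), pseudo)
read = barcode[:3] + barcode[4:]  # one base deleted during sequencing
print(barcode, identify(read, pseudo))  # prints: GCATACG (1, 0, 1, 1)
```

In this toy setup the pseudorandom bases alternate between the {A, G} and {C, T} candidate classes, so a misaligned path is penalized at every shifted position; this is what lets the forward-backward pass localize the deletion, leaving the deleted bit with an LLR near zero for the outer decoder to fill in. The exhaustive codeword search and the unscaled probabilities are only practical for very short codes; longer barcodes would need log-domain or scaled recursions and a proper soft-decision decoder.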

INTRODUCTION

Next-generation sequencing is revolutionizing molecular biology research, pathogen identification and drug discovery thanks to steadily increasing sequencing capacity and decreasing costs (Cao et al., 2019; Hardwick et al., 2017; Jin et al., 2019; Li et al., 2018; Liu et al., 2019). Following the development of second-generation sequencing, the throughput of third-generation long-read sequencing technologies is also increasing. For example, each sequencing unit (SMRT cell) of the Pacific Biosciences (PacBio) Sequel system can generate 3–5 Gb of data (Ardui et al., 2018). Oxford Nanopore Technologies (ONT) recently announced that PromethION 48 sequencers have produced 7.6 Tb of data with 48 flow cells. The throughput of a single flow cell can reach 162 Gb (Eisenstein, 2019). However, many users do not need this sequencing
