Hardware BLAST Algorithms with Multi-seeds Detection and Parallel Extension

As one of the most widely used bio-sequence searching tools, BLAST adopts index-based approach to detect the matches between two substrings by looking up a large table and processing one match per query. In this paper, we propose a systolic array approach

  • PDF / 545,057 Bytes
  • 12 Pages / 430 x 660 pts Page_size
  • 94 Downloads / 184 Views

DOWNLOAD

REPORT


Abstract. As one of the most widely used bio-sequence searching tools, BLAST adopts index-based approach to detect the matches between two substrings by looking up a large table and processing one match per query. In this paper, we propose a systolic array approach to detect string matches without using looking up tables. The pipelining systolic array is implemented as a multi-seeds detection and parallel extension pipeline engine to accelerate the first two stages of NCBI BLAST algorithm. Different from the index-based approach, our implementation consumes little memory resources and eliminates redundant string extensions by merging multiple adjoin seeds into a valid seed. Our FPGA implementation achieves superior performance results in both of processing element number and clock frequency over related works in the area of FPGA BLAST accelerators. The experimental results also show the speedup can reach about 17 and 48 compared to the NCBI BLASTp and TBLASTn programs for 3072-residue queries on Intel P4 CPU, respectively. Furthermore, the idea of multi-seeds detection also can be adopted in other seed-based heuristic searching applications.

1

Introduction

The comparison of DNA or protein sequences has become a fundamental task of modern molecular biology. BLAST (Basic Local Alignment Search Tool) [1] as one of the most important tools has been designed to run on commodity PC clusters at present to search for sequence similarity in genomic databases. With the exponential growth of the bio-sequence databases, such as the NCBI (National Center for Biotechnology Information) GenBank [2], which has doubled in size every 12∼16 months for the last decade and now stands at over 56 billion characters, the computational requirements for sequence comparisons have far exceeded the computing capability. General-purposed microprocessors typically provide very limited bit-level parallelism. However, sequence comparison algorithms exhibit a much higher degree of bit-level data parallelism, typically hundreds of bit-level operations can be performed in parallel. Therefore, many researchers keen on implementing BLAST algorithms in hardware to avoid the low efficiency in general-purposed microprocessors. Recently, FPGA chips have emerged as one promising application accelerator, using a combination of FPGAs and general-purposed CPUs to R. Woods et al. (Eds.): ARC 2008, LNCS 4943, pp. 39–50, 2008. c Springer-Verlag Berlin Heidelberg 2008 

40

F. Xia, Y. Dou, and J. Xu

accelerate BLAST algorithm attracts much more attention. A number of parallel architectures have been proposed, such as Mercury BLASTn [3], [4], [5], Tree-BLAST [6], Mercury BLASTp [7], RC-BLAST [8], FPGA/FLASH Accelerator [9],Multi-engine BLASTn Accelerator [10], [11] and many commercialized system, BEE2 [12], CLC Cube [13], Mitrion [14] and DeCypher [15] et.al. have been built. Most of the current implementations adopt the index-based searching approach, which builds all kinds of tables to record the position of each word in query sequence, then drives the words (o