High performance parallel KMP algorithm on a heterogeneous architecture

  • PDF / 2,871,197 Bytes
  • 13 Pages / 595.276 x 790.866 pts Page_size
  • 64 Downloads / 229 Views

DOWNLOAD

REPORT


(0123456789().,-volV)(0123456789(). ,- volV)

High performance parallel KMP algorithm on a heterogeneous architecture Neungsoo Park1 • Soeun Park1 • Myungho Lee2 Received: 20 February 2019 / Revised: 4 August 2019 / Accepted: 14 August 2019 Ó Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract String matching algorithm is widely used in many application areas such as bio-informatics, network intrusion detection, computer virus scan, among many others. Knuth–Morris–Pratt (KMP) algorithm is commonly used for its fast execution time compared with many other string matching algorithms when applied to large input texts. However, the performance of the KMP algorithm is limited when the input text size increases significantly beyond a certain limit. In this paper, we propose a high performance parallel KMP algorithm on the Heterogeneous High Performance Computing (HPC) architecture based on the general-purpose multicore microprocessor and the manycore graphic processing unit (GPU) using OpenCL as the General Purpose computing using Graphic Processing Unit (GPGPU) platform. The proposed parallel KMP algorithm mainly focuses on optimizing the CPU–GPU memory hierarchy by overlapping the data transfer between the CPU memory and the GPU memory with the string matching operations on the GPU. It also optimizes the allocation of the work-groups and the work-items, and places the pattern data and the failure table in the on-chip local memory of the GPU. Exploiting multiple GPUs further the performance improvements. The experimental results show that the optimized parallel KMP algorithm leads up to 15.25 times speedup compared with the version before our optimization techniques were applied. Keywords KMP algorithm  GPGPU  OpenCL  Heterogeneous computing  Computation communication overlapping

1 Introduction String matching algorithms are commonly used in various applications such as the keyword matching for the given input text, the intrusion detection in the network system [1, 2], the human genome sequence matching for the bioinformatics [3], among many others [4, 5]. The Knuth– Morris–Pratt (KMP) algorithm is superior to other similar pattern matching algorithms for its fast execution time & Myungho Lee [email protected] Neungsoo Park [email protected] Soeun Park [email protected] 1

Department of Computer Science & Engineering, Konkuk University, Seoul, Korea

2

Department of Computer Science & Engineering, Myongji University, Yongin, Kyungkido, Korea

when it is applied to large sized input texts and reference patterns. However, when the input text size increases significantly beyond a certain limit such as in the genome sequence matching, the execution time also increases accordingly [6]. Therefore, it is crucial to minimize the execution time by efficiently parallelizing the KMP algorithm [7]. Recently, a heterogeneous high performance computing (HPC) architectures based on general-purpose multicore microprocessors and the manycore graphic processing units (GPUs) are becoming increasingly po