Speeding up inference on deep neural networks for object detection by performing partial convolution



ORIGINAL RESEARCH PAPER

Wattanapong Kurdthongmee1

Received: 29 November 2018 / Accepted: 3 September 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract

Real-time object detection is an expected application of deep neural networks (DNNs). It can be achieved by employing graphics processing units (GPUs) or dedicated hardware accelerators. Alternatively, in this work, we present a software scheme to accelerate the inference stage of DNNs designed for object detection. The scheme relies on partial processing within the consecutive convolution layers of a DNN. It exploits the relationships between the locations of the components of an input feature, an intermediate feature representation, and an output feature to efficiently identify the modified components. This downsizes the matrix multiplicand to cover only those modified components, thereby accelerating the matrix multiplication within a convolution layer. In addition, the same relationships can be employed to signal the next consecutive convolution layer about the modified components, which further reduces the overhead of member-by-member comparison to identify them. The proposed scheme has been experimentally benchmarked against a similar-concept approach, CBinfer, and against the original Darknet on the Tiny-You Only Look Once (Tiny-YOLO) network. The experiments were conducted on a personal computer with a dual CPU running at 3.5 GHz, without GPU acceleration, on video data sets from YouTube. The results show that average improvement ratios of 1.56 and 13.10 in detection frame rate over CBinfer and Darknet, respectively, are attainable. Our scheme was also extended to exploit GPU-assisted acceleration: on an NVIDIA Jetson TX2, it reached a detection frame rate of 28.12 frames per second (1.25× with respect to CBinfer). In all experiments, detection accuracy was preserved at 90% of that of the original Darknet.

Keywords  Deep neural networks · DNNs · Object detection · Convolution · Inference acceleration
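The partial-convolution idea described in the abstract — compare consecutive input feature maps, identify the modified components, recompute only the affected output positions, and pass the change mask on to the next layer — can be sketched in NumPy as follows. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the function name `partial_conv2d`, the 3×3 "same" convolution with zero padding, and the per-component change threshold are all assumptions made for the sketch.

```python
import numpy as np

def partial_conv2d(x, prev_x, prev_y, weight, threshold=0.0):
    """Recompute a KxK 'same' convolution only where the input changed.

    x, prev_x : (C_in, H, W) current and previous input feature maps
    prev_y    : (C_out, H, W) cached output from the previous frame
    weight    : (C_out, C_in, K, K) convolution kernels (K odd)
    Returns the updated output and the mask of recomputed positions.
    """
    C_out, C_in, kh, kw = weight.shape
    H, W = x.shape[1:]
    # 1. Locate input components modified beyond the threshold.
    changed = (np.abs(x - prev_x) > threshold).any(axis=0)        # (H, W)
    # 2. Dilate the change mask by the receptive field: an output
    #    pixel depends on a kh x kw input neighbourhood.
    affected = np.zeros_like(changed)
    ys, xs = np.nonzero(changed)
    for dy in range(-(kh // 2), kh // 2 + 1):
        for dx in range(-(kw // 2), kw // 2 + 1):
            yy = np.clip(ys + dy, 0, H - 1)
            xx = np.clip(xs + dx, 0, W - 1)
            affected[yy, xx] = True
    # 3. Recompute only the affected outputs; reuse the cache elsewhere.
    y = prev_y.copy()
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    for oy, ox in zip(*np.nonzero(affected)):
        patch = xp[:, oy:oy + kh, ox:ox + kw]          # (C_in, kh, kw)
        y[:, oy, ox] = np.tensordot(weight, patch, axes=3)
    return y, affected
```

The returned `affected` mask plays the role of the signal to the next consecutive layer: instead of comparing that layer's input member by member, the caller can derive the next layer's changed set directly from the mask, which is the overhead reduction the abstract refers to.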

* Wattanapong Kurdthongmee, [email protected]
1 School of Engineering and Technology, Walailak University, 222 Thaibury, Tha-sa-la, Nakhon-si-thammarat 80160, Thailand

1 Introduction

Object detection helps understand images and videos; it not only renders a classification result, but also estimates the locations of objects contained in each image or video frame [1]. This benefits many applications, such as image classification, human behaviour analysis, face recognition, and autonomous driving [2]. Many traditional object detection approaches have been employed thus far, including Histograms of Oriented Gradients (HOG) [3], Support Vector Machines (SVMs) [4], and neural networks [5]. Deep Neural Networks (DNNs) have emerged as a powerful tool for object detection. DNNs are inherently beneficial from a twofold perspective. First, they have deepe