ScrimpCo: scalable matrix profile on commodity heterogeneous processors
- PDF / 2,398,654 Bytes
- 22 Pages / 439.37 x 666.142 pts Page_size
- 67 Downloads / 149 Views
ScrimpCo: scalable matrix profile on commodity heterogeneous processors Jose C. Romero1 · Antonio Vilches2 · Andrés Rodríguez1 · Angeles Navarro1 · Rafael Asenjo1
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract The discovery of time series motifs and discords is considered a paramount and challenging problem regarding time series analysis. In this work, we present ScrimpCo, a heterogeneous implementation of a previous algorithm called SCRIMP that excels at finding relevant subsequences in time series. We propose and evaluate several static, dynamic and adaptive partition strategies targeting commodity processors, on both homogeneous (CPU multicore) and heterogeneous (CPU + GPU) architectures. For the CPU + GPU implementation, we explore a heterogeneous parallel_reduce pattern that computes part of the computation onto an OpenCL capable GPU, whereas the CPU cores take care of the other part. Our heterogeneous scheduler, built on top of TBB, pays special attention to appropriately balance the computational load among the GPU and CPU cores. The experimental results show that our homogeneous implementation scales linearly and that our heterogeneous implementation allows us to reach near-ideal performance on commodity processors that feature an on-chip GPU. Keywords Time series analytics · TBB · OpenCL · SCRIMP · Matrix profile · CPU + GPU chips · Heterogeneous computing · Energy consumption * Rafael Asenjo [email protected] Jose C. Romero [email protected] Antonio Vilches [email protected] Andrés Rodríguez [email protected] Angeles Navarro [email protected] 1
Department of Computer Architecture, University of Málaga, 29071 Málaga, Spain
2
Shapelets, Puerta del Mar 18, 2nd Floor, 29005 Málaga, Spain
13
Vol.:(0123456789)
J. C. Romero et al.
1 Introduction The discovery of similar subsequences (motifs) or anomalies (discords) in large time series is a computationally expensive problem with applicability in many fields, such as seismology [1], power demand [2], neuroscience [3], or entomology [4]. A recently proposed solution [5] consists in first computing a score for each subsequence in the time series, resulting in another time series called matrix profile. By a simple inspection of the matrix profile, it is straightforward to identify the motifs and discords by focusing on minimum and maximum values, respectively. Several algorithms have been proposed to compute the matrix profile: STAMP [5], STOMP [6] or the more advanced SCRIMP [7] which exhibits a higher degree of parallelism. Also, different versions of SCRIMP have been adapted to target shared-memory multiprocessors, distributed-memory computers [8], multi-GPU platforms [9], and Intel Xeon Phi KNL processors [10]. In this work, we pay particular attention to the load unbalance problem posed by the SCRIMP algorithm, where each parallel iteration has a different computational workload. As we explain later, this is a consequence of the computational pattern which follows a diagonal traverse of a triangular matri
Data Loading...