FFTC: Fastest Fourier Transform for the IBM Cell Broadband Engine
The Fast Fourier Transform (FFT) is of primary importance and a fundamental kernel in many computationally intensive scientific applications. In this paper we investigate its performance on the Sony-Toshiba-IBM Cell Broadband Engine, a heterogeneous multi
- PDF / 664,044 Bytes
- 13 Pages / 430 x 660 pts Page_size
- 91 Downloads / 190 Views
Abstract. The Fast Fourier Transform (FFT) is of primary importance and a fundamental kernel in many computationally intensive scientific applications. In this paper we investigate its performance on the SonyToshiba-IBM Cell Broadband Engine, a heterogeneous multicore chip architected for intensive gaming applications and high performance computing. The Cell processor consists of a traditional microprocessor (called the PPE) that controls eight SIMD co-processing units called synergistic processor elements (SPEs). We exploit the architectural features of the Cell processor to design an efficient parallel implementation of Fast Fourier Transform (FFT). While there have been several attempts to develop a fast implementation of FFT on the Cell, none have been able to achieve high performance for input series with several thousand complex points. We use an iterative out-of-place approach to design our parallel implementation of FFT with 1K to 16K complex input samples and attain a single precision performance of 18.6 GFLOP/s on the Cell. Our implementation beats FFTW on Cell by several GFLOP/s for these input sizes and outperforms Intel Duo Core (Woodcrest) for inputs of greater than 2K samples. To our knowledge we have the fastest FFT for this range of complex inputs.
1
Introduction
The Cell Broadband Engine (or the Cell/B.E.) [15,8,9,18] is a novel high-performance architecture designed by Sony, Toshiba, and IBM (STI), primarily targeting multimedia and gaming applications. The Cell BE consists of a traditional microprocessor (called the PPE) that controls eight SIMD coprocessing units called synergistic processor elements (SPEs), a high speed memory controller, and a high bandwidth bus interface (termed the element interconnect bus, or EIB), all integrated on a single chip. The Cell is used in Sony’s PlayStation 3 gaming console, Mercury Computer System’s dual Cellbased blade servers, and IBM’s QS20 Cell Blades. In this paper we present our design of an efficient parallel implementation of Fast Fourier Transform on the Cell Broadband Engine. FFT is of primary importance and a fundamental kernel in many computationally intensive scientific S. Aluru et al. (Eds.): HiPC 2007, LNCS 4873, pp. 172–184, 2007. c Springer-Verlag Berlin Heidelberg 2007
FFTC: Fastest Fourier Transform for the IBM Cell Broadband Engine
173
applications such as computer tomography, data filtering and fluid dynamics. Another important application area of FFTs is in spectral analysis of speech, sonar, radar, seismic and vibration detection. FFTs are also used in digital filtering, signal decomposition, and in solution of partial differential equations. The performance of these applications rely heavily on the availability of a fast routine for Fourier transforms. The literature contains several publications related to FFTs on the Cell/B.E. processor. Williams et al. [19] analyze the Cell’s peak performance for FFT of various types (1D, 2D), accuracy (single, double precision) and input sizes. Cico, Cooper and Greene [7] estimate the performance of
Data Loading...