VerCoLib: Fast and Versatile Communication for FPGAs via PCI Express
Oğuzhan Sezenlik¹ · Sebastian Schüller¹ · Joachim K. Anlauf¹
Received: 10 September 2018 / Revised: 6 March 2019 / Accepted: 25 June 2019 © Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract PCI Express plays a vital role in integrating FPGA accelerators into high-performance computing systems. This includes direct communication between multiple FPGAs without any involvement of the host's main memory. We present a highly configurable hardware interface that supports DMA-based connections to a host system as well as direct communication between multiple FPGAs. Our implementation offers unidirectional channels to connect FPGAs, allowing precise adaptation to all kinds of use cases. Multiple channels to the same endpoint can be used to realise independent data transmissions. While the main focus of this work is flexibility, we demonstrate maximum throughput for connections between two FPGAs and up to 92% of the available bandwidth for connections between the FPGA and the host system.
Keywords VerCoLib · FPGA · PCI Express · Transceiver
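To make a percentage of "available bandwidth" concrete, the following Python sketch computes the per-direction bandwidth of a PCIe link after line encoding, and a rough ceiling on payload throughput once per-packet (TLP) overhead is counted. The link configuration (Gen3 x8), the 256-byte payload and the 24-byte per-TLP overhead are illustrative assumptions only. This excerpt does not state which link the 92% figure refers to, nor whether "available bandwidth" means the encoded link rate or the payload ceiling.

```python
# Illustrative PCIe bandwidth arithmetic. The generation, lane count,
# payload size and per-TLP overhead used below are assumptions chosen
# for the example, not values taken from the VerCoLib paper.

# Per-lane transfer rate in GT/s and line-code efficiency per generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Per-direction link bandwidth in GB/s after line encoding."""
    rate_gt, encoding = PCIE_GENERATIONS[gen]
    # Each transfer carries one bit per lane; divide by 8 for bytes.
    return rate_gt * encoding * lanes / 8


def tlp_efficiency(payload_bytes: int, overhead_bytes: int = 24) -> float:
    """Fraction of link bytes carrying payload for back-to-back write TLPs.

    overhead_bytes approximates framing, sequence number, a 4-DW header
    and the LCRC; DLLP traffic (ACKs, flow-control updates) is ignored.
    """
    return payload_bytes / (payload_bytes + overhead_bytes)


if __name__ == "__main__":
    raw = link_bandwidth_gbytes(gen=3, lanes=8)   # assumed Gen3 x8 link
    eff = tlp_efficiency(payload_bytes=256)       # assumed 256-byte payloads
    print(f"encoded link rate:        {raw:.3f} GB/s")
    print(f"payload ceiling (256 B):  {raw * eff:.3f} GB/s")
    print(f"92% of encoded link rate: {0.92 * raw:.3f} GB/s")
```

The gap between the encoded link rate and the payload ceiling depends strongly on the payload size: larger maximum payload sizes amortize the fixed per-TLP overhead, so a throughput percentage is only comparable when its reference bandwidth is stated.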
Oğuzhan Sezenlik and Sebastian Schüller contributed equally to this work.
¹ Institute of Computer Science VI, Technische Informatik, University of Bonn, Endenicher Allee 19 A, 53115 Bonn, Germany
1 Introduction
FPGAs are widely used to accelerate state-of-the-art algorithms or as co-processors in heterogeneous high-performance computer systems. FPGA vendors offer affordable evaluation boards with high-end FPGAs, which are especially popular in academic research. Through the development of high-level synthesis tools such as Xilinx Vivado HLS or Intel FPGA OpenCL, FPGAs have become a more accessible and viable platform. While writing code for the FPGA accelerator itself is one part of a design, another important factor is utilizing its full performance by transferring data reliably and with sufficient throughput. Here a common high-bandwidth interface such as PCI Express (PCIe in the following) is essential, but using it requires fundamental knowledge of its protocol, the underlying computer hardware and kernel driver programming. Developers therefore often face the problem of implementing the complex logic required to interface with the PCIe IP core provided by the vendor and of integrating access to the accelerator into their software.
Furthermore, modern mainboards allow devices to communicate directly via PCIe, completely bypassing the main memory, a feature heavily used to connect multiple GPUs and build low-cost supercomputers for various scientific applications. The same idea also applies to FPGAs: one could simply plug several off-the-shelf FPGA boards into one standard desktop computer to increase the computational power. Such a feature is especially useful, since combining several smaller FPGAs is more cost-efficient than using high-end variants. Our goal is to provide a highly configurable and