2021 conference paper
QPR: Quantizing PageRank with Coherent Shared Memory Accelerators
IEEE Intl. Parallel and Distributed Processing Symposium, 962–972.
Graph algorithms often require fine-grained, random access across very large data structures. Previous work on FPGA-based acceleration has required significant preprocessing and restructuring to transform the memory access patterns into a streaming format that is friendlier to off-chip hardware. However, the emergence of cache-coherent shared memory interfaces, such as CAPI, allows designers to work more easily with the natural in-memory organization of the data. This paper introduces a vertex-centric shared-memory accelerator for the PageRank algorithm, optimized for high performance while effectively using coherent caching on the FPGA hardware. By selectively caching graph data for the accelerator while taking locality and reuse into account, the proposed design achieves speedups of up to 14.9x over naively using shared address space access and DRAM only. We also introduce PageRank Quantization, an innovative technique that represents page-ranks as 32-bit quantized fixed-point values. This approach is up to 1.5x faster than 64-bit fixed-point while keeping precision within a tolerable error margin. As a result, we maintain both the hardware scalability of fixed-point representation and the cache performance of 32-bit floating-point.
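The abstract does not spell out the quantization scheme, so the following is only a minimal software sketch of the general idea, not the paper's hardware design. It assumes a uniform mapping of ranks in [0, 1] onto unsigned 32-bit integers, a pull-style vertex-centric update, a damping factor of 0.85, and a graph with no dangling vertices. All names (`SCALE`, `quantize`, `pagerank_quantized`, and so on) are hypothetical.

```python
# Assumed scheme: a rank r in [0, 1] is stored as the integer round(r * SCALE),
# which always fits in an unsigned 32-bit word.
SCALE = (1 << 32) - 1


def quantize(x):
    """Map a value in [0, 1] to a 32-bit fixed-point integer (clamped)."""
    return min(SCALE, max(0, round(x * SCALE)))


def dequantize(q):
    """Map a 32-bit fixed-point integer back to a float in [0, 1]."""
    return q / SCALE


def pagerank_float(in_edges, out_deg, d=0.85, iters=20):
    """Floating-point reference: pull-style PageRank without dangling vertices."""
    n = len(out_deg)
    rank = [1.0 / n] * n
    for _ in range(iters):
        contrib = [rank[u] / out_deg[u] for u in range(n)]
        rank = [(1 - d) / n + d * sum(contrib[u] for u in in_edges[v])
                for v in range(n)]
    return rank


def pagerank_quantized(in_edges, out_deg, d=0.85, iters=20):
    """Same update, but every stored rank is a 32-bit fixed-point integer."""
    n = len(out_deg)
    base = quantize((1 - d) / n)   # teleport term, quantized once
    d_q = quantize(d)              # damping factor, quantized once
    rank = [quantize(1.0 / n)] * n
    for _ in range(iters):
        # Each vertex's contribution is its rank split across its out-edges.
        contrib = [rank[u] // out_deg[u] for u in range(n)]
        new_rank = []
        for v in range(n):
            # The accumulator and product are wider than 32 bits; only the
            # stored result is narrowed back to the 32-bit range.
            acc = sum(contrib[u] for u in in_edges[v])
            new_rank.append(min(SCALE, base + (d_q * acc) // SCALE))
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy graph with edges 0->1, 0->2, 1->2, 2->0, 3->0, 3->2.
    in_edges = [[2, 3], [0], [0, 1, 3], []]
    out_deg = [2, 1, 1, 2]
    ref = pagerank_float(in_edges, out_deg)
    quant = [dequantize(q) for q in pagerank_quantized(in_edges, out_deg)]
    print("max abs error:", max(abs(a - b) for a, b in zip(quant, ref)))
```

In this sketch each stored rank takes 4 bytes instead of 8, so twice as many ranks fit in a cache line of a given size. That is one plausible reading of the cache benefit the abstract attributes to the 32-bit representation. The integer divisions truncate, which introduces a small per-iteration error that the damping factor keeps bounded.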