2022 article

FAXID: FPGA-Accelerated XGBoost Inference for Data Centers using HLS

2022 IEEE 30th International Symposium on Field-Programmable Custom Computing Machines (FCCM 2022), pp. 113–121.

By: A. Gajjar, P. Kashyap, A. Aysu, P. Franzon, S. Dey* & C. Cheng*

TL;DR: This paper showcases an FPGA-based XGBoost accelerator for binary classification inference, designed with High-Level Synthesis (HLS) tools and design flow, and shows a latency speedup over state-of-the-art CPU and GPU implementations as well as improved energy efficiency and cost-effectiveness. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Web Of Science
Added: October 11, 2022

Advanced tree ensembles have proven quite effective at providing real-time predictions for applications such as ransomware detection, medical diagnosis, recommendation engines, fraud detection, failure prediction, and crime risk, to name a few. XGBoost in particular, one of the most prominent and widely used decision-tree ensemble methods, has gained popularity because its optimizations to the gradient boosting framework provide increased accuracy on classification and regression problems. XGBoost’s relatively fast training, handling of missing values, flexibility, and parallel processing make it a strong candidate for data center workloads. Today’s data centers, with enormous Input/Output Operations per Second (IOPS), demand real-time accelerated inference with low latency and high throughput, because applications such as ransomware detection and fraud detection process significant volumes of data. This paper showcases an FPGA-based XGBoost accelerator for binary classification inference, designed with High-Level Synthesis (HLS) tools and design flow. We employ the Alveo U50 and U200 to demonstrate the performance of the proposed design and compare it with existing state-of-the-art CPU (Intel Xeon E5-2686 v4) and GPU (Nvidia T4 Tensor Core) implementations on relevant datasets. We show a latency speedup of the proposed design over state-of-the-art CPU and GPU implementations, as well as improved energy efficiency and cost-effectiveness. The proposed accelerator is up to 65.8x and 5.3x faster, in terms of latency, than the CPU and GPU, respectively. Of the two FPGA cards, the Alveo U50 is the more cost-effective device, and the Alveo U200 stands out as the more energy-efficient.
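
For readers unfamiliar with the workload being accelerated, the following is a minimal software sketch of gradient-boosted-tree binary classification inference. It is not the paper's HLS design. It only illustrates the general computation such an accelerator has to perform: walk each tree to a leaf, sum the leaf values into a margin, and apply a logistic function. The `Node` layout, the `TOY_TREES` ensemble, and the `base_margin` parameter are hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One node of a decision tree stored in a flat list (hypothetical layout)."""
    feature: int = -1                   # index of the feature tested at this node
    threshold: float = 0.0              # go left when x[feature] < threshold
    left: int = -1                      # list index of the left child
    right: int = -1                     # list index of the right child
    default_left: bool = True           # branch taken when the feature is missing (NaN)
    leaf_value: Optional[float] = None  # set only on leaf nodes


def predict_tree(nodes: List[Node], x: List[float]) -> float:
    """Walk a single tree from the root (index 0) to a leaf and return its value."""
    i = 0
    while nodes[i].leaf_value is None:
        node = nodes[i]
        value = x[node.feature]
        if math.isnan(value):
            # Missing value: follow the default direction stored in the node.
            i = node.left if node.default_left else node.right
        elif value < node.threshold:
            i = node.left
        else:
            i = node.right
    return nodes[i].leaf_value


def predict_proba(trees: List[List[Node]], x: List[float], base_margin: float = 0.0) -> float:
    """Sum the leaf values of all trees and map the margin to a probability."""
    margin = base_margin + sum(predict_tree(tree, x) for tree in trees)
    return 1.0 / (1.0 + math.exp(-margin))


# Toy two-tree ensemble, invented purely for illustration.
TOY_TREES = [
    [Node(feature=0, threshold=0.5, left=1, right=2),
     Node(leaf_value=0.4),
     Node(leaf_value=-0.4)],
    [Node(feature=1, threshold=2.0, left=1, right=2, default_left=False),
     Node(leaf_value=-0.2),
     Node(leaf_value=0.6)],
]

if __name__ == "__main__":
    print(predict_proba(TOY_TREES, [0.1, 3.0]))
    print(predict_proba(TOY_TREES, [0.9, float("nan")]))
```

Each tree traversal is independent of the others, so the per-tree walks can in principle run in parallel before the final sum. That property is what makes this kind of inference a natural fit for hardware acceleration, although how the paper's design exploits it is not stated in the abstract.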