2020 journal article

Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 112, 1093–1105.

By: C. Zhao*, W. Gao*, F. Nie*, F. Wang* & H. Zhou

co-author countries: China 🇨🇳 United States of America 🇺🇸
author keywords: GPU; Concurrent kernels; Warp scheduling; Cache blocking; Interference
Source: Web Of Science
Added: September 28, 2020

With Graphics Processing Units (GPUs) widely adopted in data centers to provide computing power, efficient support for GPU multitasking has attracted significant attention. Prior GPU multitasking work includes spatial multitasking and simultaneous multitasking (SMK). Spatial multitasking allocates GPU resources at streaming multiprocessor (SM) granularity, which is coarse-grained, whereas SMK runs concurrent kernels on the same SM and is therefore fine-grained. SMK improves GPU resource utilization, especially when concurrent kernels have complementary characteristics. However, the main challenge for SMK is interference among kernels, especially contention for the data cache. In this paper, we propose a fair and cache blocking aware warp scheduling (FCBWS) approach to reduce data cache contention and improve SMK on GPUs. FCBWS gives each kernel an equal opportunity to issue instructions and attempts to avoid memory pipeline stalls by predicting cache blocking. Kernels are extracted from various applications to construct concurrent kernel execution benchmarks. Simulation results show that FCBWS outperforms previous multitasking methods; even compared to the state-of-the-art SMK method, FCBWS improves system throughput (STP) by 10% on average and reduces average normalized turnaround time (ANTT) by 41% on average.
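
The abstract reports results as STP and ANTT without defining them. The sketch below shows how these two metrics are commonly computed in the multiprogram-performance literature, assuming the paper follows the usual definitions: STP is the sum over kernels of shared-mode IPC divided by isolated IPC (higher is better), and ANTT is the mean over kernels of isolated IPC divided by shared-mode IPC (lower is better). The function names and the sample IPC values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def _check(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> None:
    # Both lists must describe the same non-empty set of kernels, with positive IPC.
    if len(ipc_alone) != len(ipc_shared) or not ipc_alone:
        raise ValueError("need one isolated and one shared IPC per kernel")
    if any(v <= 0 for v in list(ipc_alone) + list(ipc_shared)):
        raise ValueError("IPC values must be positive")


def stp(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> float:
    """System throughput: sum of per-kernel normalized progress (assumed definition)."""
    _check(ipc_alone, ipc_shared)
    return sum(s / a for a, s in zip(ipc_alone, ipc_shared))


def antt(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> float:
    """Average normalized turnaround time: mean per-kernel slowdown (assumed definition)."""
    _check(ipc_alone, ipc_shared)
    return sum(a / s for a, s in zip(ipc_alone, ipc_shared)) / len(ipc_alone)


# Hypothetical two-kernel workload: each kernel runs at half its isolated IPC.
alone = [2.0, 1.0]
shared = [1.0, 0.5]
print(stp(alone, shared), antt(alone, shared))
```

Under these definitions, two kernels that each run at half their isolated speed give an STP of 1.0 (no throughput gain over running them back to back) and an ANTT of 2.0 (each kernel takes twice as long as it would alone).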