Works (4)

Updated: April 11th, 2023 10:13

2019 journal article

Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 16(3).

By: Z. Lin, H. Dai, M. Mantor* & H. Zhou

author keywords: GPGPU; TLP; bandwidth management; concurrent kernel execution
Sources: Web Of Science, ORCID
Added: December 2, 2019

2016 article

A Model-Driven Approach to Warp/Thread-Block Level GPU Cache Bypassing

2016 ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC).

By: H. Dai, C. Li, H. Zhou, S. Gupta*, C. Kartsaklis* & M. Mantor*

Sources: Web Of Science, ORCID
Added: August 6, 2018

2015 conference paper

Analyzing graphics processor unit (GPU) instruction set architectures

IEEE international symposium on performance analysis of systems and, 155–156.

By: K. Mayank, H. Dai, J. Wei* & Huiyang

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2014 conference paper

Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs

IEEE international symposium on performance analysis of systems and, 231–241.

By: C. Li, Y. Yang, H. Dai, S. Yan, F. Mueller & H. Zhou

Source: NC State University Libraries
Added: August 6, 2018