Works (4)

Updated: July 5th, 2023 15:37

2019 article

Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution

Lin, Z., Dai, H., Mantor, M., & Zhou, H. (2019, June 17). ACM Transactions on Architecture and Code Optimization, Vol. 16.

By: Z. Lin, H. Dai, M. Mantor & H. Zhou

author keywords: GPGPU; TLP; bandwidth management; concurrent kernel execution
topics (OpenAlex): Parallel Computing and Optimization Techniques; Interconnection Networks and Systems; Advanced Data Storage Technologies
TL;DR: A coordinated approach to CTA combination and bandwidth partitioning that dynamically classifies co-running kernels as latency-sensitive or bandwidth-intensive, allocating more CTA resources to latency-sensitive kernels and more NoC/DRAM bandwidth to NoC- or DRAM-intensive kernels. (via Semantic Scholar)
Sources: Web of Science, NC State University Libraries
Added: December 2, 2019

2016 article

A model-driven approach to warp/thread-block level GPU cache bypassing

Dai, H., Li, C., Zhou, H., Gupta, S., Kartsaklis, C., & Mantor, M. (2016, May 25). 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).

By: H. Dai, C. Li, H. Zhou, S. Gupta, C. Kartsaklis & M. Mantor

topics (OpenAlex): Parallel Computing and Optimization Techniques; Advanced Data Storage Technologies; Cloud Computing and Resource Management
TL;DR: This paper proposes a simple yet effective performance model to estimate the impact of cache contention and resource congestion as a function of the number of warps/thread blocks to bypass the cache, and designs a hardware-based dynamic warp/thread-block level GPU cache bypassing scheme. (via Semantic Scholar)
Sources: Web of Science, NC State University Libraries
Added: August 6, 2018

2015 article

Analyzing graphics processor unit (GPU) instruction set architectures

Mayank, K., Dai, H., Wei, J., & Zhou, H. (2015, March 1). IEEE International Symposium on Performance Analysis of Systems and Software, pp. 155–156.

By: K. Mayank, H. Dai, J. Wei & H. Zhou

topics (OpenAlex): Parallel Computing and Optimization Techniques; Interconnection Networks and Systems; Embedded Systems Design Techniques
TL;DR: There are few studies and analyses of GPU instruction set architectures (ISAs), although it is well known that the ISA is a fundamental design issue for all modern processors, including GPUs. (via Semantic Scholar)
UN Sustainable Development Goals (OpenAlex): 7. Affordable and Clean Energy
Source: NC State University Libraries
Added: August 6, 2018

2014 conference paper

Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs

Li, C., Yang, Y., Dai, H., Yan, S., Mueller, F., & Zhou, H. (2014). IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231–241.

By: C. Li, Y. Yang, H. Dai, S. Yan, F. Mueller & H. Zhou

Source: NC State University Libraries
Added: August 6, 2018


Certain data included herein are derived from the Web of Science© and InCites© (2026) of Clarivate Analytics. All rights reserved. You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.