Works (4)
2019 article
Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution
Lin, Z., Dai, H., Mantor, M., & Zhou, H. (2019, June 17). ACM Transactions on Architecture and Code Optimization, Vol. 16.
2016 article
A model-driven approach to warp/thread-block level GPU cache bypassing
Dai, H., Li, C., Zhou, H., Gupta, S., Kartsaklis, C., & Mantor, M. (2016, May 25). 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
2015 article
Analyzing graphics processor unit (GPU) instruction set architectures
Mayank, K., Dai, H., Wei, J., & Zhou, H. (2015, March 1). IEEE International Symposium on Performance Analysis of Systems and, pp. 155–156.
2014 conference paper
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs
IEEE International Symposium on Performance Analysis of Systems and, pp. 231–241.