2019 journal article
Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution
ACM Transactions on Architecture and Code Optimization, 16(3).
2016 conference paper
A Model-Driven Approach to Warp/Thread-Block Level GPU Cache Bypassing
2016 ACM/EDAC/IEEE Design Automation Conference (DAC).
2015 conference paper
Analyzing graphics processor unit (GPU) instruction set architectures
IEEE International Symposium on Performance Analysis of Systems and Software, 155–156.
2014 conference paper
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs
IEEE International Symposium on Performance Analysis of Systems and Software, 231–241.