Zhen Lin

Lin, Z., Dai, H., Mantor, M., & Zhou, H. (2019). Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution. ACM Transactions on Architecture and Code Optimization, 16(3). https://doi.org/10.1145/3326124

Lin, Z., Alshboul, M., Solihin, Y., & Zhou, H. (2019). Exploring Memory Persistency Models for GPUs. 28th International Conference on Parallel Architectures and Compilation Techniques (PACT), 310–322. https://doi.org/10.1109/PACT.2019.00032

Lin, Z., Mathur, U., & Zhou, H. (2019). Scatter-and-Gather Revisited: High-Performance Side-Channel-Resistant AES on GPUs. 12th Workshop on General Purpose Processing Using GPUs (GPGPU 12), 2–11. https://doi.org/10.1145/3300053.3319415

Dai, H., Lin, Z., Li, C., Zhao, C., Wang, F., Zheng, N., & Zhou, H. (2018). Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). https://doi.org/10.1109/hpca.2018.00027

Lin, Z., Mantor, M., & Zhou, H. (2018). GPU Performance vs. Thread-Level Parallelism: Scalability Analysis and a Novel Way to Improve TLP. ACM Transactions on Architecture and Code Optimization, 15(1). https://doi.org/10.1145/3177964

Lin, Z., Nyland, L., & Zhou, H. Y. (2016). Enabling Efficient Preemption for SIMT Architectures with Lightweight Context Switching. SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 898–908. https://doi.org/10.1109/sc.2016.76

Li, C., Yang, Y., Lin, Z., & Zhou, H. Y. (2015). Automatic Data Placement into GPU On-Chip Memory Resources. 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 23–33. https://doi.org/10.1109/cgo.2015.7054184

GLES: A Practical GPGPU Optimizing Compiler Using Data Sharing and Thread Coarsening. (2015). In Languages and Compilers for Parallel Computing. https://doi.org/10.1007/978-3-319-17473-0_3

Implementation and Evaluation of Deep Neural Networks (DNN) on Mainstream Heterogeneous Systems. (2014). Proceedings of 5th Asia-Pacific Workshop on Systems - APSys '14. https://doi.org/10.1145/2637166.2637229