Works (9)

Updated: July 5th, 2023 15:34

2019 journal article

Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution

ACM Transactions on Architecture and Code Optimization, 16(3).

By: Z. Lin, H. Dai, M. Mantor & H. Zhou

author keywords: GPGPU; TLP; bandwidth management; concurrent kernel execution
Sources: Web of Science, ORCID
Added: December 2, 2019

2019 conference paper

Exploring Memory Persistency Models for GPUs

28th International Conference on Parallel Architectures and Compilation Techniques (PACT), 310–322.

By: Z. Lin, M. Alshboul, Y. Solihin & H. Zhou

Event: International Conference on Parallel Architectures and Compilation Techniques, Seattle, WA, September 21–25, 2019

Sources: Web of Science, ORCID
Added: August 10, 2020

2019 article

Scatter-and-Gather Revisited: High-Performance Side-Channel-Resistant AES on GPUs

12th Workshop on General Purpose Processing Using GPUs (GPGPU 12), 2–11.

By: Z. Lin, U. Mathur & H. Zhou

Sources: Web of Science, ORCID
Added: July 22, 2019

2018 conference paper

Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls

2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

By: H. Dai, Z. Lin, C. Li, C. Zhao, F. Wang, N. Zheng & H. Zhou

Sources: Web of Science, ORCID
Added: September 22, 2019

2018 journal article

GPU Performance vs. Thread-Level Parallelism: Scalability Analysis and a Novel Way to Improve TLP

ACM Transactions on Architecture and Code Optimization, 15(1).

By: Z. Lin, M. Mantor & H. Zhou

author keywords: GPGPU; TLP; context switching; latency hiding
Sources: Web of Science, ORCID
Added: August 6, 2018

2016 conference paper

Enabling efficient preemption for SIMT architectures with lightweight context switching

SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 898–908.

By: Z. Lin, L. Nyland & H. Zhou

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2015 conference paper

Automatic data placement into GPU on-chip memory resources

2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 23–33.

By: C. Li, Y. Yang, Z. Lin & H. Zhou

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2015 chapter

GLES: A Practical GPGPU Optimizing Compiler Using Data Sharing and Thread Coarsening

In Languages and Compilers for Parallel Computing.

author keywords: GPGPU; Optimization; Compiler
Source: ORCID
Added: September 22, 2019

2014 conference paper

Implementation and evaluation of deep neural networks (DNN) on mainstream heterogeneous systems

Proceedings of 5th Asia-Pacific Workshop on Systems - APSys '14.

Source: ORCID
Added: September 22, 2019