Works (106)

Updated: November 20th, 2023 08:02

2023 journal article

An Intelligent Framework for Oversubscription Management in CPU-GPU Unified Memory

JOURNAL OF GRID COMPUTING, 21(1).

By: X. Long*, X. Gong*, B. Zhang* & H. Zhou n

co-author countries: China 🇨🇳 United States of America 🇺🇸
author keywords: Discrete CPU-GPU system; Unified virtual memory; Oversubscription; Deep learning
Sources: Web Of Science, ORCID
Added: March 20, 2023

2023 journal article

Deep learning based data prefetching in CPU-GPU unified virtual memory

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 174, 19–31.

By: X. Long*, X. Gong*, B. Zhang* & H. Zhou n

co-author countries: China 🇨🇳 United States of America 🇺🇸
author keywords: Data prefetching; Graphics processing unit; Unified virtual memory; Deep learning; Transformer
Sources: Web Of Science, ORCID
Added: May 1, 2023

2023 article

Plutus: Bandwidth-Efficient Memory Security for GPUs

2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA, pp. 543–555.

By: R. Abdullah n, H. Zhou n & A. Awad n

co-author countries: United States of America 🇺🇸
Sources: Web Of Science, ORCID
Added: June 5, 2023

2023 article

SecPB: Architectures for Secure Non-Volatile Memory with Battery-Backed Persist Buffers

2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA, pp. 677–690.

By: A. Freij n, H. Zhou n & Y. Solihin*

co-author countries: United States of America 🇺🇸
Sources: Web Of Science, ORCID
Added: June 5, 2023

2022 journal article

A Survey of GPU Multitasking Methods Supported by Hardware Architecture

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 33(6), 1451–1463.

By: C. Zhao*, W. Gao*, F. Nie* & H. Zhou n

co-author countries: China 🇨🇳 United States of America 🇺🇸
author keywords: Graphics processing units; Multitasking; Kernel; Hardware; Computer architecture; Registers; Task analysis; GPU multitasking; survey; hardware architecture; temporal multitasking; spatial multitasking; simultaneous multitasking (SMK)
Sources: ORCID, Web Of Science
Added: October 29, 2021

2022 article

Adaptive Security Support for Heterogeneous Memory on GPUs

2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), pp. 213–228.

By: S. Yuan n, A. Awad n, A. Yudha*, Y. Solihin* & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: GPUs; secure memory; heterogeneous memory; encryption; integrity check; security metadata cache
Sources: Web Of Science, ORCID
Added: August 29, 2022

2022 conference paper

Exploiting Quantum Assertions for Error Mitigation and Quantum Program Debugging

2022 IEEE 40TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2022), 124–131.

By: P. Li n, J. Liu n, Y. Li* & H. Zhou n

co-author countries: China 🇨🇳 United States of America 🇺🇸

Event: IEEE 40th International Conference on Computer Design (ICCD) at Olympic Valley, CA, USA on October 23-26, 2022

author keywords: quantum computing; error mitigation; debugging; assertion
Sources: Web Of Science, ORCID
Added: March 20, 2023

2022 article

LITE: A Low-Cost Practical Inter-Operable GPU TEE

PROCEEDINGS OF THE 36TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ICS 2022.

By: A. Yudha*, J. Meyer*, S. Yuan n, H. Zhou n & Y. Solihin*

co-author countries: United States of America 🇺🇸
author keywords: GPU TEE; software encryption; memory encryption; GPU enclave
Sources: Web Of Science, ORCID
Added: November 13, 2023

2022 article

Not All SWAPs Have the Same Cost: A Case for Optimization-Aware Qubit Routing

2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), pp. 709–725.

By: J. Liu n, P. Li n & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: quantum computing; compiler optimization; qubit routing
Sources: Web Of Science, ORCID
Added: August 29, 2022

2021 article

Analyzing Secure Memory Architecture for GPUs

2021 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2021), pp. 59–69.

By: S. Yuan n, A. Yudha*, Y. Solihin* & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: GPUs; security; secure memory; memory encryption; memory integrity; metadata cache
Sources: Web Of Science, ORCID
Added: August 16, 2021

2021 article

PILOT: a Runtime System to Manage Multi-tenant GPU Unified Memory Footprint

2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), pp. 442–447.

By: J. Ravi n, T. Nguyen n, H. Zhou n & M. Becchi n

co-author countries: United States of America 🇺🇸
Sources: Web Of Science, ORCID
Added: May 2, 2022

2021 article

Relaxed Peephole Optimization: A Novel Compiler Optimization for Quantum Circuits

CGO '21: PROCEEDINGS OF THE 2021 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO), pp. 301–314.

By: J. Liu n, L. Bello* & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: quantum computing; peephole optimization
Sources: Web Of Science, ORCID
Added: July 26, 2021

2021 article

Systematic Approaches for Precise and Approximate Quantum State Runtime Assertion

2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), pp. 179–193.

By: J. Liu n & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: quantum computing; runtime assertion
Sources: Web Of Science, ORCID
Added: July 26, 2021

2020 journal article

Exploring Convolution Neural Network for Branch Prediction

IEEE Access, 8, 152008–152016.

By: Y. Mao n, H. Zhou n, X. Gui* & J. Shen n

co-author countries: China 🇨🇳 United States of America 🇺🇸
author keywords: History; Neural networks; Machine learning; Convolution; Predictive models; Prediction algorithms; Correlation; Branch prediction; CNN; deep learning; VGG; ResNet
Source: ORCID
Added: August 27, 2020

2020 journal article

Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 112, 1093–1105.

By: C. Zhao*, W. Gao*, F. Nie*, F. Wang* & H. Zhou n

co-author countries: China 🇨🇳 United States of America 🇺🇸
author keywords: GPU; Concurrent kernels; Warp scheduling; Cache blocking; Interference
Sources: Web Of Science, ORCID
Added: September 28, 2020

2020 conference paper

Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 1017–1030.

By: J. Liu n, G. Byrd n & H. Zhou n

co-author countries: United States of America 🇺🇸

Event: Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems

author keywords: Quantum Computing; Runtime Assertion
Sources: Web Of Science, ORCID
Added: May 8, 2020

2020 article

Reliability Modeling of NISQ-Era Quantum Computers

2020 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2020), pp. 94–105.

By: J. Liu n & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: NISQ quantum computer; reliability model; neural network
Sources: Web Of Science, ORCID
Added: June 10, 2021

2020 article

Scalable and Fast Lazy Persistency on GPUs

2020 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2020), pp. 252–263.

By: A. Yudha*, K. Kimura*, H. Zhou n & Y. Solihin*

co-author countries: Japan 🇯🇵 United States of America 🇺🇸
Sources: Web Of Science, ORCID
Added: June 10, 2021

2019 journal article

Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 16(3).

By: Z. Lin n, H. Dai n, M. Mantor* & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: GPGPU; TLP; bandwidth management; concurrent kernel execution
Sources: Web Of Science, ORCID
Added: December 2, 2019

2019 conference paper

Exploring Memory Persistency Models for GPUs

28th International Conference on Parallel Architectures and Compilation Techniques (PACT), 310–322.

By: Z. Lin n, M. Alshboul n, Y. Solihin* & H. Zhou n

co-author countries: United States of America 🇺🇸

Event: International Conference on Parallel Architectures and Compilation Techniques at Seattle, WA on September 21-25, 2019

Sources: Web Of Science, ORCID
Added: August 10, 2020

2019 conference paper

In-Place Zero-Space Memory Protection for CNN

In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 32). San Mateo, CA: Morgan Kaufmann Publishers.

By: H. Guan, L. Ning, Z. Lin, X. Shen, H. Zhou  & S. Lim

Ed(s): H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox & R. Garnett

Source: NC State University Libraries
Added: November 24, 2020

2019 journal article

Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation

IEEE Computer Architecture Letters, 18(2), 111–114.

By: H. Zhou n & G. Byrd n

co-author countries: United States of America 🇺🇸

author keywords: Quantum computing; assertions; quantum circuits; debugging; quantum error detection
Sources: Web Of Science, ORCID, Crossref
Added: September 23, 2019

2019 article

Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation

Liu, J., Byrd, G., & Zhou, H. (2019, December 9).

By: J. Liu, G. Byrd & H. Zhou* 

Source: ORCID
Added: December 30, 2019

2019 article

Scatter-and-Gather Revisited: High-Performance Side-Channel-Resistant AES on GPUs

12TH WORKSHOP ON GENERAL PURPOSE PROCESSING USING GPUS (GPGPU 12), pp. 2–11.

By: Z. Lin n, U. Mathur n & H. Zhou n

co-author countries: United States of America 🇺🇸
Sources: Web Of Science, ORCID
Added: July 22, 2019

2018 conference paper

Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls

2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

By: H. Dai n, Z. Lin n, C. Li n, C. Zhao*, F. Wang*, N. Zheng*, H. Zhou n

co-author countries: China 🇨🇳 United States of America 🇺🇸
Sources: Web Of Science, ORCID
Added: September 22, 2019

2018 journal article

Developing Noise-Resistant Three-Dimensional Single Particle Tracking Using Deep Neural Networks

ANALYTICAL CHEMISTRY, 90(18), 10748–10757.

By: Y. Zhong n, C. Li n, H. Zhou n & G. Wang n

co-author countries: United States of America 🇺🇸
MeSH headings: Fluorescent Dyes / chemistry; Imaging, Three-Dimensional; Microscopy, Fluorescence; Neural Networks, Computer; Particle Size; Signal-To-Noise Ratio
Sources: Web Of Science, ORCID
Added: October 16, 2018

2018 journal article

GPU Performance vs. Thread-Level Parallelism: Scalability Analysis and a Novel Way to Improve TLP

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 15(1).

By: Z. Lin n, M. Mantor* & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: GPGPU; TLP; context switching; latency hiding
Sources: Web Of Science, ORCID
Added: August 6, 2018

2017 article

Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs

PROCEEDINGS OF THE 2017 54TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC).

By: A. Verma*, H. Zhou n, S. Booth*, R. King*, J. Coole*, A. Keep*, J. Marshall*, W. Feng*

co-author countries: United States of America 🇺🇸
author keywords: OpenCL; FPGA; Debugging; Profiling; Framework; Code Patterns
Sources: Web Of Science, ORCID
Added: August 6, 2018

2017 conference paper

EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU

PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 3–16.

By: G. Chen n, Y. Zhao n, X. Shen n & H. Zhou n

co-author countries: United States of America 🇺🇸
Sources: Web Of Science, ORCID
Added: November 21, 2020

2017 report

Exploring deep neural networks for branch prediction

[Technical Report]. https://people.engr.ncsu.edu/hzhou/CNN_DBN_zhou_2017.pdf

By: Y. Mao, H. Zhou  & X. Gui

Source: NC State University Libraries
Added: November 21, 2020

2017 journal article

Methylation specific targeting of a chromatin remodeling complex from sponges to humans

SCIENTIFIC REPORTS, 7.

By: J. Cramer n, D. Pohlmann*, F. Gomez*, L. Mark*, B. Kornegay*, C. Hall*, E. Siraliev-Perez*, N. Walavalkar* ...

co-author countries: United States of America 🇺🇸
MeSH headings: Amino Acid Sequence; Animals; Chromatin Assembly and Disassembly; DNA / chemistry; DNA / metabolism; DNA Methylation; DNA-Binding Proteins / chemistry; DNA-Binding Proteins / genetics; DNA-Binding Proteins / metabolism; Gene Knockdown Techniques; Humans; Models, Molecular; Nucleic Acid Conformation; Phenotype; Porifera / genetics; Porifera / metabolism; Protein Conformation
Sources: Web Of Science, ORCID
Added: August 6, 2018

2017 conference paper

The Demand for a Sound Baseline in GPU Memory Architecture Research

14th Annual Workshop on Duplicating, Deconstructing and Debunking (WDDD). Presented at the Workshop on Duplicating, Deconstructing and Debunking, Toronto, Canada. https://people.engr.ncsu.edu/hzhou/Hongwen_WDDD2017.pdf

By: H. Dai, C. Li, Z. Lin & H. Zhou 

Event: Workshop on Duplicating, Deconstructing and Debunking at Toronto, Canada on June 25, 2017

Source: NC State University Libraries
Added: November 21, 2020

2016 journal article

A Cross-Platform SpMV Framework on Many-Core Architectures

ACM Transactions on Architecture and Code Optimization, 13(4), 1–25.

By: Y. Zhang*, S. Li*, S. Yan* & H. Zhou n

co-author countries: China 🇨🇳 United States of America 🇺🇸
author keywords: SpMV; segmented scan; BCCOO; OpenCL; CUDA; GPU; Intel MIC; parallel algorithms
Sources: Crossref, ORCID
Added: January 28, 2020

2016 article

A Model-Driven Approach to Warp/Thread-Block Level GPU Cache Bypassing

2016 ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC).

By: H. Dai n, C. Li n, H. Zhou n, S. Gupta*, C. Kartsaklis* & M. Mantor*

co-author countries: United States of America 🇺🇸
Sources: Web Of Science, ORCID
Added: August 6, 2018

2016 conference paper

Enabling efficient preemption for SIMT architectures with lightweight context switching

SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 898–908.

By: Z. Lin n, L. Nyland* & H. Zhou

co-author countries: United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2016 conference paper

OpenCL-based erasure coding on heterogeneous architectures

IEEE International Conference on Application-Specific Systems, 7, 33–40.

By: G. Chen n, H. Zhou, X. Shen n, J. Gahm*, N. Venkat*, S. Booth*, J. Marshall*

co-author countries: United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2016 conference paper

Optimizing memory efficiency for deep convolutional neural networks on GPUs

SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 633–644.

By: C. Li n, Y. Yang*, M. Feng*, S. Chakradhar* & H. Zhou

co-author countries: United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2016 conference paper

Selective GPU Cache Bypassing for Un-Coalesced Loads

In X. Liao (Ed.), 22nd IEEE International Conference on Parallel and Distributed Systems : ICPADS 2016 : proceedings : 13-16 December 2016, Wuhan, Hubei, China.

co-author countries: China 🇨🇳

Ed(s): X. Liao

Event: 22nd IEEE International Conference on Parallel and Distributed Systems at Wuhan, Hubei, China on December 13-16, 2016

Sources: NC State University Libraries, ORCID
Added: January 30, 2021

2016 conference paper

Tuning stencil codes in OpenCL for FPGAs

Proceedings of the 34th IEEE International Conference on Computer Design (ICCD), 249–256.

By: Q. Jia n & H. Zhou

co-author countries: United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2015 conference paper

An Optimized AMPM-based Prefetcher Coupled with Configurable Cache Line Sizing

JILP Workshop on Computer Architecture Competitions (JWAC): 2nd Data Prefetching Championship (DPC2).

By: Q. Jia, M. Padia, K. Amboju & H. Zhou 

Source: NC State University Libraries
Added: January 30, 2021

2015 conference paper

Analyzing graphics processor unit (GPU) instruction set architectures

IEEE International Symposium on Performance Analysis of Systems and, 155–156.

By: K. Mayank n, H. Dai n, J. Wei* & H. Zhou

co-author countries: China 🇨🇳 United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2015 conference paper

Automatic data placement into GPU on-chip memory resources

2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 23–33.

By: C. Li n, Y. Yang*, Z. Lin n & H. Zhou

co-author countries: Japan 🇯🇵 United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2015 journal article

CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 30(1), 3–19.

By: Y. Yang*, C. Li n & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: GPGPU; nested parallelism; compiler; local memory
Sources: Web Of Science, ORCID
Added: August 6, 2018

2015 conference paper

Locality-Driven Dynamic GPU Cache Bypassing

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, 61–77.

By: C. Li n, S. Song*, H. Dai n, A. Sidelnik*, S. Hari* & H. Zhou n

co-author countries: United States of America 🇺🇸

Event: 29th International conference on supercomputing at Newport Beach/Irvine, CA on June 8-11, 2015

author keywords: GPU architecture Optimization; Locality; Cache Bypassing
Sources: Web Of Science, ORCID
Added: January 30, 2021

2015 conference paper

Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture

Proceedings of the 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 121–130.

By: P. Xiang, Y. Yang*, M. Mantor, N. Rubin* & H. Zhou*

co-author countries: United Kingdom of Great Britain and Northern Ireland 🇬🇧 Japan 🇯🇵

Event: 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing at Shenzhen, China on May 4-7, 2015

author keywords: GPGPU; Heterogeneous; ILP; Energy
Sources: Web Of Science, ORCID
Added: February 6, 2021

2015 article

Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing

2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), pp. 150–159.

By: S. Gupta* & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: shared last level cache; cache partitioning; spatial locality; cache management; high bandwidth memory
Sources: Web Of Science, ORCID
Added: August 6, 2018

2014 conference paper

A Case for a Flexible Scalar Unit in SIMT Architecture

Proceedings of 2014 IEEE 28th International Parallel and Distributed Processing Symposium. Presented at the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ.

By: Y. Yang*, P. Xiang n, M. Mantor*, N. Rubin*, L. Hsu*, Q. Dong*, H. Zhou n

co-author countries: China 🇨🇳 United States of America 🇺🇸

Event: 2014 IEEE 28th International Parallel and Distributed Processing Symposium at Phoenix, AZ on May 19-23, 2014

Sources: NC State University Libraries, ORCID
Added: February 6, 2021

2014 chapter

A Highly Efficient FFT Using Shared-Memory Multiplexing

In Numerical Computations with GPUs (pp. 363–377).

By: Y. Yang n & H. Zhou n

co-author countries: United States of America 🇺🇸
Sources: Crossref, ORCID
Added: January 28, 2020

2014 journal article

CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications

ACM SIGPLAN NOTICES, 49(8), 93–105.

By: Y. Yang & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: Performance; Design; Experimentation; Languages; GPGPU; nested parallelism; compiler; local memory
Sources: Web Of Science, ORCID
Added: August 6, 2018

2014 conference paper

Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs

IEEE International Symposium on Performance Analysis of Systems and, 231–241.

By: C. Li, Y. Yang, H. Dai, S. Yan, F. Mueller & H. Zhou 

Source: NC State University Libraries
Added: August 6, 2018

2014 conference paper

Warp-level divergence in GPUs: Characterization, impact, and mitigation

International symposium on high-performance computer, 284–295.

By: P. Xiang n, Y. Yang & H. Zhou

co-author countries: United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2014 conference paper

yaSpM: Yet Another SpMV Framework on GPUs

Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 49(8), 107–118.

By: S. Yan*, C. Li n, Y. Zhang* & H. Zhou n

co-author countries: China 🇨🇳 United States of America 🇺🇸

Event: 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming at Orlando, FL

author keywords: SpMV; Segmented Scan; BCCOO; OpenCL; CUDA; GPU; Parallel algorithms
Sources: Web Of Science, ORCID
Added: August 6, 2018

2013 article

Adaptive Cache Bypassing for Inclusive Last Level Caches

IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), pp. 1243–1253.

By: S. Gupta n, H. Gao* & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: Last level cache; cache bypassing; cache replacement policy; inclusion property
Sources: Web Of Science, ORCID
Added: August 6, 2018

2013 journal article

Analyzing locality of memory references in GPU architectures

MSPC '13: Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 6.

By: S. Gupta n, P. Xiang n & H. Zhou n

co-author countries: United States of America 🇺🇸

Event: ACM SIGPLAN Workshop on Memory Systems Performance and Correctness at Seattle, WA

Sources: NC State University Libraries, ORCID
Added: February 6, 2021

2013 journal article

Architecting against Software Cache-Based Side-Channel Attacks

IEEE TRANSACTIONS ON COMPUTERS, 62(7), 1276–1288.

By: J. Kong*, O. Aciicmez*, J. Seifert* & H. Zhou n

co-author countries: Germany 🇩🇪 United States of America 🇺🇸
author keywords: Cache memories; private/public key cryptosystems; side-channel attacks; architectural support for computer security
Sources: Web Of Science, ORCID
Added: August 6, 2018

2013 conference paper

Exploiting Uniform Vector Instructions for GPGPU Performance, Energy Efficiency, and Opportunistic Reliability Enhancement

Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, 433–442.

By: P. Xiang n, Y. Yang*, M. Mantor*, N. Rubin*, L. Hsu* & H. Zhou n

co-author countries: United States of America 🇺🇸

Event: 27th International ACM Conference on International Conference on Supercomputing at Eugene, Oregon

Sources: NC State University Libraries, ORCID
Added: February 6, 2021

2013 journal article

Locality principle revisited: A probability-based quantitative approach

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 73(7), 1011–1027.

By: S. Gupta n, P. Xiang n, Y. Yang n & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: Locality of references; Probability; Memory hierarchy; Last level cache; Cache replacement policy; Data prefetching; Locality optimizations
Sources: Web Of Science, ORCID
Added: August 6, 2018

2013 journal article

The Implementation of a High Performance GPGPU Compiler

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 41(6), 768–781.

By: Y. Yang n & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: GPU; Compiler; Optimization; Vectorization; OpenCL
Sources: Web Of Science, ORCID
Added: August 6, 2018

2012 journal article

A Unified Optimizing Compiler Framework for Different GPGPU Architectures

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 9(2).

By: Y. Yang n, P. Xiang n, J. Kong*, M. Mantor* & H. Zhou n

co-author countries: Canada 🇨🇦 United States of America 🇺🇸
author keywords: Performance; Experimentation; Languages; GPGPU; OpenCL; CUDA; CUBLAS; GPU Computing
Sources: Web Of Science, ORCID
Added: August 6, 2018

2012 conference paper

CPU-assisted GPGPU on fused CPU-GPU architectures

International symposium on high-performance computer, 103–114.

By: Y. Yang, P. Xiang, M. Mantor & H. Zhou 

Source: NC State University Libraries
Added: August 6, 2018

2012 conference paper

Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs

2012 41st International Conference on Parallel Processing. Presented at the 2012 41st International Conference on Parallel Processing (ICPP).

By: Y. Yang n, P. Xiang n, M. Mantor* & H. Zhou n

co-author countries: United States of America 🇺🇸

Event: 2012 41st International Conference on Parallel Processing (ICPP)

Sources: Crossref, ORCID
Added: January 28, 2020

2012 conference paper

Locality principle revisited: A probability-based quantitative approach

2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), 995–1009.

By: S. Gupta n, P. Xiang n, Y. Yang n & H. Zhou

co-author countries: United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2012 conference paper

Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput

Proceedings of the 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT). Presented at the 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), Minneapolis, MN, USA.

By: Y. Yang, P. Xiang, M. Mantor, N. Rubin & H. Zhou 

Event: 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT) at Minneapolis, MN, USA on September 19-23, 2012

Source: NC State University Libraries
Added: February 7, 2021

2011 journal article

Combining Local and Global History for High Performance Data Prefetching

Journal of Instruction-Level Parallelism (JILP), 13, 1–14.

By: M. Dimitrov & H. Zhou 

Event: Data Prefetching Championship (DPC-1) held with 15th International Symposium on High Performance Computer Architecture (HPCA-15) at Raleigh, NC on February 14-18, 2009

Source: NC State University Libraries
Added: August 6, 2018

2011 conference paper

Developing a High Performance GPGPU Compiler using Cetus

Proceedings of the Cetus Users and Compiler Infrastructure Workshop, International Conference on Parallel Architectures and Compilation Techniques (PACT'11). Presented at the International Conference on Parallel Architectures and Compilation Techniques (PACT'11).

By: Y. Yang & H. Zhou

Event: International Conference on Parallel Architectures and Compilation Techniques (PACT'11)

Source: NC State University Libraries
Added: February 7, 2021

2011 journal article

Exploring Correlation for Indirect Branch Prediction

2nd JILP Workshop on Computer Architecture Competitions (JWAC-2): Championship Branch Prediction. Presented at the 2nd JILP Workshop on Computer Architecture Competitions (JWAC-2): Championship Branch Prediction, held with ISCA-38.

By: N. Bhansali, C. Panirwla & H. Zhou 

Event: 2nd JILP Workshop on Computer Architecture Competitions (JWAC-2): Championship Branch Prediction, held with ISCA-38

Source: NC State University Libraries
Added: February 7, 2021

2011 conference paper

Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs

2011 IEEE International Parallel & Distributed Processing Symposium. Presented at the IEEE International Parallel & Distributed Processing Symposium (IPDPS).

By: M. Dimitrov* & H. Zhou n

co-author countries: United States of America 🇺🇸

Event: IEEE International Parallel & Distributed Processing Symposium (IPDPS)

Sources: Crossref, ORCID
Added: January 28, 2020

2010 article

A GPGPU Compiler for Memory Optimization and Parallelism Management

Yang, Y., Xiang, P., Kong, J., & Zhou, H. (2010, June). ACM SIGPLAN NOTICES, Vol. 45, pp. 86–97.

By: Y. Yang n, P. Xiang*, J. Kong* & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: Performance; Experimentation; Languages; GPGPU; Compiler
Sources: Web Of Science, ORCID
Added: August 6, 2018

2010 conference paper

Accelerating MATLAB Image Processing Toolbox Functions on GPUs

Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, 75–85.

By: J. Kong*, M. Dimitrov*, Y. Yang n, J. Liyanage*, L. Cao, J. Staples*, M. Mantor, H. Zhou n

co-author countries: United States of America 🇺🇸

Event: 3rd Workshop on General-Purpose Computation on Graphics Processing Units at Pittsburgh, Pennsylvania, USA

Sources: NC State University Libraries, ORCID
Added: February 7, 2021

2010 article

An Optimizing Compiler for GPGPU Programs with Input-Data Sharing

Yang, Y., Xiang, P., Kong, J., & Zhou, H. (2010, May). ACM SIGPLAN NOTICES, Vol. 45, pp. 343–344.

By: Y. Yang n, P. Xiang*, J. Kong* & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: Performance; Experimentation; Languages; GPGPU; Compiler
Sources: Web Of Science, ORCID
Added: August 6, 2018

2010 article

An Optimizing Compiler for GPGPU Programs with Input-Data Sharing

PPOPP 2010: PROCEEDINGS OF THE 2010 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, pp. 343–344.

By: Y. Yang n, P. Xiang*, J. Kong* & H. Zhou n

co-author countries: United States of America 🇺🇸
author keywords: GPGPU; Compiler
Sources: Web Of Science, ORCID
Added: August 6, 2018

2010 conference paper

Improving privacy and lifetime of PCM-based main memory

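2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN). Presented at the IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).

By: J. Kong* & H. Zhou n

co-author countries: United States of America 🇺🇸

Event: IEEE/IFIP International Conference on Dependable Systems & Networks (DSN)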

Sources: Crossref, ORCID
Added: January 28, 2020

2009 conference paper

Anomaly-based bug prediction, isolation, and validation

Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '09. Presented at the 14th International Conference on Architectural Support for Programming Languages and Operating Systems.

By: M. Dimitrov* & H. Zhou*

co-author countries: United States of America 🇺🇸

Event: 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '09)

Sources: Crossref, ORCID
Added: January 28, 2020

2009 conference paper

Hardware-software integrated approaches to defend against software cache-based side channel attacks

2009 IEEE 15th International Symposium on High Performance Computer Architecture. Presented at the HPCA - 15 2009. IEEE 15th International Symposium on High Performance Computer Architecture.

By: J. Kong*, O. Aciicmez*, J. Seifert* & H. Zhou*

co-author countries: Germany 🇩🇪 Russian Federation 🇷🇺 United States of America 🇺🇸

Event: HPCA - 15 2009. IEEE 15th International Symposium on High Performance Computer Architecture

Sources: Crossref, ORCID
Added: January 28, 2020

2009 conference paper

Understanding software approaches for GPGPU reliability

Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units - GPGPU-2. Presented at the 2nd Workshop.

By: M. Dimitrov*, M. Mantor & H. Zhou* 

co-author countries: United States of America 🇺🇸

Event: 2nd Workshop on General Purpose Processing on Graphics Processing Units (GPGPU-2)

Sources: Crossref, ORCID
Added: January 28, 2020

2008 conference paper

Address-branch correlation: A novel locality for long-latency hard-to-predict branches

2008 IEEE 14th International Symposium on High Performance Computer Architecture. Presented at the 2008 IEEE 14th International Symposium on High Performance Computer Architecture (HPCA).

By: H. Gao *, Y. Ma *, M. Dimitrov* & H. Zhou* 

co-author countries: United States of America 🇺🇸

Event: 2008 IEEE 14th International Symposium on High Performance Computer Architecture (HPCA)

Sources: Crossref, ORCID
Added: January 28, 2020

2008 conference paper

Deconstructing new cache designs for thwarting software cache-based side channel attacks

Proceedings of the 2nd ACM workshop on Computer security architectures - CSAW '08. Presented at the 2nd ACM workshop on Computer security architectures.

By: J. Kong*, O. Aciicmez*, J. Seifert * & H. Zhou* 

co-author countries: United States of America 🇺🇸

Event: 2nd ACM workshop on Computer security architectures (CSAW '08)

Sources: Crossref, ORCID
Added: January 28, 2020

2007 journal article

Optimizing dual-core execution for power efficiency and transient-fault recovery

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 18(8), 1080–1093.

By: Y. Ma *, H. Gao *, M. Dimitrov* & H. Zhou* 

co-author countries: United States of America 🇺🇸
author keywords: multiple data stream architectures; fault tolerance; low-power design
Sources: Web Of Science, ORCID
Added: August 6, 2018

2007 journal article

PMPM: Prediction by combining multiple partial matches

Journal of Instruction-Level Parallelism, 9, 1–18.

By: H. Gao & H. Zhou 

Source: NC State University Libraries
Added: August 6, 2018

2007 conference paper

Unified Architectural Support for Soft-Error Protection or Software Bug Detection

16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007). Presented at the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

By: M. Dimitrov* & H. Zhou* 

co-author countries: United States of America 🇺🇸

Event: 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007)

Sources: Crossref, ORCID
Added: January 28, 2020

2006 conference paper

Efficient Transient-Fault Tolerance for Multithreaded Processors Using Dual-Thread Execution

2006 International Conference on Computer Design. Presented at the 2006 International Conference on Computer Design.

By: Y. Ma * & H. Zhou* 

co-author countries: United States of America 🇺🇸

Event: 2006 International Conference on Computer Design

Sources: Crossref, ORCID
Added: January 28, 2020

2006 conference paper

Improving software security via runtime instruction-level taint checking

Proceedings of the 1st workshop on Architectural and system support for improving software dependability - ASID '06. Presented at the 1st workshop on Architectural and system support for improving software dependability.

By: J. Kong*, C. Zou * & H. Zhou* 

co-author countries: United States of America 🇺🇸

Event: 1st workshop on Architectural and system support for improving software dependability (ASID '06)

Sources: Crossref, ORCID
Added: January 28, 2020

2006 conference paper

Locality-based Information Redundancy for Processor Reliability

2nd Workshop on Architectural Reliability (WAR-2) held in conjunction with 39th International Symposium on Microarchitecture (MICRO-39), 29–36.

By: M. Dimitrov & H. Zhou 

Source: NC State University Libraries
Added: February 8, 2021

2006 conference paper

PMPM: Prediction by Combining Multiple Partial Matches

2nd Championship Branch Prediction (CBP-2) held with the 39th International Symposium on Microarchitecture (MICRO-39), 19–24.

By: H. Gao & H. Zhou 

Source: NC State University Libraries
Added: February 8, 2021

2006 journal article

Using index functions to reduce conflict aliasing in branch prediction tables

IEEE Transactions on Computers, 55(8), 1057–1061.

By: Y. Ma, H. Gao & H. Zhou

Source: NC State University Libraries
Added: August 6, 2018

2005 journal article

A case for fault tolerance and performance enhancement using chip multi-processors

IEEE Computer Architecture Letters, 4, 1–4.

By: H. Zhou 

Source: NC State University Libraries
Added: August 6, 2018

2005 journal article

Adaptive information processing: an effective way to improve perceptron branch predictors

Journal of Instruction-Level Parallelism, 7, 1–10.

By: H. Gao & H. Zhou 

Source: NC State University Libraries
Added: August 6, 2018

2005 conference paper

Code size efficiency in global scheduling for ILP processors

Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures. Presented at the Sixth Annual Workshop on Interaction between Compilers and Computer Architectures.

By: H. Zhou n  & T. Conte n

co-author countries: United States of America 🇺🇸

Event: Sixth Annual Workshop on Interaction between Compilers and Computer Architectures

Sources: Crossref, ORCID
Added: January 28, 2020

2005 conference paper

Detecting global stride locality in value streams

30th Annual International Symposium on Computer Architecture, 2003. Proceedings. Presented at the ISCA 2003: 30th International Symposium on Computer Architecture.

By: H. Zhou* , J. Flanagan & T. Conte

Event: ISCA 2003: 30th International Symposium on Computer Architecture

Sources: Crossref, ORCID
Added: January 28, 2020

2005 conference paper

Dual-core execution: building a highly scalable single-thread instruction window

14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05). Presented at the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

By: H. Zhou* 

co-author countries: United States of America 🇺🇸

Event: 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)

Sources: Crossref, ORCID
Added: January 28, 2020

2005 journal article

Enhancing memory-level parallelism via recovery-free value prediction

IEEE Transactions on Computers, 54, 897–912.

By: H. Zhou & T. Conte n

co-author countries: United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2004 conference paper

Adaptive Information Processing: An Effective Way to Improve Perceptron Branch Predictors

1st Championship Branch Prediction (CBP-1) held with the 37th International Symposium on Microarchitecture (MICRO-37).

By: H. Gao & H. Zhou 

Source: NC State University Libraries
Added: February 8, 2021

2003 journal article

Adaptive mode control: A static-power-efficient cache design

ACM Transactions on Embedded Computing Systems, 2(3), 347–372.

By: H. Zhou, M. Toburen n, E. Rotenberg n & T. Conte n

co-author countries: United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2003 report

Code size aware compilation for real-time applications

[Technical Report]. Computer Science Department, University of Central Florida.

By: H. Zhou 

Source: NC State University Libraries
Added: February 8, 2021

2003 conference paper

Enhancing Memory Level Parallelism via Recovery-Free Value Prediction

The 2003 International Conference on Supercomputing (ICS'03), 326–335.

By: H. Zhou  & T. Conte

Source: NC State University Libraries
Added: February 8, 2021

2003 report

Performance modeling of memory latency hiding techniques

[Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

By: H. Zhou  & T. Conte

Source: NC State University Libraries
Added: February 20, 2021

2003 chapter

Tree Traversal Scheduling: A Global Instruction Scheduling Technique for VLIW/EPIC Processors

In Languages and Compilers for Parallel Computing (Vol. 2624, pp. 223–238).

By: H. Zhou n , M. Jennings  n & T. Conte n

co-author countries: United States of America 🇺🇸
Sources: Web Of Science, ORCID, Crossref
Added: August 6, 2018

2002 report

Using Performance Bounds to Guide Pre-scheduling Code Optimizations

[Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

By: H. Zhou  & T. Conte

Source: NC State University Libraries
Added: February 20, 2021

2001 report

A Treegion-based Unified Approach to Speculation and Predication in Global Instruction Scheduling

[Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

By: M. Jennings, H. Zhou  & T. Conte

Source: NC State University Libraries
Added: February 20, 2021

2001 report

A study of value speculative execution and mispeculation recovery in superscalar microprocessors

[Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

By: H. Zhou , C. Fu, E. Rotenberg & T. Conte

Source: NC State University Libraries
Added: February 20, 2021

2001 conference paper

Adaptive mode control: A static-power-efficient cache design

2001 International Conference on Parallel Architectures and Compilation Techniques: Proceedings: 8-12 September, 2001, Barcelona, Catalunya, Spain, 61–70.

By: H. Zhou, M. Toburen n, E. Rotenberg n & T. Conte n

co-author countries: United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2000 report

Adaptive Mode Control: A Low-Leakage Power-Efficient Cache Design

[Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

By: H. Zhou , M. Toburen, E. Rotenberg & T. Conte

Source: NC State University Libraries
Added: February 20, 2021

2000 journal article

Automatic IC orientation checks

Machine Vision and Applications, 12(3), 107–112.

By: A. Kassim*, H. Zhou & S. Ranganath

co-author countries: Singapore 🇸🇬 United States of America 🇺🇸
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

1998 journal article

A fast algorithm for detecting die extrusion defects in IC packages

MACHINE VISION AND APPLICATIONS, 11(1), 37–41.

By: H. Zhou* , A. Kassim & S. Ranganath

co-author countries: Singapore 🇸🇬
author keywords: IC package inspection; die extrusion defects; linear feature extraction; feature enhancement
Sources: Web Of Science, ORCID
Added: August 6, 2018

1996 journal article

Test sequencing and diagnosis in electronic system with decision table

MICROELECTRONICS AND RELIABILITY, 36(9), 1167–1175.

By: H. Zhou* , L. Qu * & A. Li *

co-author countries: China 🇨🇳
Sources: Web Of Science, ORCID
Added: August 6, 2018