Works (103)

Updated: May 4th, 2023 05:02

2023 journal article

An Intelligent Framework for Oversubscription Management in CPU-GPU Unified Memory

JOURNAL OF GRID COMPUTING, 21(1).

By: X. Long, X. Gong, B. Zhang & H. Zhou

author keywords: Discrete CPU-GPU system; Unified virtual memory; Oversubscription; Deep learning
Sources: Web Of Science, ORCID
Added: March 20, 2023

2023 journal article

Deep learning based data prefetching in CPU-GPU unified virtual memory

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 174, 19–31.

By: X. Long, X. Gong, B. Zhang & H. Zhou

author keywords: Data prefetching; Graphics processing unit; Unified virtual memory; Deep learning; Transformer
Sources: Web Of Science, ORCID
Added: May 1, 2023

2022 journal article

A Survey of GPU Multitasking Methods Supported by Hardware Architecture

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 33(6), 1451–1463.

By: C. Zhao, W. Gao*, F. Nie* & H. Zhou

author keywords: Graphics processing units; Multitasking; Kernel; Hardware; Computer architecture; Registers; Task analysis; GPU multitasking; survey; hardware architecture; temporal multitasking; spatial multitasking; simultaneous multitasking (SMK)
Sources: ORCID, Web Of Science
Added: October 29, 2021

2022 article

Adaptive Security Support for Heterogeneous Memory on GPUs

2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), pp. 213–228.

By: S. Yuan, A. Awad, A. Yudha*, Y. Solihin* & H. Zhou

author keywords: GPUs; secure memory; heterogeneous memory; encryption; integrity check; security metadata cache
Sources: Web Of Science, ORCID
Added: August 29, 2022

2022 conference paper

Exploiting Quantum Assertions for Error Mitigation and Quantum Program Debugging

2022 IEEE 40TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2022), 124–131.

By: P. Li, J. Liu, Y. Li* & H. Zhou

Event: IEEE 40th International Conference on Computer Design (ICCD) at Olympic Valley, CA, USA on October 23-26, 2022

author keywords: quantum computing; error mitigation; debugging; assertion
Sources: Web Of Science, ORCID
Added: March 20, 2023

2022 article

Not All SWAPs Have the Same Cost: A Case for Optimization-Aware Qubit Routing

2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), pp. 709–725.

By: J. Liu, P. Li & H. Zhou

author keywords: quantum computing; compiler optimization; qubit routing
Sources: Web Of Science, ORCID
Added: August 29, 2022

2021 article

Analyzing Secure Memory Architecture for GPUs

2021 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2021), pp. 59–69.

By: S. Yuan, A. Yudha*, Y. Solihin* & H. Zhou

author keywords: GPUs; security; secure memory; memory encryption; memory integrity; metadata cache
Sources: Web Of Science, ORCID
Added: August 16, 2021

2021 article

PILOT: a Runtime System to Manage Multi-tenant GPU Unified Memory Footprint

2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), pp. 442–447.

By: J. Ravi, T. Nguyen, H. Zhou & M. Becchi

Sources: Web Of Science, ORCID
Added: May 2, 2022

2021 article

Relaxed Peephole Optimization: A Novel Compiler Optimization for Quantum Circuits

CGO '21: PROCEEDINGS OF THE 2021 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO), pp. 301–314.

By: J. Liu, L. Bello* & H. Zhou

author keywords: quantum computing; peephole optimization
Sources: Web Of Science, ORCID
Added: July 26, 2021

2021 article

Systematic Approaches for Precise and Approximate Quantum State Runtime Assertion

2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), pp. 179–193.

By: J. Liu & H. Zhou

author keywords: quantum computing; runtime assertion
Sources: Web Of Science, ORCID
Added: July 26, 2021

2020 journal article

Exploring Convolution Neural Network for Branch Prediction

IEEE Access, 8, 152008–152016.

By: Y. Mao, H. Zhou, X. Gui* & J. Shen

author keywords: History; Neural networks; Machine learning; Convolution; Predictive models; Prediction algorithms; Correlation; Branch prediction; CNN; deep learning; VGG; ResNet
Source: ORCID
Added: August 27, 2020

2020 journal article

Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 112, 1093–1105.

By: C. Zhao*, W. Gao*, F. Nie*, F. Wang* & H. Zhou

author keywords: GPU; Concurrent kernels; Warp scheduling; Cache blocking; Interference
Sources: Web Of Science, ORCID
Added: September 28, 2020

2020 conference paper

Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 1017–1030.

By: J. Liu, G. Byrd & H. Zhou

Event: Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems

author keywords: Quantum Computing; Runtime Assertion
Sources: Web Of Science, ORCID
Added: May 8, 2020

2020 article

Reliability Modeling of NISQ-Era Quantum Computers

2020 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2020), pp. 94–105.

By: J. Liu & H. Zhou

author keywords: NISQ quantum computer; reliability model; neural network
Sources: Web Of Science, ORCID
Added: June 10, 2021

2020 article

Scalable and Fast Lazy Persistency on GPUs

2020 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2020), pp. 252–263.

By: A. Yudha*, K. Kimura*, H. Zhou & Y. Solihin*

Sources: Web Of Science, ORCID
Added: June 10, 2021

2019 journal article

Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 16(3).

By: Z. Lin, H. Dai, M. Mantor* & H. Zhou

author keywords: GPGPU; TLP; bandwidth management; concurrent kernel execution
Sources: Web Of Science, ORCID
Added: December 2, 2019

2019 conference paper

Exploring Memory Persistency Models for GPUs

28th International Conference on Parallel Architectures and Compilation Techniques (PACT), 310–322.

By: Z. Lin, M. Alshboul, Y. Solihin & H. Zhou

Event: International Conference on Parallel Architectures and Compilation Techniques at Seattle, WA on September 21-25, 2019

Sources: Web Of Science, ORCID
Added: August 10, 2020

2019 conference paper

In-Place Zero-Space Memory Protection for CNN

In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 32). San Mateo, CA: Morgan Kaufmann Publishers.

By: H. Guan, L. Ning, Z. Lin, X. Shen, H. Zhou & S. Lim

Ed(s): H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox & R. Garnett

Source: NC State University Libraries
Added: November 24, 2020

2019 journal article

Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation

IEEE Computer Architecture Letters, 18(2), 111–114.

By: H. Zhou & G. Byrd

author keywords: Quantum computing; assertions; quantum circuits; debugging; quantum error detection
Sources: Web Of Science, ORCID, Crossref
Added: September 23, 2019

2019 article

Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation

Liu, J., Byrd, G., & Zhou, H. (2019, December 9).

By: J. Liu, G. Byrd & H. Zhou

Source: ORCID
Added: December 30, 2019

2019 article

Scatter-and-Gather Revisited: High-Performance Side-Channel-Resistant AES on GPUs

12TH WORKSHOP ON GENERAL PURPOSE PROCESSING USING GPUS (GPGPU 12), pp. 2–11.

By: Z. Lin, U. Mathur & H. Zhou

Sources: Web Of Science, ORCID
Added: July 22, 2019

2018 conference paper

Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls

2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

By: H. Dai, Z. Lin, C. Li, C. Zhao*, F. Wang*, N. Zheng* & H. Zhou

Sources: Web Of Science, ORCID
Added: September 22, 2019

2018 journal article

Developing Noise-Resistant Three-Dimensional Single Particle Tracking Using Deep Neural Networks

ANALYTICAL CHEMISTRY, 90(18), 10748–10757.

By: Y. Zhong, C. Li, H. Zhou & G. Wang

Sources: Web Of Science, ORCID
Added: October 16, 2018

2018 journal article

GPU Performance vs. Thread-Level Parallelism: Scalability Analysis and a Novel Way to Improve TLP

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 15(1).

By: Z. Lin, M. Mantor* & H. Zhou

author keywords: GPGPU; TLP; context switching; latency hiding
Sources: Web Of Science, ORCID
Added: August 6, 2018

2017 article

Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs

PROCEEDINGS OF THE 2017 54TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC).

By: A. Verma*, H. Zhou, S. Booth*, R. King*, J. Coole*, A. Keep*, J. Marshall* & W. Feng*

author keywords: OpenCL; FPGA; Debugging; Profiling; Framework; Code Patterns
Sources: Web Of Science, ORCID
Added: August 6, 2018

2017 conference paper

EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU

PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 3–16.

By: G. Chen, Y. Zhao, X. Shen & H. Zhou

Sources: Web Of Science, ORCID
Added: November 21, 2020

2017 report

Exploring deep neural networks for branch prediction

[Technical Report]. https://people.engr.ncsu.edu/hzhou/CNN_DBN_zhou_2017.pdf

By: Y. Mao, H. Zhou & X. Gui

Source: NC State University Libraries
Added: November 21, 2020

2017 journal article

Methylation specific targeting of a chromatin remodeling complex from sponges to humans

SCIENTIFIC REPORTS, 7.

By: J. Cramer*, D. Pohlmann*, F. Gomez*, L. Mark*, B. Kornegay*, C. Hall*, E. Siraliev-Perez*, N. Walavalkar* ...

Sources: Web Of Science, ORCID
Added: August 6, 2018

2017 conference paper

The Demand for a Sound Baseline in GPU Memory Architecture Research

14th Annual Workshop on Duplicating, Deconstructing and Debunking (WDDD). Presented at the Workshop on Duplicating, Deconstructing and Debunking, Toronto, Canada. https://people.engr.ncsu.edu/hzhou/Hongwen_WDDD2017.pdf

By: H. Dai, C. Li, Z. Lin & H. Zhou

Event: Workshop on Duplicating, Deconstructing and Debunking at Toronto, Canada on June 25, 2017

Source: NC State University Libraries
Added: November 21, 2020

2016 journal article

A Cross-Platform SpMV Framework on Many-Core Architectures

ACM Transactions on Architecture and Code Optimization, 13(4), 1–25.

By: Y. Zhang*, S. Li*, S. Yan* & H. Zhou

author keywords: SpMV; segmented scan; BCCOO; OpenCL; CUDA; GPU; Intel MIC; parallel algorithms
Sources: Crossref, ORCID
Added: January 28, 2020

2016 article

A Model-Driven Approach to Warp/Thread-Block Level GPU Cache Bypassing

2016 ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC).

By: H. Dai, C. Li, H. Zhou, S. Gupta*, C. Kartsaklis* & M. Mantor*

Sources: Web Of Science, ORCID
Added: August 6, 2018

2016 conference paper

Enabling efficient preemption for SIMT architectures with lightweight context switching

SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 898–908.

By: Z. Lin, L. Nyland & H. Zhou

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2016 conference paper

OpenCL-based erasure coding on heterogeneous architectures

IEEE International Conference on Application-Specific Systems, 7, 33–40.

By: G. Chen, H. Zhou, X. Shen, J. Gahm*, N. Venkat*, S. Booth* & J. Marshall*

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2016 conference paper

Optimizing memory efficiency for deep convolutional neural networks on GPUs

SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 633–644.

By: C. Li, Y. Yang*, M. Feng*, S. Chakradhar* & H. Zhou

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2016 conference paper

Selective GPU Cache Bypassing for Un-Coalesced Loads

In X. Liao (Ed.), 22nd IEEE International Conference on Parallel and Distributed Systems : ICPADS 2016 : proceedings : 13-16 December 2016, Wuhan, Hubei, China.

By: C. Zhao*, F. Wang*, Z. Lin, H. Zhou & N. Zheng*

Ed(s): X. Liao

Event: 22nd IEEE International Conference on Parallel and Distributed Systems at Wuhan, Hubei, China on December 13-16, 2016

Sources: NC State University Libraries, ORCID
Added: January 30, 2021

2016 conference paper

Tuning stencil codes in OpenCL for FPGAs

Proceedings of the 34th IEEE International Conference on Computer Design (ICCD), 249–256.

By: Q. Jia & H. Zhou

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2015 conference paper

An Optimized AMPM-based Prefetcher Coupled with Configurable Cache Line Sizing

JILP Workshop on Computer Architecture Competitions (JWAC): 2nd Data Prefetching Championship (DPC2).

By: Q. Jia, M. Padia, K. Amboju & H. Zhou

Source: NC State University Libraries
Added: January 30, 2021

2015 conference paper

Analyzing graphics processor unit (GPU) instruction set architectures

IEEE International Symposium on Performance Analysis of Systems and, 155–156.

By: K. Mayank, H. Dai, J. Wei* & H. Zhou

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2015 conference paper

Automatic data placement into GPU on-chip memory resources

2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 23–33.

By: C. Li, Y. Yang, Z. Lin & H. Zhou

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2015 journal article

CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 30(1), 3–19.

By: Y. Yang*, C. Li & H. Zhou

author keywords: GPGPU; nested parallelism; compiler; local memory
Sources: Web Of Science, ORCID
Added: August 6, 2018

2015 conference paper

Locality-Driven Dynamic GPU Cache Bypassing

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, 61–77.

By: C. Li, S. Song*, H. Dai, A. Sidelnik*, S. Hari* & H. Zhou

Event: 29th International conference on supercomputing at Newport Beach/Irvine, CA on June 8-11, 2015

author keywords: GPU architecture Optimization; Locality; Cache Bypassing
Sources: Web Of Science, ORCID
Added: January 30, 2021

2015 conference paper

Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture

Proceedings of the 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 121–130.

By: P. Xiang*, Y. Yang*, M. Mantor*, N. Rubin* & H. Zhou

Event: 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing at Shenzhen, China on May 4-7, 2015

author keywords: GPGPU; Heterogeneous; ILP; Energy
Sources: Web Of Science, ORCID
Added: February 6, 2021

2015 article

Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing

2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), pp. 150–159.

By: S. Gupta* & H. Zhou

author keywords: shared last level cache; cache partitioning; spatial locality; cache management; high bandwidth memory
Sources: Web Of Science, ORCID
Added: August 6, 2018

2014 conference paper

A Case for a Flexible Scalar Unit in SIMT Architecture

Proceedings of 2014 IEEE 28th International Parallel and Distributed Processing Symposium. Presented at the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ.

By: Y. Yang*, P. Xiang, M. Mantor*, N. Rubin*, L. Hsu*, Q. Dong* & H. Zhou

Event: 2014 IEEE 28th International Parallel and Distributed Processing Symposium at Phoenix, AZ on May 19-23, 2014

Sources: NC State University Libraries, ORCID
Added: February 6, 2021

2014 chapter

A Highly Efficient FFT Using Shared-Memory Multiplexing

In Numerical Computations with GPUs (pp. 363–377).

By: Y. Yang & H. Zhou

Sources: Crossref, ORCID
Added: January 28, 2020

2014 journal article

CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications

ACM SIGPLAN NOTICES, 49(8), 93–105.

By: Y. Yang* & H. Zhou

author keywords: Performance; Design; Experimentation; Languages; GPGPU; nested parallelism; compiler; local memory
Sources: Web Of Science, ORCID
Added: August 6, 2018

2014 conference paper

Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs

IEEE International Symposium on Performance Analysis of Systems and, 231–241.

By: C. Li, Y. Yang, H. Dai, S. Yan, F. Mueller & H. Zhou

Source: NC State University Libraries
Added: August 6, 2018

2014 conference paper

Warp-level divergence in GPUs: Characterization, impact, and mitigation

International symposium on high-performance computer, 284–295.

By: P. Xiang, Y. Yang* & H. Zhou

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2014 conference paper

yaSpM: Yet Another SpMV Framework on GPUs

Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 49(8), 107–118.

By: S. Yan, C. Li, Y. Zhang* & H. Zhou

Event: 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming at Orlando, FL

author keywords: SpMV; Segmented Scan; BCCOO; OpenCL; CUDA; GPU; Parallel algorithms
Sources: Web Of Science, ORCID
Added: August 6, 2018

2013 article

Adaptive Cache Bypassing for Inclusive Last Level Caches

IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), pp. 1243–1253.

By: S. Gupta, H. Gao* & H. Zhou

author keywords: Last level cache; cache bypassing; cache replacement policy; inclusion property
Sources: Web Of Science, ORCID
Added: August 6, 2018

2013 journal article

Analyzing locality of memory references in GPU architectures

MSPC '13: Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 6.

By: S. Gupta, P. Xiang & H. Zhou

Event: ACM SIGPLAN Workshop on Memory Systems Performance and Correctness at Seattle, WA

Sources: NC State University Libraries, ORCID
Added: February 6, 2021

2013 journal article

Architecting against Software Cache-Based Side-Channel Attacks

IEEE TRANSACTIONS ON COMPUTERS, 62(7), 1276–1288.

By: J. Kong*, O. Aciicmez*, J. Seifert* & H. Zhou

author keywords: Cache memories; private/public key cryptosystems; side-channel attacks; architectural support for computer security
Sources: Web Of Science, ORCID
Added: August 6, 2018

2013 conference paper

Exploiting Uniform Vector Instructions for GPGPU Performance, Energy Efficiency, and Opportunistic Reliability Enhancement

Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, 433–442.

By: P. Xiang, Y. Yang*, M. Mantor*, N. Rubin*, L. Hsu* & H. Zhou

Event: 27th International ACM Conference on International Conference on Supercomputing at Eugene, Oregon

Sources: NC State University Libraries, ORCID
Added: February 6, 2021

2013 journal article

Locality principle revisited: A probability-based quantitative approach

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 73(7), 1011–1027.

By: S. Gupta, P. Xiang, Y. Yang & H. Zhou

author keywords: Locality of references; Probability; Memory hierarchy; Last level cache; Cache replacement policy; Data prefetching; Locality optimizations
Sources: Web Of Science, ORCID
Added: August 6, 2018

2013 journal article

The Implementation of a High Performance GPGPU Compiler

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 41(6), 768–781.

By: Y. Yang & H. Zhou

author keywords: GPU; Compiler; Optimization; Vectorization; OpenCL
Sources: Web Of Science, ORCID
Added: August 6, 2018

2012 journal article

A Unified Optimizing Compiler Framework for Different GPGPU Architectures

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 9(2).

By: Y. Yang, P. Xiang, J. Kong*, M. Mantor* & H. Zhou

author keywords: Performance; Experimentation; Languages; GPGPU; OpenCL; CUDA; CUBLAS; GPU Computing
Sources: Web Of Science, ORCID
Added: August 6, 2018

2012 conference paper

CPU-assisted GPGPU on fused CPU-GPU architectures

International symposium on high-performance computer, 103–114.

By: Y. Yang, P. Xiang, M. Mantor & H. Zhou

Source: NC State University Libraries
Added: August 6, 2018

2012 conference paper

Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs

2012 41st International Conference on Parallel Processing. Presented at the 2012 41st International Conference on Parallel Processing (ICPP).

By: Y. Yang, P. Xiang, M. Mantor* & H. Zhou

Event: 2012 41st International Conference on Parallel Processing (ICPP)

Sources: Crossref, ORCID
Added: January 28, 2020

2012 conference paper

Locality principle revisited: A probability-based quantitative approach

2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), 995–1009.

By: S. Gupta, P. Xiang, Y. Yang & H. Zhou

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2012 conference paper

Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput

Proceedings of the 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT). Presented at the 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), Minneapolis, MN, USA.

By: Y. Yang, P. Xiang, M. Mantor, N. Rubin & H. Zhou

Event: 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT) at Minneapolis, MN, USA on September 19-23, 2012

Source: NC State University Libraries
Added: February 7, 2021

2011 journal article

Combining Local and Global History for High Performance Data Prefetching

Journal of Instruction-Level Parallelism (JILP), 13, 1–14.

By: M. Dimitrov & H. Zhou

Event: Data Prefetching Championship (DPC-1) held with 15th International Symposium on High Performance Computer Architecture (HPCA-15) at Raleigh, NC on February 14-18, 2009

Source: NC State University Libraries
Added: August 6, 2018

2011 conference paper

Developing a High Performance GPGPU Compiler using Cetus

Proceedings of the Cetus Users and Compiler Infrastructure Workshop, International Conference on Parallel Architectures and Compilation Techniques (PACT’11). Presented at the International Conference on Parallel Architectures and Compilation Techniques (PACT’11).

By: Y. Yang & H. Zhou

Event: International Conference on Parallel Architectures and Compilation Techniques (PACT’11)

Source: NC State University Libraries
Added: February 7, 2021

2011 journal article

Exploring Correlation for Indirect Branch Prediction

2nd JILP Workshop on Computer Architecture Competitions (JWAC-2): Championship Branch Prediction. Presented at the 2nd JILP Workshop on Computer Architecture Competitions (JWAC-2): Championship Branch Prediction, held with ISCA-38.

By: N. Bhansali, C. Panirwla & H. Zhou

Event: 2nd JILP Workshop on Computer Architecture Competitions (JWAC-2): Championship Branch Prediction, held with ISCA-38

Source: NC State University Libraries
Added: February 7, 2021

2011 conference paper

Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs

2011 IEEE International Parallel & Distributed Processing Symposium. Presented at the 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS).

By: M. Dimitrov* & H. Zhou

Event: 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS)

Sources: Crossref, ORCID
Added: January 28, 2020

2010 article

A GPGPU Compiler for Memory Optimization and Parallelism Management

Yang, Y., Xiang, P., Kong, J., & Zhou, H. (2010, June). ACM SIGPLAN NOTICES, Vol. 45, pp. 86–97.

By: Y. Yang, P. Xiang, J. Kong & H. Zhou

author keywords: Performance; Experimentation; Languages; GPGPU; Compiler
Sources: Web Of Science, ORCID
Added: August 6, 2018

2010 conference paper

Accelerating MATLAB Image Processing Toolbox Functions on GPUs

Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, 75–85.

By: J. Kong*, M. Dimitrov*, Y. Yang, J. Liyanage*, L. Cao*, J. Staples*, M. Mantor* & H. Zhou

Event: 3rd Workshop on General-Purpose Computation on Graphics Processing Units at Pittsburgh, Pennsylvania, USA

Sources: NC State University Libraries, ORCID
Added: February 7, 2021

2010 article

An Optimizing Compiler for GPGPU Programs with Input-Data Sharing

Yang, Y., Xiang, P., Kong, J., & Zhou, H. (2010, May). ACM SIGPLAN NOTICES, Vol. 45, pp. 343–344.

By: Y. Yang, P. Xiang*, J. Kong* & H. Zhou

author keywords: Performance; Experimentation; Languages; GPGPU; Compiler
Sources: Web Of Science, ORCID
Added: August 6, 2018

2010 article

An Optimizing Compiler for GPGPU Programs with Input-Data Sharing

PPOPP 2010: PROCEEDINGS OF THE 2010 ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, pp. 343–344.

By: Y. Yang, P. Xiang, J. Kong* & H. Zhou

author keywords: GPGPU; Compiler
Sources: Web Of Science, ORCID
Added: August 6, 2018

2010 conference paper

Improving privacy and lifetime of PCM-based main memory

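2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN). Presented at the 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).

By: J. Kong* & H. Zhou

Event: 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN)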

Sources: Crossref, ORCID
Added: January 28, 2020

2009 conference paper

Anomaly-based bug prediction, isolation, and validation

Proceeding of the 14th international conference on Architectural support for programming languages and operating systems - ASPLOS '09. Presented at the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '09).

By: M. Dimitrov* & H. Zhou

Event: 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '09)

Sources: Crossref, ORCID
Added: January 28, 2020

2009 conference paper

Hardware-software integrated approaches to defend against software cache-based side channel attacks

2009 IEEE 15th International Symposium on High Performance Computer Architecture. Presented at the HPCA - 15 2009. IEEE 15th International Symposium on High Performance Computer Architecture.

By: J. Kong*, O. Aciicmez*, J. Seifert* & H. Zhou

Event: HPCA - 15 2009. IEEE 15th International Symposium on High Performance Computer Architecture

Sources: Crossref, ORCID
Added: January 28, 2020

2009 conference paper

Understanding software approaches for GPGPU reliability

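Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units - GPGPU-2. Presented at the 2nd Workshop on General Purpose Processing on Graphics Processing Units (GPGPU-2).

By: M. Dimitrov*, M. Mantor* & H. Zhou

Event: 2nd Workshop on General Purpose Processing on Graphics Processing Units (GPGPU-2)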

Sources: Crossref, ORCID
Added: January 28, 2020

2008 conference paper

Address-branch correlation: A novel locality for long-latency hard-to-predict branches

2008 IEEE 14th International Symposium on High Performance Computer Architecture. Presented at the 2008 IEEE 14th International Symposium on High Performance Computer Architecture (HPCA).

By: H. Gao*, Y. Ma*, M. Dimitrov* & H. Zhou

Event: 2008 IEEE 14th International Symposium on High Performance Computer Architecture (HPCA)

Sources: Crossref, ORCID
Added: January 28, 2020

2008 conference paper

Deconstructing new cache designs for thwarting software cache-based side channel attacks

Proceedings of the 2nd ACM workshop on Computer security architectures - CSAW '08. Presented at the 2nd ACM Workshop on Computer Security Architectures (CSAW '08).

By: J. Kong*, O. Aciicmez*, J. Seifert* & H. Zhou

Event: 2nd ACM Workshop on Computer Security Architectures (CSAW '08)

Sources: Crossref, ORCID
Added: January 28, 2020

2007 journal article

Optimizing dual-core execution for power efficiency and transient-fault recovery

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 18(8), 1080–1093.

By: Y. Ma, H. Gao, M. Dimitrov & H. Zhou

author keywords: multiple data stream architectures; fault tolerance; low-power design
Sources: Web Of Science, ORCID
Added: August 6, 2018

2007 journal article

PMPM: Prediction by combining multiple partial matches

Journal of Instruction-Level Parallelism, 9, 1–18.

By: H. Gao & H. Zhou

Source: NC State University Libraries
Added: August 6, 2018

2007 conference paper

Unified Architectural Support for Soft-Error Protection or Software Bug Detection

16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007). Presented at the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

By: M. Dimitrov & H. Zhou

Event: 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007)

Sources: Crossref, ORCID
Added: January 28, 2020

2006 conference paper

Efficient Transient-Fault Tolerance for Multithreaded Processors Using Dual-Thread Execution

2006 International Conference on Computer Design. Presented at the 2006 International Conference on Computer Design.

By: Y. Ma* & H. Zhou

Event: 2006 International Conference on Computer Design

Sources: Crossref, ORCID
Added: January 28, 2020

2006 conference paper

Improving software security via runtime instruction-level taint checking

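Proceedings of the 1st workshop on Architectural and system support for improving software dependability - ASID '06. Presented at the 1st Workshop on Architectural and System Support for Improving Software Dependability (ASID '06).

By: J. Kong*, C. Zou* & H. Zhou

Event: 1st Workshop on Architectural and System Support for Improving Software Dependability (ASID '06)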

Sources: Crossref, ORCID
Added: January 28, 2020

2006 conference paper

Locality-based Information Redundancy for Processor Reliability

2nd Workshop on Architectural Reliability (WAR-2) held in conjunction with 39th International Symposium on Microarchitecture (MICRO-39), 29–36.

By: M. Dimitrov & H. Zhou

Source: NC State University Libraries
Added: February 8, 2021

2006 conference paper

PMPM: Prediction by Combining Multiple Partial Matches

2nd Championship Branch Prediction (CBP-2) held with the 39th International Symposium on Microarchitecture (MICRO-39), 19–24.

By: H. Gao & H. Zhou

Source: NC State University Libraries
Added: February 8, 2021

2006 journal article

Using index functions to reduce conflict aliasing in branch prediction tables

IEEE Transactions on Computers, 55(8), 1057–1061.

By: Y. Ma, H. Gao & H. Zhou

Source: NC State University Libraries
Added: August 6, 2018

2005 journal article

A case for fault tolerance and performance enhancement using chip multi-processors

IEEE Computer Architecture Letters, 4, 1–4.

By: H. Zhou

Source: NC State University Libraries
Added: August 6, 2018

2005 journal article

Adaptive information processing: an effective way to improve perceptron branch predictors

Journal of Instruction-Level Parallelism, 7, 1–10.

By: H. Gao & H. Zhou

Source: NC State University Libraries
Added: August 6, 2018

2005 conference paper

Code size efficiency in global scheduling for ILP processors

Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures. Presented at the Sixth Annual Workshop on Interaction between Compilers and Computer Architectures.

By: H. Zhou & T. Conte

Event: Sixth Annual Workshop on Interaction between Compilers and Computer Architectures

Sources: Crossref, ORCID
Added: January 28, 2020

2005 conference paper

Detecting global stride locality in value streams

30th Annual International Symposium on Computer Architecture, 2003. Proceedings. Presented at the ISCA 2003: 30th International Symposium on Computer Architecture.

By: H. Zhou, J. Flanagan & T. Conte

Event: ISCA 2003: 30th International Symposium on Computer Architecture

Sources: Crossref, ORCID
Added: January 28, 2020

2005 conference paper

Dual-core execution: building a highly scalable single-thread instruction window

14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05). Presented at the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

By: H. Zhou

Event: 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)

Sources: Crossref, ORCID
Added: January 28, 2020

2005 journal article

Enhancing memory-level parallelism via recovery-free value prediction

IEEE Transactions on Computers, 54, 897–912.

By: H. Zhou & T. Conte

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2004 conference paper

Adaptive Information Processing: An Effective Way to Improve Perceptron Branch Predictors

1st Championship Branch Prediction (CBP-1) held with the 37th International Symposium on Microarchitecture (MICRO-37).

By: H. Gao & H. Zhou

Source: NC State University Libraries
Added: February 8, 2021

2003 journal article

Adaptive mode control: A static-power-efficient cache design

ACM Transactions on Embedded Computing Systems, 2(3), 347–372.

By: H. Zhou, M. Toburen, E. Rotenberg & T. Conte

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2003 report

Code size aware compilation for real-time applications

[Technical Report]. Computer Science Department, University of Central Florida.

By: H. Zhou

Source: NC State University Libraries
Added: February 8, 2021

2003 conference paper

Enhancing Memory Level Parallelism via Recovery-Free Value Prediction

The 2003 International Conference on Supercomputing (ICS'03), 326–335.

By: H. Zhou & T. Conte

Source: NC State University Libraries
Added: February 8, 2021

2003 report

Performance modeling of memory latency hiding techniques

[Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

By: H. Zhou & T. Conte

Source: NC State University Libraries
Added: February 20, 2021

2003 chapter

Tree Traversal Scheduling: A Global Instruction Scheduling Technique for VLIW/EPIC Processors

In Languages and Compilers for Parallel Computing (Vol. 2624, pp. 223–238).

By: H. Zhou, M. Jennings & T. Conte

Sources: Web Of Science, ORCID, Crossref
Added: August 6, 2018

2002 report

Using Performance Bounds to Guide Pre-scheduling Code Optimizations

[Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

By: H. Zhou & T. Conte

Source: NC State University Libraries
Added: February 20, 2021

2001 report

A Treegion-based Unified Approach to Speculation and Predication in Global Instruction Scheduling

[Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

By: M. Jennings, H. Zhou & T. Conte

Source: NC State University Libraries
Added: February 20, 2021

2001 report

A study of value speculative execution and mispeculation recovery in superscalar microprocessors

[Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

By: H. Zhou, C. Fu, E. Rotenberg & T. Conte

Source: NC State University Libraries
Added: February 20, 2021

2001 conference paper

Adaptive mode control: A static-power-efficient cache design

2001 International Conference on Parallel Architectures and Compilation Techniques: Proceedings: 8-12 September, 2001, Barcelona, Catalunya, Spain, 61–70.

By: H. Zhou, M. Toburen, E. Rotenberg & T. Conte

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2000 report

Adaptive Mode Control: A Low-Leakage Power-Efficient Cache Design

[Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

By: H. Zhou, M. Toburen, E. Rotenberg & T. Conte

Source: NC State University Libraries
Added: February 20, 2021

2000 journal article

Automatic IC orientation checks

Machine Vision and Applications, 12(3), 107–112.

By: A. Kassim, H. Zhou & S. Ranganath

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

1998 journal article

A fast algorithm for detecting die extrusion defects in IC packages

MACHINE VISION AND APPLICATIONS, 11(1), 37–41.

By: H. Zhou, A. Kassim* & S. Ranganath*

author keywords: IC package inspection; die extrusion defects; linear feature extraction; feature enhancement
Sources: Web Of Science, ORCID
Added: August 6, 2018

1996 journal article

Test sequencing and diagnosis in electronic system with decision table

MICROELECTRONICS AND RELIABILITY, 36(9), 1167–1175.

By: H. Zhou, L. Qu* & A. Li*

Sources: Web Of Science, ORCID
Added: August 6, 2018