Huiyang Zhou

Long, X., Gong, X., Zhang, B., & Zhou, H. (2023). An Intelligent Framework for Oversubscription Management in CPU-GPU Unified Memory. Journal of Grid Computing, 21(1). https://doi.org/10.1007/s10723-023-09646-1

Long, X., Gong, X., Zhang, B., & Zhou, H. (2023). Deep learning based data prefetching in CPU-GPU unified virtual memory. Journal of Parallel and Distributed Computing, 174, 19–31. https://doi.org/10.1016/j.jpdc.2022.12.004

Li, P., Liu, J., Patil, H. P., Hovland, P., & Zhou, H. (2023). Enhancing Virtual Distillation with Circuit Cutting for Quantum Error Mitigation. 2023 IEEE 41st International Conference on Computer Design (ICCD), pp. 94–101. https://doi.org/10.1109/ICCD58817.2023.00024

Tozlu, Y. S., & Zhou, H. (2023). PBVR: Physically Based Rendering in Virtual Reality. 2023 IEEE International Symposium on Workload Characterization (IISWC), pp. 77–86. https://doi.org/10.1109/IISWC59245.2023.00039

Abdullah, R., Zhou, H., & Awad, A. (2023). Plutus: Bandwidth-Efficient Memory Security for GPUs. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 543–555. https://doi.org/10.1109/HPCA56546.2023.10071100

Freij, A., Zhou, H., & Solihin, Y. (2023). SecPB: Architectures for Secure Non-Volatile Memory with Battery-Backed Persist Buffers. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 677–690. https://doi.org/10.1109/HPCA56546.2023.10071082

Zhao, C., Gao, W., Nie, F., & Zhou, H. (2022). A Survey of GPU Multitasking Methods Supported by Hardware Architecture. IEEE Transactions on Parallel and Distributed Systems, 33(6), 1451–1463. https://doi.org/10.1109/TPDS.2021.3115630

Yuan, S., Awad, A., Yudha, A. W. B., Solihin, Y., & Zhou, H. (2022). Adaptive Security Support for Heterogeneous Memory on GPUs. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA 2022), pp. 213–228. https://doi.org/10.1109/HPCA53966.2022.00024

Li, P., Liu, J., Li, Y., & Zhou, H. (2022). Exploiting Quantum Assertions for Error Mitigation and Quantum Program Debugging. 2022 IEEE 40th International Conference on Computer Design (ICCD 2022), 124–131. https://doi.org/10.1109/ICCD56317.2022.00028

Yudha, A. W. B., Meyer, J., Yuan, S., Zhou, H., & Solihin, Y. (2022). LITE: A Low-Cost Practical Inter-Operable GPU TEE. Proceedings of the 36th ACM International Conference on Supercomputing (ICS 2022). https://doi.org/10.1145/3524059.3532361

Liu, J., Li, P., & Zhou, H. (2022). Not All SWAPs Have the Same Cost: A Case for Optimization-Aware Qubit Routing. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA 2022), pp. 709–725. https://doi.org/10.1109/HPCA53966.2022.00058

Yuan, S., Yudha, A. W. B., Solihin, Y., & Zhou, H. (2021). Analyzing Secure Memory Architecture for GPUs. 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2021), pp. 59–69. https://doi.org/10.1109/ISPASS51385.2021.00017

Ravi, J., Nguyen, T., Zhou, H., & Becchi, M. (2021). PILOT: a Runtime System to Manage Multi-tenant GPU Unified Memory Footprint. 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC 2021), pp. 442–447. https://doi.org/10.1109/HiPC53243.2021.00063

Liu, J., Bello, L., & Zhou, H. (2021). Relaxed Peephole Optimization: A Novel Compiler Optimization for Quantum Circuits. CGO '21: Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 301–314. https://doi.org/10.1109/CGO51591.2021.9370310

Liu, J., & Zhou, H. (2021). Systematic Approaches for Precise and Approximate Quantum State Runtime Assertion. 2021 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2021), pp. 179–193. https://doi.org/10.1109/HPCA51647.2021.00025

Mao, Y., Zhou, H., Gui, X., & Shen, J. (2020). Exploring Convolution Neural Network for Branch Prediction. IEEE Access, 8, 152008–152016. https://doi.org/10.1109/ACCESS.2020.3017196

Zhao, C., Gao, W., Nie, F., Wang, F., & Zhou, H. (2020). Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU. Future Generation Computer Systems: The International Journal of eScience, 112, 1093–1105. https://doi.org/10.1016/j.future.2020.05.023

Liu, J., Byrd, G., & Zhou, H. (2020). Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation. ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 1017–1030. https://doi.org/10.1145/3373376.3378488

Liu, J., & Zhou, H. (2020). Reliability Modeling of NISQ-Era Quantum Computers. 2020 IEEE International Symposium on Workload Characterization (IISWC 2020), pp. 94–105. https://doi.org/10.1109/IISWC50251.2020.00018

Yudha, A. W. B., Kimura, K., Zhou, H., & Solihin, Y. (2020). Scalable and Fast Lazy Persistency on GPUs. 2020 IEEE International Symposium on Workload Characterization (IISWC 2020), pp. 252–263. https://doi.org/10.1109/IISWC50251.2020.00032

Lin, Z., Dai, H., Mantor, M., & Zhou, H. (2019). Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution. ACM Transactions on Architecture and Code Optimization, 16(3). https://doi.org/10.1145/3326124

Lin, Z., Alshboul, M., Solihin, Y., & Zhou, H. (2019). Exploring Memory Persistency Models for GPUs. 28th International Conference on Parallel Architectures and Compilation Techniques (PACT), 310–322. https://doi.org/10.1109/PACT.2019.00032

Guan, H., Ning, L., Lin, Z., Shen, X., Zhou, H., & Lim, S. (2019). In-Place Zero-Space Memory Protection for CNN. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 32). San Mateo, CA: Morgan Kaufmann Publishers.

Zhou, H., & Byrd, G. T. (2019). Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation. IEEE Computer Architecture Letters, 18(2), 111–114. https://doi.org/10.1109/LCA.2019.2935049

Liu, J., Byrd, G., & Zhou, H. (2019, December 9). Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation. https://doi.org/10.36227/techrxiv.11319929

Liu, J., Byrd, G., & Zhou, H. (2019, December 9). Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation. https://doi.org/10.36227/techrxiv.11319929.v1

Lin, Z., Mathur, U., & Zhou, H. (2019). Scatter-and-Gather Revisited: High-Performance Side-Channel-Resistant AES on GPUs. 12th Workshop on General Purpose Processing Using GPUs (GPGPU 12), pp. 2–11. https://doi.org/10.1145/3300053.3319415

Dai, H., Lin, Z., Li, C., Zhao, C., Wang, F., Zheng, N., & Zhou, H. (2018). Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). https://doi.org/10.1109/HPCA.2018.00027

Zhong, Y., Li, C., Zhou, H., & Wang, G. (2018). Developing Noise-Resistant Three-Dimensional Single Particle Tracking Using Deep Neural Networks. Analytical Chemistry, 90(18), 10748–10757. https://doi.org/10.1021/acs.analchem.8b01334

Lin, Z., Mantor, M., & Zhou, H. (2018). GPU Performance vs. Thread-Level Parallelism: Scalability Analysis and a Novel Way to Improve TLP. ACM Transactions on Architecture and Code Optimization, 15(1). https://doi.org/10.1145/3177964

Verma, A., Zhou, H., Booth, S., King, R., Coole, J., Keep, A., … Feng, W.-C. (2017). Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs. Proceedings of the 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC). https://doi.org/10.1145/3061639.3062230

Chen, G., Zhao, Y., Shen, X., & Zhou, H. (2017). EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU. PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 3–16. https://doi.org/10.1145/3018743.3018748

Mao, Y., Zhou, H., & Gui, X. (2017). Exploring deep neural networks for branch prediction [Technical Report]. Retrieved from Electrical and Computer Engineering Department, N.C. State University website: https://people.engr.ncsu.edu/hzhou/CNN_DBN_zhou_2017.pdf

Cramer, J. M., Pohlmann, D., Gomez, F., Mark, L., Kornegay, B., Hall, C., … Williams, D. C., Jr. (2017). Methylation specific targeting of a chromatin remodeling complex from sponges to humans. Scientific Reports, 7. https://doi.org/10.1038/srep40674

Dai, H., Li, C., Lin, Z., & Zhou, H. (2017). The Demand for a Sound Baseline in GPU Memory Architecture Research. 14th Annual Workshop on Duplicating, Deconstructing and Debunking (WDDD), Toronto, Canada. Retrieved from https://people.engr.ncsu.edu/hzhou/Hongwen_WDDD2017.pdf

Zhang, Y., Li, S., Yan, S., & Zhou, H. (2016). A Cross-Platform SpMV Framework on Many-Core Architectures. ACM Transactions on Architecture and Code Optimization, 13(4), 1–25. https://doi.org/10.1145/2994148

Dai, H., Li, C., Zhou, H., Gupta, S., Kartsaklis, C., & Mantor, M. (2016). A Model-Driven Approach to Warp/Thread-Block Level GPU Cache Bypassing. 2016 ACM/EDAC/IEEE Design Automation Conference (DAC). https://doi.org/10.1145/2897937.2897966

Lin, Z., Nyland, L., & Zhou, H. (2016). Enabling efficient preemption for SIMT architectures with lightweight context switching. SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 898–908. https://doi.org/10.1109/sc.2016.76

Chen, G. Y., Zhou, H., Shen, X., Gahm, J., Venkat, N., Booth, S., & Marshall, J. (2016). OpenCL-based erasure coding on heterogeneous architectures. IEEE International Conference on Application-Specific Systems, 7, 33–40. https://doi.org/10.1109/asap.2016.7760770

Li, C., Yang, Y., Feng, M., Chakradhar, S., & Zhou, H. (2016). Optimizing memory efficiency for deep convolutional neural networks on GPUs. SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 633–644. https://doi.org/10.1109/sc.2016.53

Zhao, C., Wang, F., Lin, Z., Zhou, H., & Zheng, N. (2016). Selective GPU Cache Bypassing for Un-Coalesced Loads. In X. Liao (Ed.), 22nd IEEE International Conference on Parallel and Distributed Systems: ICPADS 2016: Proceedings: 13-16 December 2016, Wuhan, Hubei, China. https://doi.org/10.1109/ICPADS.2016.0122

Jia, Q., & Zhou, H. (2016). Tuning stencil codes in OpenCL for FPGAs. Proceedings of the 34th IEEE International Conference on Computer Design (ICCD), 249–256. https://doi.org/10.1109/iccd.2016.7753287

Jia, Q., Padia, M. B., Amboju, K., & Zhou, H. (2015). An Optimized AMPM-based Prefetcher Coupled with Configurable Cache Line Sizing. JILP Workshop on Computer Architecture Competitions (JWAC): 2nd Data Prefetching Championship (DPC2).

Mayank, K., Dai, H. W., Wei, J. Z., & Zhou, H. (2015). Analyzing graphics processor unit (GPU) instruction set architectures. IEEE International Symposium on Performance Analysis of Systems and, 155–156. https://doi.org/10.1109/ispass.2015.7095794

Li, C., Yang, Y., Lin, Z., & Zhou, H. (2015). Automatic data placement into GPU on-chip memory resources. 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 23–33. https://doi.org/10.1109/cgo.2015.7054184

Yang, Y., Li, C., & Zhou, H. (2015). CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications. Journal of Computer Science and Technology, 30(1), 3–19. https://doi.org/10.1007/s11390-015-1500-y

Li, C., Song, S., Dai, H., Sidelnik, A., Hari, S., & Zhou, H. (2015). Locality-Driven Dynamic GPU Cache Bypassing. ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, 61–77. https://doi.org/10.1145/2751205.2751237

Xiang, P., Yang, Y., Mantor, M., Rubin, N., & Zhou, H. (2015). Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture. Proceedings of the 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 121–130. https://doi.org/10.1109/CCGrid.2015.14

Gupta, S., & Zhou, H. (2015). Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing. 2015 44th International Conference on Parallel Processing (ICPP), pp. 150–159. https://doi.org/10.1109/icpp.2015.24

Yang, Y., Xiang, P., Mantor, M., Rubin, N., Hsu, L., Dong, Q., & Zhou, H. (2014). A Case for a Flexible Scalar Unit in SIMT Architecture. Proceedings of 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ. https://doi.org/10.1109/IPDPS.2014.21

Yang, Y., & Zhou, H. (2014). A Highly Efficient FFT Using Shared-Memory Multiplexing. In Numerical Computations with GPUs (pp. 363–377). https://doi.org/10.1007/978-3-319-06548-9_17

Yang, Y., & Zhou, H. (2014). CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications. ACM SIGPLAN Notices, 49(8), 93–105. https://doi.org/10.1145/2692916.2555254

Li, C., Yang, Y., Dai, H. W., Yan, S. G., Mueller, F., & Zhou, H. Y. (2014). Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs. IEEE International Symposium on Performance Analysis of Systems and, 231–241.

Xiang, P., Yang, Y., & Zhou, H. (2014). Warp-level divergence in GPUs: Characterization, impact, and mitigation. International Symposium on High-Performance Computer, 284–295. https://doi.org/10.1109/hpca.2014.6835939

Yan, S., Li, C., Zhang, Y., & Zhou, H. (2014). yaSpM: Yet Another SpMV Framework on GPUs. Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 49(8), 107–118. https://doi.org/10.1145/2692916.2555255

Gupta, S., Gao, H., & Zhou, H. (2013). Adaptive Cache Bypassing for Inclusive Last Level Caches. IEEE 27th International Parallel and Distributed Processing Symposium (IPDPS 2013), pp. 1243–1253. https://doi.org/10.1109/ipdps.2013.16

Gupta, S., Xiang, P., & Zhou, H. (2013). Analyzing locality of memory references in GPU architectures. MSPC '13: Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 6. https://doi.org/10.1145/2492408.2492423

Kong, J., Aciicmez, O., Seifert, J.-P., & Zhou, H. (2013). Architecting against Software Cache-Based Side-Channel Attacks. IEEE Transactions on Computers, 62(7), 1276–1288. https://doi.org/10.1109/tc.2012.78

Xiang, P., Yang, Y., Mantor, M., Rubin, N., Hsu, L., & Zhou, H. (2013). Exploiting Uniform Vector Instructions for GPGPU Performance, Energy Efficiency, and Opportunistic Reliability Enhancement. Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, 433–442. https://doi.org/10.1145/2464996.2465022

Gupta, S., Xiang, P., Yang, Y., & Zhou, H. (2013). Locality principle revisited: A probability-based quantitative approach. Journal of Parallel and Distributed Computing, 73(7), 1011–1027. https://doi.org/10.1016/j.jpdc.2013.01.010

Yang, Y., & Zhou, H. (2013). The Implementation of a High Performance GPGPU Compiler. International Journal of Parallel Programming, 41(6), 768–781. https://doi.org/10.1007/s10766-012-0228-3

Yang, Y., Xiang, P., Kong, J., Mantor, M., & Zhou, H. (2012). A Unified Optimizing Compiler Framework for Different GPGPU Architectures. ACM Transactions on Architecture and Code Optimization, 9(2). https://doi.org/10.1145/2207222.2207225

Yang, Y., Xiang, P., Mantor, M., & Zhou, H. Y. (2012). CPU-assisted GPGPU on fused CPU-GPU architectures. International Symposium on High-Performance Computer, 103–114.

Yang, Y., Xiang, P., Mantor, M., & Zhou, H. (2012). Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs. 2012 41st International Conference on Parallel Processing (ICPP). https://doi.org/10.1109/icpp.2012.30

Gupta, S., Xiang, P., Yang, Y., & Zhou, H. (2012). Locality principle revisited: A probability-based quantitative approach. 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), 995–1009. https://doi.org/10.1109/ipdps.2012.93

Yang, Y., Xiang, P., Mantor, M., Rubin, N., & Zhou, H. (2012). Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput. Proceedings of the 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), Minneapolis, MN, USA.

Dimitrov, M., & Zhou, H. (2011). Combining Local and Global History for High Performance Data Prefetching. Journal of Instruction-Level Parallelism (JILP), 13, 1–14.

Yang, Y., & Zhou, H. (2011). Developing a High Performance GPGPU Compiler using Cetus. Proceedings of the Cetus Users and Compiler Infrastructure Workshop, International Conference on Parallel Architectures and Compilation Techniques (PACT’11).

Bhansali, N., Panirwla, C., & Zhou, H. (2011). Exploring Correlation for Indirect Branch Prediction. 2nd JILP Workshop on Computer Architecture Competitions (JWAC-2): Championship Branch Prediction, held with ISCA-38.

Dimitrov, M., & Zhou, H. (2011). Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs. 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS). https://doi.org/10.1109/ipdps.2011.38

Yang, Y., Xiang, P., Kong, J., & Zhou, H. (2010, June). A GPGPU Compiler for Memory Optimization and Parallelism Management. ACM SIGPLAN Notices, Vol. 45, pp. 86–97. https://doi.org/10.1145/1809028.1806606

Kong, J., Dimitrov, M., Yang, Y., Liyanage, J., Cao, L., Staples, J., … Zhou, H. (2010). Accelerating MATLAB Image Processing Toolbox Functions on GPUs. Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, 75–85. https://doi.org/10.1145/1735688.1735703

Yang, Y., Xiang, P., Kong, J., & Zhou, H. (2010, May). An Optimizing Compiler for GPGPU Programs with Input-Data Sharing. ACM SIGPLAN Notices, Vol. 45, pp. 343–344. https://doi.org/10.1145/1837853.1693505

Yang, Y., Xiang, P., Kong, J., & Zhou, H. (2010). An Optimizing Compiler for GPGPU Programs with Input-Data Sharing. PPoPP 2010: Proceedings of the 2010 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 343–344. https://doi.org/10.1145/1693453.1693505

Kong, J., & Zhou, H. (2010). Improving privacy and lifetime of PCM-based main memory. 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN). https://doi.org/10.1109/dsn.2010.5544298

Dimitrov, M., & Zhou, H. (2009). Anomaly-based bug prediction, isolation, and validation. Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '09). https://doi.org/10.1145/1508244.1508252

Kong, J., Aciicmez, O., Seifert, J.-P., & Zhou, H. (2009). Hardware-software integrated approaches to defend against software cache-based side channel attacks. 2009 IEEE 15th International Symposium on High Performance Computer Architecture (HPCA-15). https://doi.org/10.1109/hpca.2009.4798277

Dimitrov, M., Mantor, M., & Zhou, H. (2009). Understanding software approaches for GPGPU reliability. Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units (GPGPU-2). https://doi.org/10.1145/1513895.1513907

Gao, H., Ma, Y., Dimitrov, M., & Zhou, H. (2008). Address-branch correlation: A novel locality for long-latency hard-to-predict branches. 2008 IEEE 14th International Symposium on High Performance Computer Architecture (HPCA). https://doi.org/10.1109/hpca.2008.4658629

Kong, J., Aciicmez, O., Seifert, J.-P., & Zhou, H. (2008). Deconstructing new cache designs for thwarting software cache-based side channel attacks. Proceedings of the 2nd ACM Workshop on Computer Security Architectures (CSAW '08). https://doi.org/10.1145/1456508.1456514

Ma, Y., Gao, H., Dimitrov, M., & Zhou, H. (2007). Optimizing dual-core execution for power efficiency and transient-fault recovery. IEEE Transactions on Parallel and Distributed Systems, 18(8), 1080–1093. https://doi.org/10.1109/tpds.2007.4288106

Gao, H., & Zhou, H. (2007). PMPM: Prediction by combining multiple partial matches. Journal of Instruction-Level Parallelism, 9, 1–18.

Dimitrov, M., & Zhou, H. (2007). Unified Architectural Support for Soft-Error Protection or Software Bug Detection. 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007). https://doi.org/10.1109/pact.2007.4336201

Ma, Y., & Zhou, H. (2006). Efficient Transient-Fault Tolerance for Multithreaded Processors Using Dual-Thread Execution. 2006 International Conference on Computer Design. https://doi.org/10.1109/iccd.2006.4380804

Kong, J., Zou, C. C., & Zhou, H. (2006). Improving software security via runtime instruction-level taint checking. Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability (ASID '06). https://doi.org/10.1145/1181309.1181313

Dimitrov, M., & Zhou, H. (2006). Locality-based Information Redundancy for Processor Reliability. 2nd Workshop on Architectural Reliability (WAR-2) held in conjunction with 39th International Symposium on Microarchitecture (MICRO-39), 29–36.

Gao, H., & Zhou, H. (2006). PMPM: Prediction by Combining Multiple Partial Matches. 2nd Championship Branch Prediction (CBP-2) held with the 39th International Symposium on Microarchitecture (MICRO-39), 19–24.

Ma, Y., Gao, H., & Zhou, H. (2006). Using index functions to reduce conflict aliasing in branch prediction tables. IEEE Transactions on Computers, 55(8), 1057–1061.

Zhou, H. (2005). A case for fault tolerance and performance enhancement using chip multi-processors. IEEE Computer Architecture Letters, 4, 1–4.

Gao, H., & Zhou, H. (2005). Adaptive information processing: an effective way to improve perceptron branch predictors. Journal of Instruction-Level Parallelism, 7, 1–10.

Zhou, H., & Conte, T. M. (2005). Code size efficiency in global scheduling for ILP processors. Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures. https://doi.org/10.1109/intera.2002.995845

Zhou, H., Flanagan, J., & Conte, T. M. (2005). Detecting global stride locality in value streams. Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA 2003). https://doi.org/10.1109/isca.2003.1207011

Zhou, H. (2005). Dual-core execution: building a highly scalable single-thread instruction window. 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05). https://doi.org/10.1109/pact.2005.18

Zhou, H., & Conte, T. M. (2005). Enhancing memory-level parallelism via recovery-free value prediction. IEEE Transactions on Computers, 54, 897–912. https://doi.org/10.1109/tc.2005.117

Gao, H., & Zhou, H. (2004). Adaptive Information Processing: An Effective Way to Improve Perceptron Branch Predictors. 1st Championship Branch Prediction (CBP-1) held with the 37th International Symposium on Microarchitecture (MICRO-37).

Zhou, H., Toburen, M. C., Rotenberg, E., & Conte, T. M. (2003). Adaptive mode control: A static-power-efficient cache design. ACM Transactions on Embedded Computing Systems, 2(3), 347–372. https://doi.org/10.1145/860176.860181

Zhou, H. (2003). Code size aware compilation for real-time applications [Technical Report]. Computer Science Department, University of Central Florida.

Zhou, H., & Conte, T. M. (2003). Enhancing Memory Level Parallelism via Recovery-Free Value Prediction. The 2003 International Conference on Supercomputing (ICS'03), 326–335.

Zhou, H., & Conte, T. M. (2003). Performance modeling of memory latency hiding techniques [Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

Zhou, H., Jennings, M. D., & Conte, T. M. (2003). Tree Traversal Scheduling: A Global Instruction Scheduling Technique for VLIW/EPIC Processors. In Languages and Compilers for Parallel Computing (Vol. 2624, pp. 223–238). https://doi.org/10.1007/3-540-35767-x_15

Zhou, H., & Conte, T. M. (2002). Using Performance Bounds to Guide Pre-scheduling Code Optimizations [Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

Jennings, M. D., Zhou, H., & Conte, T. M. (2001). A Treegion-based Unified Approach to Speculation and Predication in Global Instruction Scheduling [Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

Zhou, H., Fu, C., Rotenberg, E., & Conte, T. (2001). A study of value speculative execution and mispeculation recovery in superscalar microprocessors [Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

Zhou, H., Toburen, M. C., Rotenberg, E., & Conte, T. M. (2001). Adaptive mode control: A static-power-efficient cache design. 2001 International Conference on Parallel Architectures and Compilation Techniques: Proceedings: 8-12 September, 2001, Barcelona, Catalunya, Spain, 61–70. https://doi.org/10.1109/pact.2001.953288

Zhou, H., Toburen, M., Rotenberg, E., & Conte, T. (2000). Adaptive Mode Control: A Low-Leakage Power-Efficient Cache Design [Technical Report]. Raleigh, NC: Department of Electrical and Computer Engineering, North Carolina State University.

Kassim, A. A., Zhou, H., & Ranganath, S. (2000). Automatic IC orientation checks. Machine Vision and Applications, 12(3), 107–112. https://doi.org/10.1007/s001380050129

Zhou, H., Kassim, A. A., & Ranganath, S. (1998). A fast algorithm for detecting die extrusion defects in IC packages. Machine Vision and Applications, 11(1), 37–41. https://doi.org/10.1007/s001380050088

Zhou, H. Y., Qu, L. S., & Li, A. H. (1996). Test sequencing and diagnosis in electronic system with decision table. Microelectronics and Reliability, 36(9), 1167–1175. https://doi.org/10.1016/0026-2714(95)00142-5