2024 article
Data Enclave: A Data-Centric Trusted Execution Environment
2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024, pp. 218–232.
2024 article
ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs
PROCEEDINGS OF THE 33RD INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, HPDC 2024.
2024 journal article
Enabling Efficient Deep Learning on MCU With Transient Redundancy Elimination
IEEE TRANSACTIONS ON COMPUTERS, 73(12), 2649–2663.
2024 conference paper
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Niu, W., Sanim, M. M. R., Shu, Z., Guan, J., Shen, X., Yin, M., … Ren, B. (2024, April 27).
2024 conference paper
WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations
Huang, K., Zhai, J., Zheng, L., Wang, H., Jin, Y., Zhang, Q., … Shen, X. (2024, April 22).
2023 journal article
Accelerating matrix-centric graph processing on GPUs through bit-level optimizations
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 177, 53–67.
2023 journal article
Automated Translation of Functional Big Data Queries to SQL
PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 7(OOPSLA).
2023 article
BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs
PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023, pp. 264–276.
2023 journal article
CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression
Proceedings of the ACM on Management of Data.
2023 journal article
Expanding the Edge: Enabling Efficient Winograd CNN Inference With Deep Reuse on Edge Device
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 35(10), 10181–10196.
2023 article
Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines
SOFTWARE ARCHITECTURE. ECSA 2022 TRACKS AND WORKSHOPS, Vol. 13928, pp. 402–417.
2023 article
Reconciling Selective Logging and Hardware Persistent Memory Transaction
2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA, pp. 664–676.
2023 article
SpecPMT: Speculative Logging for Resolving Crash Consistency Overhead of Persistent Memory
PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, VOL 2, ASPLOS 2023, pp. 762–777.
2022 article
Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022), pp. 515–525.
2022 article
Brief Industry Paper: Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card
2022 IEEE 28TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS), pp. 297–300.
2022 article
DREW: Efficient Winograd CNN Inference with Deep Reuse
PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), pp. 1807–1816.
2022 journal article
Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse
ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 27(5).
2022 article
FFCCD: Fence-Free Crash-Consistent Concurrent Defragmentation for Persistent Memory
PROCEEDINGS OF THE 2022 THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22), pp. 274–288.
2022 article
GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs
2022 55TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), pp. 512–529.
2022 article
IDE Augmented with Human-Learning Inspired Natural Language Programming
2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2022), pp. 110–114.
2022 article
Interactive NLU-Powered Ontology-Based Workflow Synthesis for FAIR Support of HPC
2022 IEEE/ACM INTERNATIONAL WORKSHOP ON HPC USER SUPPORT TOOLS (HUST), pp. 29–40.
2022 journal article
Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 19(2).
2022 journal article
Survey: Exploiting Data Redundancy for Optimization of Deep Learning
ACM COMPUTING SURVEYS, 55(10).
2022 article
Temporal Exposure Reduction Protection for Persistent Memory
2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), pp. 908–924.
2021 journal article
A Machine Learning Based Ensemble Forecasting Optimization Algorithm for Preseason Prediction of Atlantic Hurricane Activity
ATMOSPHERE, 12(4).
2021 article
Brief Industry Paper: Towards Real-Time 3D Object Detection for Autonomous Vehicles with Pruning Search
2021 IEEE 27TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS 2021), pp. 425–428.
2021 journal article
CoCoPIE: Enabling Real-Time AI on Off-the-Shelf Mobile Devices via Compression-Compilation Co-Design
COMMUNICATIONS OF THE ACM, 64(6), 62–68.
2021 journal article
Coarsening Optimization for Differentiable Programming
PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 5(OOPSLA).
2021 journal article
Exploring Data Analytics Without Decompression on Embedded GPU Systems
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 33(7), 1553–1568.
2021 article
G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression
2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), pp. 1679–1690.
2021 journal article
General Reuse-Centric CNN Accelerator
IEEE TRANSACTIONS ON COMPUTERS, 71(4), 880–891.
2021 article
HPC Ontology: Towards a Unified Ontology for Managing Training Datasets and AI Models for High-Performance Computing
PROCEEDINGS OF THE WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2021), pp. 69–80.
2021 article
HPCFAIR: Enabling FAIR AI for HPC Applications
PROCEEDINGS OF THE WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2021), pp. 58–68.
2021 article
Hardware-Based Address-Centric Acceleration of Key-Value Store
2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), pp. 736–748.
2021 journal article
POCLib: A High-Performance Framework for Enabling Near Orthogonal Processing on Compression
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 33(2), 459–475.
2021 article
Recurrent Neural Networks Meet Context-Free Grammar: Two Birds with One Stone
2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), pp. 1078–1083.
2021 journal article
Reuse-centric k-means configuration
INFORMATION SYSTEMS, 100.
2021 article
Revisit the Scalability of Deep Auto-Regressive Models for Graph Generation
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN).
2021 article
Seeds of SEED: New Security Challenges for Persistent Memory
2021 INTERNATIONAL SYMPOSIUM ON SECURE AND PRIVATE EXECUTION ENVIRONMENT DESIGN (SEED 2021), pp. 83–88.
2021 journal article
Simpler Hyperparameter Optimization for Software Analytics: Why, How, When
IEEE Transactions on Software Engineering, 48(8), 1–1.
Contributors: A. Agrawal, X. Yang, R. Agrawal, R. Yedida, … & T. Menzies
2021 article
Supporting Legacy Libraries on Non-Volatile Memory: A User-Transparent Approach
2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), pp. 443–455.
2021 article
Toward Efficient Interactions between Python and Native Libraries
PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21), pp. 1117–1128.
2021 journal article
UDF to SQL Translation through Compositional Lazy Inductive Synthesis
PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 5(OOPSLA).
2020 journal article
An Automatic Synthesizer of Advising Tools for High Performance Computing
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 32(2), 330–341.
2020 journal article
DIAC: An Inter-app Conflicts Detector for Open IoT Systems
ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 19(6).
2020 conference paper
Enabling Efficient Random Access to Hierarchically-Compressed Data
2020 IEEE 36th International Conference on Data Engineering (ICDE), 1069–1080.
2020 article
GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU
PACT '20: PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, pp. 43–54.
2020 article
HARP: Holistic Analysis for Refactoring Python-Based Analytics Programs
2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020), pp. 506–517.
2020 article
Hardware-Based Domain Virtualization for Intra-Process Isolation of Persistent Memory Objects
2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), pp. 680–692.
2020 article
MERR: Improving Security of Persistent Memory Objects via Efficient Memory Exposure Reduction and Randomization
TWENTY-FIFTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXV), pp. 987–1000.
2020 journal article
Sequential Model Optimization for Software Effort Estimation
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 48(6), 1994–2009.
2020 journal article
TADOC: Text analytics directly on compression
VLDB JOURNAL, 30(2), 163–188.
2019 article
Adaptive Deep Reuse: Accelerating CNN Training on the Fly
2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), pp. 1538–1549.
2019 conference paper
Deep reuse
Proceedings of the ACM International Conference on Supercomputing - ICS '19. Presented at the ACM International Conference.
Event: the ACM International Conference
2019 journal article
Enabling Runtime SpMV Format Selection through an Overhead Conscious Method
IEEE Transactions on Parallel and Distributed Systems, 31(1), 80–93.
2019 conference paper
HiWayLib
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '19. Presented at the Twenty-Fourth International Conference.
Event: the Twenty-Fourth International Conference
2019 journal article
How to "DODGE" Complex Software Analytics
IEEE Transactions on Software Engineering, 47(10), 1–1.
2019 conference paper
IA-graph based inter-app conflicts detection in open IoT systems
Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems - LCTES 2019. Presented at the 20th ACM SIGPLAN/SIGBED International Conference.
Event: the 20th ACM SIGPLAN/SIGBED International Conference
2019 conference paper
In-Place Zero-Space Memory Protection for CNN
In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Information Processing Systems Proceedings.
Ed(s): H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox & R. Garnett
2019 article
POSTER: GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU
PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), pp. 431–432.
2019 article
Special Issue: Graph Computing
Jin, H., Shen, X., Lovas, R., & Liao, X. (2020, February 10). CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, Vol. 32.
2019 conference paper
Streamline Density Peak Clustering for Practical Adoptions
Proceedings of the 28th ACM International Conference on Information and Knowledge Management - CIKM '19. Presented at the 28th ACM International Conference.
Event: the 28th ACM International Conference
2019 conference paper
Wootz: a compiler-based framework for fast CNN pruning via composability
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2019. Presented at the 40th ACM SIGPLAN Conference.
Event: the 40th ACM SIGPLAN Conference
2018 conference paper
Bridging the Gap between Deep Learning and Sparse Matrix Format Selection
ACM SIGPLAN NOTICES, 53(1), 94–108.
2018 article
Editorial for the Special Issue on In-Memory Computing
Shen, X., Lovas, R., & Liao, X. (2018, October). JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, Vol. 120, pp. 322–322.
2018 journal article
Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights
PROCEEDINGS OF THE VLDB ENDOWMENT, 11(11), 1522–1535.
2018 conference paper
Exploring Flexible Communications for Streamlining DNN Ensemble Training Pipelines
SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. Presented at the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
Event: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis
2018 article
FALCON: A Fast Drop-In Replacement of Citation KNN for Multiple Instance Learning
CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, pp. 67–76.
2018 article
Footprint Modeling of Cache Associativity and Granularity
PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON MEMORY SYSTEMS (MEMSYS 2018), pp. 232–242.
2018 report
Inter-Disciplinary Research Challenges in Computer Systems for the 2020s
National Science Foundation.
2018 journal article
LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine
NEURAL NETWORKS, 108, 399–410.
2018 article
LEEM: Lean Elastic EM for Gaussian Mixture Model via Bounds-Based Filtering
2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), pp. 677–686.
2018 article
Overhead-Conscious Format Selection for SpMV-Based Applications
2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 950–959.
2018 journal article
Resolving the GPU responsiveness dilemma through program transformations
Frontiers of Computer Science, 12(3), 545–559.
2018 article
Rethinking Compilers in the Rise of Machine Learning and AI
CC'18: PROCEEDINGS OF THE 27TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION, pp. 1–1.
2018 article
Reuse-Centric K-Means Configuration
2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), pp. 1224–1227.
2018 article
Taming the "Monster": Overcoming Program Optimization Challenges on SW26010 Through Precise Performance Modeling
2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 763–773.
2018 conference paper
Zwift
Proceedings of the 2018 International Conference on Supercomputing - ICS '18. Presented at the 2018 International Conference.
Event: the 2018 International Conference
2017 conference paper
An infrastructure for HPC knowledge sharing and reuse
ACM SIGPLAN Notices, 52(8), 461–462.
2017 conference paper
Bridging the gap between memory performance and massive parallelism: The critical role of programming systems innovations (keynote)
ACM SIGPLAN Notices, 52(9), 1–1.
2017 article
Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems
2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 967–977.
2017 chapter
Data placement on GPUs
In Advances in GPU Research and Practice (pp. 105–123).
2017 conference paper
EffiSha: A software framework for enabling efficient preemptive scheduling of GPU
ACM SIGPLAN Notices, 52(8), 3–16.
2017 conference paper
Efficient support of position independence on non-volatile memory
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-50 '17. Presented at the 50th Annual IEEE/ACM International Symposium.
Event: the 50th Annual IEEE/ACM International Symposium
2017 conference paper
Egeria
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis - SC '17. Presented at the International Conference for High Performance Computing, Networking, Storage and Analysis.
Event: the International Conference for High Performance Computing, Networking, Storage and Analysis
2017 journal article
GLORE: generalized loop redundancy elimination upon LER-notation
Proceedings of the ACM on Programming Languages, 1(OOPSLA), 1–28.
2017 conference paper
Generalizations of the theory and deployment of triangular inequality for compiler-based strength reduction
ACM SIGPLAN Notices, 52(6), 33–48.
2017 article
LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine
2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), pp. 1015–1020.
2017 chapter
Software-level task scheduling on GPUs
In Advances in GPU Research and Practice (pp. 83–103).
2017 article
Sweet KNN: An Efficient KNN on GPU through Reconciliation between Redundancy Removal and Regularity
2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), pp. 621–632.
2017 conference paper
Versapipe
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-50 '17. Presented at the 50th Annual IEEE/ACM International Symposium.
Event: the 50th Annual IEEE/ACM International Symposium
2016 report
A Software Framework for Efficient Preemptive Scheduling on GPU
(Technical Report No. TR-2016-1). North Carolina State University.
2016 conference paper
Coherence-Free Multiview
Proceedings of the 2016 International Conference on Supercomputing - ICS '16. Presented at the 2016 International Conference.
Event: the 2016 International Conference
2016 journal article
Examining and Reducing the Influence of Sampling Errors on Feedback-Driven Optimizations
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 13(1).
2016 report
LCD: A Fast Contrastive Divergence Based Training Algorithm for Restricted Boltzmann Machine
(No. TR-2016-3). Raleigh, NC: North Carolina State University.
2016 book
Languages and Compilers for Parallel Computing
In Lecture Notes in Computer Science.
Ed(s): …, F. Mueller & J. Tuck
2016 article
OpenCL-based erasure coding on heterogeneous architectures
Chen, G., Zhou, H., Shen, X., Gahm, J., Venkat, N., Booth, S., & Marshall, J. (2016, July). 2016 IEEE 27th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Vol. 7, pp. 33–40.
2016 journal article
Optimizing Data Placement on GPU Memory: A Portable Approach
IEEE Transactions on Computers, 66(3), 473–487.
2016 conference paper
Towards Ontology-Based Program Analysis
In S. Krishnamurthi & B. S. Lerner (Eds.), 30th European Conference on Object-Oriented Programming (ECOOP 2016) (pp. 26:1–26:25). Dagstuhl, Germany: Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik.
Ed(s): S. Krishnamurthi & B. Lerner
2016 report
Towards Ontology-Based Program Analysis
(Technical Report No. TR-2016-5). North Carolina State University.
2016 journal article
Tuning for software analytics: Is it really necessary?
Information and Software Technology, 76, 135–146.
2016 journal article
Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions
Frontiers of Computer Science, 11(1), 130–146.
2015 article
Autotuning Algorithmic Choice for Input Sensitivity
Ding, Y., Ansel, J., Veeramachaneni, K., Shen, X., O'Reilly, U.-M., & Amarasinghe, S. (2015, June). ACM SIGPLAN NOTICES, Vol. 50, pp. 379–390.
2015 journal article
Enabling Portable Optimizations of Data Placement on GPU
IEEE Micro, 35(4), 16–24.
2015 conference paper
Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations
Proceedings of the 29th ACM on International Conference on Supercomputing - ICS '15. Presented at the 29th ACM.
Event: the 29th ACM
2015 conference paper
Free launch
Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48. Presented at the 48th International Symposium.
Event: the 48th International Symposium
2015 article
On-the-Fly Principled Speculation for FSM Parallelization
Zhao, Z., & Shen, X. (2015, April). ACM SIGPLAN NOTICES, Vol. 50, pp. 619–630.
2015 conference paper
TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems
In C. Li & V. Markl (Eds.), 41st International Conference on Very Large Data Bases (VLDB 2015) : proceedings of the VLDB Endowment, volume 8, number 1-13, Kohala Coast, Hawaii, USA, 31 August-4 September 2015. Stanford, CA: VLDB Endowment.
Ed(s): C. Li & V. Markl
Event: 41st International Conference on Very Large Data Bases at Kohala Coast, Hawaii
2015 report
TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems
(Technical Report No. TR-2015-3). North Carolina State University.
2015 chapter
Understanding Co-run Degradations on Integrated Heterogeneous Processors
In Languages and Compilers for Parallel Computing (pp. 82–97).
2015 conference paper
Understanding co-run degradations on integrated heterogeneous processors
Languages and Compilers for Parallel Computing (LCPC 2014), 8967, 82–97.
2015 conference paper
Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
Proceedings of the 32nd International Conference on Machine Learning, 37, 579–587. Lille, France.
Event: The 32nd International Conference on Machine Learning at Lille, France on July 6-11, 2015
2015 report
Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
(Technical Report No. TR-2015-2). North Carolina State University.
2014 article
Call Sequence Prediction through Probabilistic Calling Automata
Zhao, Z., Wu, B., Zhou, M., Ding, Y., Sun, J., Shen, X., & Wu, Y. (2014, October). ACM SIGPLAN NOTICES, Vol. 49, pp. 745–762.
2014 conference paper
Challenging the "embarrassingly sequential"
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14. Presented at the 19th international conference.
Event: the 19th international conference
2014 conference paper
Finding the limit
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14. Presented at the 19th international conference.
Event: the 19th international conference
2014 conference paper
Localization of concurrency bugs using shared memory access pairs
Proceedings of the 29th ACM/IEEE international conference on Automated software engineering - ASE '14. Presented at the 29th ACM/IEEE international conference.
Event: the 29th ACM/IEEE international conference
2014 article
PORPLE: An Extensible Optimizer for Portable Data Placement on GPU
2014 47TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), pp. 88–100.
2014 conference paper
SatScore
Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing - UbiComp '14 Adjunct. Presented at the 2014 ACM International Joint Conference.
Event: the 2014 ACM International Joint Conference
2014 journal article
Space-efficient multi-versioning for input-adaptive feedback-driven program optimizations
ACM SIGPLAN Notices, 49(10), 763–776.
2013 journal article
Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU
ACM SIGPLAN Notices, 48(8), 57.
2013 conference paper
Exploring Hybrid Memory for GPU Energy Efficiency through Software-Hardware Co-Design
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. Presented at the PACT, Edinburgh, Scotland.
Event: PACT at Edinburgh, Scotland on September 7-11, 2013
2013 chapter
Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation
In Languages and Compilers for Parallel Computing (pp. 171–184).
2013 journal article
HPar
ACM Transactions on Architecture and Code Optimization, 10(4), 1–25.
2013 chapter
Optimal Co-Scheduling to Minimize Makespan on Chip Multiprocessors
In Job Scheduling Strategies for Parallel Processing (pp. 114–133).
2013 article
Profmig: A framework for flexible migration of program profiles across software versions
Zhou, M., Wu, B., Ding, Y., & Shen, X. (2013, February). Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
2013 chapter
Simple Profile Rectifications Go a Long Way
In ECOOP 2013 – Object-Oriented Programming (pp. 654–678).
2012 journal article
An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations
International Journal of Parallel Programming, 41(6), 855–869.
2012 conference paper
Exploiting inter-sequence correlations for program behavior prediction
Proceedings of the ACM international conference on Object oriented programming systems languages and applications - OOPSLA '12. Presented at the ACM international conference.
Event: the ACM international conference
2012 conference paper
One stone two birds
Proceedings of the 26th ACM international conference on Supercomputing - ICS '12. Presented at the 26th ACM international conference.
Event: the 26th ACM international conference
2011 conference paper
A step towards transparent integration of input-consciousness into dynamic program optimizations
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications - OOPSLA '11. Presented at the 2011 ACM international conference.
Event: the 2011 ACM international conference
2011 chapter
Array Regrouping on CMP with Non-uniform Cache Sharing
In Languages and Compilers for Parallel Computing (pp. 92–105).
2011 conference paper
Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU
2011 International Conference on Parallel Architectures and Compilation Techniques. Presented at the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT).
Event: 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)
2011 conference paper
Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control
2011 International Conference on Parallel Architectures and Compilation Techniques. Presented at the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT).
Event: 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)
2011 conference paper
On-the-fly elimination of dynamic irregularities for GPU computing
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems - ASPLOS '11. Presented at the sixteenth international conference.
Event: the sixteenth international conference
2011 journal article
The Significance of CMP Cache Sharing on Contemporary Multithreaded Applications
IEEE Transactions on Parallel and Distributed Systems, 23(2), 367–374.
2010 conference paper
An input-centric paradigm for program dynamic optimizations
Proceedings of the ACM international conference on Object oriented programming systems languages and applications - OOPSLA '10. Presented at the ACM international conference.
Event: the ACM international conference
2010 chapter
Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors
In High Performance Embedded Architectures and Compilers (pp. 201–215).
2010 conference paper
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10. Presented at the 15th ACM SIGPLAN symposium.
Event: the 15th ACM SIGPLAN symposium
2010 report
Experiences in Porting the Hubbard Model in Computational Materials Science to GPU
(Technical Report No. WM-CS-2010-04). Computer Science Department, The College of William and Mary.
2010 conference paper
Exploiting statistical correlations for proactive prediction of program behaviors
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization - CGO '10. Presented at the 8th annual IEEE/ACM international symposium.
Event: the 8th annual IEEE/ACM international symposium
2010 report
Implementing the Dslash Operator in OpenCL
(Technical Report No. WM-CS-2010-03). Computer Science Department, The College of William and Mary.
2010 chapter
Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?
In Lecture Notes in Computer Science (pp. 264–282).
2010 chapter
LU Decomposition on Cell Broadband Engine: An Empirical Study to Exploit Heterogeneous Chip Multiprocessors
In Lecture Notes in Computer Science (pp. 61–75).
2010 conference paper
Streamlining GPU applications on the fly
Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10. Presented at the 24th ACM International Conference.
Event: the 24th ACM International Conference
2010 journal article
The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions
IEEE Transactions on Parallel and Distributed Systems, 22(7), 1192–1205.
2009 report
A Systematic Measurement of the Influence of Non-Uniform Cache Sharing on the Performance of Modern Multithreaded Programs
(Technical Report No. WM-CS-2009-04). Computer Science Department, The College of William and Mary.
2009 conference paper
A cross-input adaptive framework for GPU program optimizations
2009 IEEE International Symposium on Parallel & Distributed Processing. Presented at the Distributed Processing (IPDPS).
Event: Distributed Processing (IPDPS)
2009 conference paper
A study on optimally co-scheduling jobs of different lengths on chip multiprocessors
Proceedings of the 6th ACM conference on Computing frontiers - CF '09. Presented at the 6th ACM conference.
Event: the 6th ACM conference
2009 report
Co-Run Locality Prediction for Proactive Shared-Cache Management
(Technical Report No. WM-CS-2009-03). Computer Science Department, The College of William and Mary.
2009 conference paper
Cross-Input Learning and Discriminative Prediction in Evolvable Virtual Machines
2009 International Symposium on Code Generation and Optimization. Presented at the 2009 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
Event: 2009 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
2009 conference paper
Influence of program inputs on the selection of garbage collectors
Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual execution environments - VEE '09. Presented at the 2009 ACM SIGPLAN/SIGOPS international conference.
Event: the 2009 ACM SIGPLAN/SIGOPS international conference
2009 report
Program Seminal Behaviors: Automating Input Characterization for Large-Scope Proactive Behavior Prediction
(Technical Report No. WM-CS-2009-07). Computer Science Department, The College of William and Mary.
2009 journal article
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems, 31(6), 1–39.
2009 report
Speculation with Little Wasting: Saving Cost in Software Speculation Through Transparent Learning
(No. WM-CS-2009-08). Williamsburg, VA: Computer Science Department, The College of William and Mary.
2009 conference paper
Speculation with Little Wasting: Saving Cost in Software Speculation through Transparent Learning
2009 15th International Conference on Parallel and Distributed Systems. Presented at the 2009 15th International Conference on Parallel and Distributed Systems.
Event: 2009 15th International Conference on Parallel and Distributed Systems
2009 report
Streamlining GPU Applications On the Fly – Thread Divergence Elimination through Runtime Thread-Data Remapping
(No. WM-CS-2009-08). Williamsburg, VA: Computer Science Department, The College of William and Mary.
2009 journal article
The study and handling of program inputs in the selection of garbage collectors
ACM SIGOPS Operating Systems Review, 43(3), 48.
2008 report
A Cross-Input Adaptive Framework for GPU Program Optimization
(No. WM-CS-2008-09). Williamsburg, VA: Computer Science Department, The College of William and Mary.
2008 conference paper
Adaptive Software Speculation for Enhancing the Cost-Efficiency of Behavior-Oriented Parallelization
2008 37th International Conference on Parallel Processing. Presented at the 2008 37th International Conference on Parallel Processing (ICPP).
Event: 2008 37th International Conference on Parallel Processing (ICPP)
2008 article
Adaptive speculation in behavior-oriented parallelization
Jiang, Y., & Shen, X. (2008, April). 2008 IEEE International Symposium on Parallel and Distributed Processing.
2008 conference paper
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08. Presented at the 17th international conference.
Event: the 17th international conference
2008 report
Cross-Input Learning and Discriminative Prediction in Evolvable Virtual Machines
(No. WM-CS-2008-06). Williamsburg, VA: Computer Science Department, The College of William and Mary.
2008 chapter
Exploration of the Influence of Program Inputs on CMP Co-scheduling
In Lecture Notes in Computer Science (pp. 263–273).
2008 report
LU Decomposition on Cell Broadband Engine
(Technical Report No. WM-CS-2008-08). Computer Science Department, The College of William and Mary.
2008 chapter
Scalable Implementation of Efficient Locality Approximation
In Languages and Compilers for Parallel Computing (pp. 202–216).
2007 report
A Hybrid Framework Bridging Locality Analysis and Cache-Aware Scheduling for CMPs
(Technical Report No. WM-CS-2007-01). Computer Science Dept., The College of William and Mary.
2007 report
CAPS: Contention-Aware Proactive Scheduling for CMPs
(Technical Report No. WM-CS-2007-09). Computer Science Department, The College of William and Mary.
2007 conference paper
Locality approximation using time
Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '07. Presented at the 34th annual ACM SIGPLAN-SIGACT symposium.
Event: the 34th annual ACM SIGPLAN-SIGACT symposium
2007 journal article
Miss Rate Prediction Across Program Inputs and Cache Configurations
IEEE Transactions on Computers, 56(3), 328–343.
2007 report
Modeling Relations Between Inputs and Dynamic Behavior for General Programs
(No. WM-CS-2007-07). Williamsburg, VA: Computer Science Department, The College of William and Mary.
2007 journal article
Predicting locality phases for dynamic memory optimization
Journal of Parallel and Distributed Computing, 67(7), 783–796.
2007 conference paper
Software behavior oriented parallelization
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation - PLDI '07. Presented at the 2007 ACM SIGPLAN conference.
Event: the 2007 ACM SIGPLAN conference
2007 report
Study of the Effects of Program Inputs on Co-Scheduling
(Technical Report No. WM-CS-2007-13). Computer Science Department, The College of William and Mary.
2006 report
A Key-Based Adaptive Transactional Memory Executor
(No. TR909). Rochester, NY: Computer Science Dept., University of Rochester.
2006 report
Accurate Approximation of Locality from Time Distance Histograms
(Technical Report No. TR902). Computer Science Dept., University of Rochester.
2006 report
Behavior-Oriented Parallelization
(Technical Report No. TR904). Computer Science Dept., University of Rochester.
2006 report
Locality Approximation Using Time
(Technical Report No. TR901). Computer Science Dept., University of Rochester.
2006 conference paper
Program-level adaptive memory management
Proceedings of the 2006 international symposium on Memory management - ISMM '06. Presented at the 2006 international symposium.
Event: the 2006 international symposium
2006 report
Waste Not, Want Not: Adaptive Garbage Collection in a Shared Environment
(Technical Report No. TR908). Computer Science Dept., University of Rochester.
2005 conference paper
Gated memory control for memory monitoring, leak detection and garbage collection
Proceedings of the 2005 workshop on Memory system performance - MSP '05. Presented at the 2005 workshop.
Event: the 2005 workshop
2005 conference paper
Lightweight reference affinity analysis
Proceedings of the 19th annual international conference on Supercomputing - ICS '05. Presented at the 19th annual international conference.
Event: the 19th annual international conference
2005 report
Parallelization of Utility Programs Based on Behavior Phase Analysis
(No. TR876). Rochester, NY: Computer Science Dept., University of Rochester.
2005 chapter
Phase-Based Miss Rate Prediction Across Program Inputs
In Lecture Notes in Computer Science (pp. 42–55).
2004 chapter
A Hierarchical Model of Reference Affinity
In Languages and Compilers for Parallel Computing (pp. 48–63).
2004 conference paper
Adaptive data partition for sorting using probability distribution
International Conference on Parallel Processing, 2004. ICPP 2004. Presented at the International Conference on Parallel Processing, 2004. ICPP 2004.
Event: International Conference on Parallel Processing, 2004. ICPP 2004.
2004 conference paper
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation - PLDI '04, 255.
Event: the ACM SIGPLAN 2004 conference
2004 report
Characterizing Phases in Service-Oriented Applications
(Technical Report No. TR848). Computer Science Dept., University of Rochester.
2004 journal article
Learning multi-label scene classification
Pattern Recognition, 37(9), 1757–1771.
2004 conference paper
Locality phase prediction
Proceedings of the 11th international conference on Architectural support for programming languages and operating systems - ASPLOS-XI. Presented at the 11th international conference.
Event: the 11th international conference
2003 report
Adaptive Data Partitioning using Probability Distribution
(Technical Report No. TR823). Computer Science Dept., University of Rochester.
2003 conference paper
Multi-label Machine Learning and Its Application to Semantic Scene Classification
Proceedings of Storage and Retrieval Methods and Applications for Multimedia 2004, 5307, 188–199.
Event: IS&T/SPIE’s Sixteenth Annual Symposium on Electronic Imaging at San Jose, CA
2003 report
Multi-label Semantic Scene Classification
(Technical Report No. TR813). Dept. of Computer Science, University of Rochester.
2003 report
Predicting Hierarchical Phases in Program Data Behavior
(Technical Report No. TR824). Computer Science Dept., University of Rochester.
2003 conference paper
Regression-Based Multi-Model Prediction of Data Reuse Signature
Proceedings of the Fourth Annual Symposium of the Los Alamos Computer Science Institute, 243–251. Santa Fe, New Mexico, USA: Los Alamos Computer Science Institute.
Event: Symposium of the Los Alamos Computer Science Institute at Santa Fe, NM
2002 report
The Medication Advisor Project: Preliminary Report
(Technical Report No. 776). Dept. of Computer Science, University of Rochester.
2001 conference paper
Study and Auto-Detection of Stress Based on Tonal Pitch Range in Mandarin
Proceedings of Seventh European Conference on Speech Communication and Technology, 123–126. Aalborg, Denmark.
Event: Conference on Speech Communication and Technology at Aalborg, Denmark
2001 conference paper
The Study Of The Effect Of Training Set On Statistical Language Modeling
Proceedings of Seventh European Conference on Speech Communication and Technology, 721–724. Aalborg, Denmark.
Event: Conference on Speech Communication and Technology at Aalborg, Denmark
2000 conference paper
A CART-Based Hierarchical Stochastic Model for Prosodic Phrasing in Chinese
Proceedings of International Symposium on Chinese Spoken Language Processing 2000, 105–108. Beijing, China.
Event: International Symposium on Chinese Spoken Language Processing at Beijing, China