Xipeng Shen Huang, K., Zhai, J., Zheng, L., Wang, H., Jin, Y., Zhang, Q., … Shen, X. (2024, April 22). WiseGraph. https://doi.org/10.1145/3627703.3650063 Chen, J.-A., Sung, H.-H., Shen, X., Tallent, N., Barker, K., & Li, A. (2023). Accelerating matrix-centric graph processing on GPUs through bit-level optimizations. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 177, 53–67. https://doi.org/10.1016/j.jpdc.2023.02.013 Zhang, G., Mariano, B., Shen, X., & Dillig, I. (2023). Automated Translation of Functional Big Data similar to eries to SQL. PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 7(OOPSLA). https://doi.org/10.1145/3586047 Chen, J.-A., Sung, H.-H., Shen, X., Choudhury, S., & Li, A. (2023). BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs. PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023, pp. 264–276. https://doi.org/10.1145/3577193.3593725 Chen, Z., Zhang, F., Guan, J. W., Zhai, J., Shen, X., Zhang, H., … Du, X. (2023). CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression. Proceedings of the ACM on Management of Data. https://doi.org/10.1145/3588684 Zhang, F., Wu, R., Guan, J., Zheng, Z., Guo, X., Zhang, X., … Shen, X. (2023). Expanding the Edge: Enabling Efficient Winograd CNN Inference With Deep Reuse on Edge Device. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 35(10), 10181–10196. https://doi.org/10.1109/TKDE.2023.3269017 Ye, C., Xu, Y., Shen, X., Sha, Y., Liao, X., Jin, H., & Solihin, Y. (2023). Reconciling Selective Logging and Hardware Persistent Memory Transaction. 2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA, pp. 664–676. https://doi.org/10.1109/HPCA56546.2023.10071088 Ye, C., Xu, Y., Shen, X., Sha, Y., Liao, X., Jin, H., & Solihin, Y. (2023). SpecPMT: Speculative Logging for Resolving Crash Consistency Overhead of Persistent Memory. 
PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, VOL 2, ASPLOS 2023, pp. 762–777. https://doi.org/10.1145/3575693.3575696 Chen, J.-A., Niu, W., Ren, B., Wang, Y., & Shen, X. (2023). Survey: Exploiting Data Redundancy for Optimization of Deep Learning. ACM COMPUTING SURVEYS, 55(10). https://doi.org/10.1145/3564663 Chen, J.-A., Sung, H.-H., Shen, X., Tallent, N., Barker, K., & Li, A. (2022). Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU. 2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022), pp. 515–525. https://doi.org/10.1109/IPDPS53621.2022.00056 Sung, H.-H., Xu, Y., Guan, J., Niu, W., Ren, B., Wang, Y., … Shen, X. (2022). Brief Industry Paper: Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card. 2022 IEEE 28TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS), pp. 297–300. https://doi.org/10.1109/RTAS54340.2022.00032 Wu, R., Zhang, F., Guan, J., Zheng, Z., Du, X., & Shen, X. (2022). DREW: Efficient Winograd CNN Inference with Deep Reuse. PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), pp. 1807–1816. https://doi.org/10.1145/3485447.3511985 Cicek, N. M., Shen, X., & Ozturk, O. (2022). Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse. ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 27(5). https://doi.org/10.1145/3503469 Pan, Z., Zhang, F., Zhou, Y., Zhai, J., Shen, X., Mutlu, O., & Du, X. (2022). Exploring Data Analytics Without Decompression on Embedded GPU Systems. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 33(7), 1553–1568. https://doi.org/10.1109/TPDS.2021.3119402 Xu, Y., Ye, C., Solihin, Y., & Shen, X. (2022). FFCCD: Fence-Free Crash-Consistent Concurrent Defragmentation for Persistent Memory. PROCEEDINGS OF THE 2022 THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22), pp. 274–288. 
https://doi.org/10.1145/3470496.3527406 Niu, W., Guan, J., Shen, X., Wang, Y., Agrawal, G., & Ren, B. (2022). GCD(2) : A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs. 2022 55TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), pp. 512–529. https://doi.org/10.1109/MICRO56248.2022.00044 Cicek, N. M., Ning, L., Ozturk, O., & Shen, X. (2022). General Reuse-Centric CNN Accelerator. IEEE TRANSACTIONS ON COMPUTERS, 71(4), 880–891. https://doi.org/10.1109/TC.2021.3064608 Young, M., Nan, Z., & Shen, X. (2022). IDE Augmented with Human-Learning Inspired Natural Language Programming. 2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2022), pp. 110–114. https://doi.org/10.1145/3510454.3516832 Nan, Z., Dave, M., Shen, X., Liao, C., Vanderbruggen, T., Lin, P.-H., & Emani, M. (2022). Interactive NLU-Powered Ontology-Based Workflow Synthesis for FAIR Support of HPC. 2022 IEEE/ACM INTERNATIONAL WORKSHOP ON HPC USER SUPPORT TOOLS (HUST), pp. 29–40. https://doi.org/10.1109/HUST56722.2022.00009 Zhang, F., Zhai, J., Shen, X., Mutlu, O., & Du, X. (2022). POCLib: A High-Performance Framework for Enabling Near Orthogonal Processing on Compression. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 33(2), 459–475. https://doi.org/10.1109/TPDS.2021.3093234 Ye, C., Xu, Y., Shen, X., Jin, H., Liao, X., & Solihin, Y. (2022). Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 19(2). https://doi.org/10.1145/3511706 Xia, T., Shu, R., Shen, X., & Menzies, T. (2022). Sequential Model Optimization for Software Effort Estimation. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 48(6), 1994–2009. https://doi.org/10.1109/TSE.2020.3047072 Xu, Y., Ye, C., Shen, X., & Solihin, Y. (2022). Temporal Exposure Reduction Protection for Persistent Memory. 
2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), pp. 908–924. https://doi.org/10.1109/HPCA53966.2022.00071 Sun, X., Xie, L., Shah, S. U., & Shen, X. (2021). A Machine Learning Based Ensemble Forecasting Optimization Algorithm for Preseason Prediction of Atlantic Hurricane Activity. ATMOSPHERE, 12(4). https://doi.org/10.3390/atmos12040522 Guan, H., Shen, X., & Krim, H. (2021). An Automatic Synthesizer of Advising Tools for High Performance Computing. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 32(2), 330–341. https://doi.org/10.1109/TPDS.2020.3018636 Zhao, P., Niu, W., Yuan, G., Cai, Y., Sung, H.-H., Liu, S., … Lin, X. (2021). Brief Industry Paper: Towards Real-Time 3D Object Detection for Autonomous Vehicles with Pruning Search. 2021 IEEE 27TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS 2021), pp. 425–428. https://doi.org/10.1109/RTAS52030.2021.00043 Guan, H., Liu, S., Ma, X., Niu, W., Ren, B., Shen, X., … Zhao, P. (2021). CoCoPIE: Enabling Real-Time AI on Off-the-Shelf Mobile Devices via Compression-Compilation Co-Design. COMMUNICATIONS OF THE ACM, 64(6), 62–68. https://doi.org/10.1145/3418297 Shen, X., Zhang, G., Dea, I., Andow, S., Arroyo-Fang, E., Gafter, N., … Yang, S. (2021). Coarsening Optimization for Differentiable Programming. PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 5(OOPSLA). https://doi.org/10.1145/3485507 Zhang, F., Pan, Z., Zhou, Y., Zhai, J., Shen, X., Mutlu, O., & Du, X. (2021). G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression. 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), pp. 1679–1690. https://doi.org/10.1109/ICDE51399.2021.00148 Liao, C., Lin, P.-H., Verma, G., Vanderbruggen, T., Emani, M., Nan, Z., & Shen, X. (2021). HPC Ontology: Towards a Unified Ontology for Managing Training Datasets and AI Models for High-Performance Computing. 
PROCEEDINGS OF THE WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2021), pp. 69–80. https://doi.org/10.1109/MLHPC54614.2021.00012 Verma, G., Emani, M., Liao, C., Lin, P.-H., Vanderbruggen, T., Shen, X., & Chapman, B. (2021). HPCFAIR: Enabling FAIR AI for HPC Applications. PROCEEDINGS OF THE WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2021), pp. 58–68. https://doi.org/10.1109/MLHPC54614.2021.00011 Ye, C., Xu, Y., Shen, X., Liao, X., Jin, H., & Solihin, Y. (2021). Hardware-Based Address-Centric Acceleration of Key-Value Store. 2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), pp. 736–748. https://doi.org/10.1109/HPCA51647.2021.00067 Guan, H., Chaudhary, U., Xu, Y., Ning, L., Zhang, L., & Shen, X. (2021). Recurrent Neural Networks Meet Context-Free Grammar: Two Birds with One Stone. 2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), pp. 1078–1083. https://doi.org/10.1109/ICDM51629.2021.00125 Zhang, L., Guan, H., Ding, Y., Shen, X., & Krim, H. (2021). Reuse-centric k-means configuration. INFORMATION SYSTEMS, 100. https://doi.org/10.1016/j.is.2021.101787 Yang, S., Shen, X., & Lim, S.-H. (2021). Revisit the Scalability of Deep Auto-Regressive Models for Graph Generation. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). https://doi.org/10.1109/IJCNN52387.2021.9534206 Ul Mustafa, N., Xu, Y., Shen, X., & Solihin, Y. (2021). Seeds of SEED: New Security Challenges for Persistent Memory. 2021 INTERNATIONAL SYMPOSIUM ON SECURE AND PRIVATE EXECUTION ENVIRONMENT DESIGN (SEED 2021), pp. 83–88. https://doi.org/10.1109/SEED51797.2021.00020 Agrawal, A., Yang, X., Agrawal, R., Yedida, R., Shen, X., & Menzies, T. (2021). Simpler Hyperparameter Optimization for Software Analytics: Why, How, When. IEEE Transactions on Software Engineering, 48(8), 1–1. 
https://doi.org/10.1109/TSE.2021.3073242 Ye, C., Xu, Y., Shen, X., Liao, X., Jin, H., & Solihin, Y. (2021). Supporting Legacy Libraries on Non-Volatile Memory: A User-Transparent Approach. 2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), pp. 443–455. https://doi.org/10.1109/ISCA52012.2021.00042 Zhang, F., Zhai, J., Shen, X., Wang, D., Chen, Z., Mutlu, O., … Du, X. (2021). TADOC: Text analytics directly on compression. VLDB JOURNAL, 30(2), 163–188. https://doi.org/10.1007/s00778-020-00636-3 Tan, J., Chen, Y., Liu, Z., Ren, B., Song, S. L., Shen, X., & Liu, X. (2021). Toward Efficient Interactions between Python and Native Libraries. PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21), pp. 1117–1128. https://doi.org/10.1145/3468264.3468541 Zhang, G., Xu, Y., Shen, X., & Dillig, I. (2021). UDF to SQL Translation through Compositional Lazy Inductive Synthesis. PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 5(OOPSLA). https://doi.org/10.1145/3485489 Li, X., Zhang, L., & Shen, X. (2020). DIAC An Inter-app Conflicts Detector for Open IoT Systems. ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 19(6). https://doi.org/10.1145/3391895 Zhang, F., Zhai, J., Shen, X., Mutlu, O., & Du, X. (2020). Enabling Efficient Random Access to Hierarchically-Compressed Data. 2020 IEEE 36th International Conference on Data Engineering (ICDE), 1069–1080. https://doi.org/10.1109/ICDE48307.2020.00097 Zhou, W., Zhao, Y., Shen, X., & Chen, W. (2020). Enabling Runtime SpMV Format Selection through an Overhead Conscious Method. IEEE Transactions on Parallel and Distributed Systems, 31(1), 80–93. https://doi.org/10.1109/TPDS.2019.2932931 Oh, C., Zheng, Z., Shen, X., Zhai, J., & Yi, Y. (2020). GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU. 
PACT '20: PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, pp. 43–54. https://doi.org/10.1145/3410463.3414656 Zhou, W., Zhao, Y., Zhang, G., & Shen, X. (2020). HARP: Holistic Analysis for Refactoring Python-Based Analytics Programs. 2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020), pp. 506–517. https://doi.org/10.1145/3377811.3380434 Xu, Y., Ye, C. C., Solihin, Y., & Shen, X. (2020). Hardware-Based Domain Virtualization for Intra-Process Isolation of Persistent Memory Objects. 2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), pp. 680–692. https://doi.org/10.1109/ISCA45697.2020.00062 Xu, Y., Solihin, Y., & Shen, X. (2020). MERR: Improving Security of Persistent Memory Objects via Efficient Memory Exposure Reduction and Randomization. TWENTY-FIFTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXV), pp. 987–1000. https://doi.org/10.1145/3373376.3378492 Jin, H., Shen, X., Lovas, R., & Liao, X. (2020, February 10). Special Issue: Graph Computing. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, Vol. 32. https://doi.org/10.1002/cpe.5452 Ning, L., Guan, H., & Shen, X. (2019). Adaptive Deep Reuse: Accelerating CNN Training on the Fly. 2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), pp. 1538–1549. https://doi.org/10.1109/ICDE.2019.00138 Ning, L., & Shen, X. (2019). Deep reuse. Proceedings of the ACM International Conference on Supercomputing - ICS '19. Presented at the the ACM International Conference. https://doi.org/10.1145/3330345.3330384 Zheng, Z., Oh, C., Zhai, J., Shen, X., Yi, Y., & Chen, W. (2019). HiWayLib. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '19. Presented at the the Twenty-Fourth International Conference. 
https://doi.org/10.1145/3297858.3304032 Agrawal, A., Fu, W., Chen, D., Shen, X., & Menzies, T. (2019). How to "DODGE" Complex Software Analytics. IEEE Transactions on Software Engineering, 47(10), 1–1. https://doi.org/10.1109/TSE.2019.2945020 Li, X., Zhang, L., & Shen, X. (2019). IA-graph based inter-app conflicts detection in open IoT systems. Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems - LCTES 2019. Presented at the the 20th ACM SIGPLAN/SIGBED International Conference. https://doi.org/10.1145/3316482.3326350 Guan, H., Ning, L., Lin, Z., Shen, X., Zhou, H., & Lim, S.-H. (2019). In-Place Zero-Space Memory Protection for CNN. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Information Processing Systems Proceedings. Oh, C., Zheng, Z., Shen, X., Zhai, J., & Yi, Y. (2019). POSTER: GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU. PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), pp. 431–432. https://doi.org/10.1145/3293883.3301494 Yang, S., Shen, X., & Chi, M. (2019). Streamline Density Peak Clustering for Practical Adoptions. Proceedings of the 28th ACM International Conference on Information and Knowledge Management - CIKM '19. Presented at the the 28th ACM International Conference. https://doi.org/10.1145/3357384.3358053 Guan, H., Shen, X., & Lim, S.-H. (2019). Wootz: a compiler-based framework for fast CNN pruning via composability. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2019. Presented at the the 40th ACM SIGPLAN Conference. https://doi.org/10.1145/3314221.3314652 Zhao, Y., Li, J., Liao, C., & Shen, X. (2018). Bridging the Gap between Deep Learning and Sparse Matrix Format Selection. ACM SIGPLAN NOTICES, 53(1), 94–108. 
https://doi.org/10.1145/3178487.3178495 Shen, X., Lovas, R., & Liao, X. (2018, October). Editorial for the Special Issue on In-Memory Computing. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, Vol. 120, pp. 322–322. https://doi.org/10.1016/j.jpdc.2018.05.009 Zhang, F., Zhai, J., Shen, X., Mutlu, O., & Chen, W. (2018). Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights. PROCEEDINGS OF THE VLDB ENDOWMENT, 11(11), 1522–1535. https://doi.org/10.14778/3236187.3236203 Pittman, R., Guan, H., Shen, X., Lim, S.-H., & Patton, R. M. (2018). Exploring Flexible Communications for Streamlining DNN Ensemble Training Pipelines. SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. Presented at the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. https://doi.org/10.1109/sc.2018.00067 Yang, S., & Shen, X. (2018). FALCON: A Fast Drop-In Replacement of Citation KNN for Multiple Instance Learning. CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, pp. 67–76. https://doi.org/10.1145/3269206.3271787 Luo, H., Chen, G., Liu, F., Li, P., Ding, C., & Shen, X. (2018). Footprint Modeling of Cache Associativity and Granularity. PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON MEMORY SYSTEMS (MEMSYS 2018), pp. 232–242. https://doi.org/10.1145/3240302.3240419 Cohen, A., Shen, X., Torrellas, J., Tuck, J., & Zhou, Y. (2018). Inter-Disciplinary Research Challenges in Computer Systems for the 2020s. National Science Foundation. Ning, L., Pittman, R., & Shen, X. (2018). LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine. NEURAL NETWORKS, 108, 399–410. https://doi.org/10.1016/j.neunet.2018.08.018 Yang, S., & Shen, X. (2018). LEEM: Lean Elastic EM for Gaussian Mixture Model via Bounds-Based Filtering. 2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), pp. 677–686. 
https://doi.org/10.1109/ICDM.2018.00083 Zhao, Y., Zhou, W., Shen, X., & Yiu, G. (2018). Overhead-Conscious Format Selection for SpMV-Based Applications. 2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 950–959. https://doi.org/10.1109/IPDPS.2018.00104 Zhu, Q., Wu, B., Shen, X., Shen, K., Shen, L., & Wang, Z. (2018). Resolving the GPU responsiveness dilemma through program transformations. Frontiers of Computer Science, 12(3), 545–559. https://doi.org/10.1007/s11704-016-6206-y Shen, X. (2018). Rethinking Compilers in the Rise of Machine Learning and AI. CC'18: PROCEEDINGS OF THE 27TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION, pp. 1–1. https://doi.org/10.1145/3178372.3183634 Guan, H., Ding, Y., Shen, X., & Krim, H. (2018). Reuse-Centric K-Means Configuration. 2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), pp. 1224–1227. https://doi.org/10.1109/ICDE.2018.00116 Xu, S., Xu, Y., Xue, W., Shen, X., Zheng, F., Huang, X., & Yang, G. (2018). Taming the "Monster": Overcoming Program Optimization Challenges on SW26010 Through Precise Performance Modeling. 2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 763–773. https://doi.org/10.1109/IPDPS.2018.00086 Zhang, F., Zhai, J., Shen, X., Mutlu, O., & Chen, W. (2018). Zwift. Proceedings of the 2018 International Conference on Supercomputing - ICS '18. Presented at the the 2018 International Conference. https://doi.org/10.1145/3205289.3205325 Zhao, Y., Liao, C. H., & Shen, X. (2017). An infrastructure for HPC knowledge sharing and reuse. ACM SIGPLAN Notices, 52(8), 461–462. https://doi.org/10.1145/3155284.3019023 Shen, X. (2017). Bridging the gap between memory performance and massive parallelism: The critical role of programming systems innovations (keynote). ACM SIGPLAN Notices, 52(9), 1–1. https://doi.org/10.1145/3156685.3092569 Zhu, Q., Wo, B., Shen, X., Shen, L., & Wang, Z. (2017). 
Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems. 2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 967–977. https://doi.org/10.1109/ipdps.2017.124 Shen, X., & Wu, B. (2017). Data placement on GPUs. In Advances in GPU Research and Practice (pp. 105–123). https://doi.org/10.1016/b978-0-12-803738-6.00005-7 Chen, G. Y., Zhao, Y., Shen, X., & Zhou, H. Y. (2017). EffiSha: A software framework for enabling efficient preemptive scheduling of GPU. ACM SIGPLAN Notices, 52(8), 3–16. https://doi.org/10.1145/3155284.3018748 Chen, G., Zhang, L., Budhiraja, R., Shen, X., & Wu, Y. (2017). Efficient support of position independence on non-volatile memory. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-50 '17. Presented at the the 50th Annual IEEE/ACM International Symposium. https://doi.org/10.1145/3123939.3124543 Guan, H., Shen, X., & Krim, H. (2017). Egeria. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17. Presented at the the International Conference for High Performance Computing, Networking, Storage and Analysis. https://doi.org/10.1145/3126908.3126961 Ding, Y., & Shen, X. (2017). GLORE: generalized loop redundancy elimination upon LER-notation. Proceedings of the ACM on Programming Languages, 1(OOPSLA), 1–28. https://doi.org/10.1145/3133898 Ding, Y. F., Ning, L., Guan, H., & Shen, X. (2017). Generalizations of the theory and deployment of triangular inequality for compiler-based strength reduction. ACM SIGPLAN Notices, 52(6), 33–48. https://doi.org/10.1145/3140587.3062377 Ning, L., Pittman, R., & Shen, X. (2017). LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine. 2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), pp. 1015–1020. https://doi.org/10.1109/icdm.2017.131 Chen, G., Shen, X., Wu, B., & Li, D. (2017). 
Optimizing Data Placement on GPU Memory: A Portable Approach. IEEE Transactions on Computers, 66(3), 473–487. https://doi.org/10.1109/tc.2016.2604372 Wu, B., & Shen, X. (2017). Software-level task scheduling on GPUs. In Advances in GPU Research and Practice (pp. 83–103). https://doi.org/10.1016/b978-0-12-803738-6.00004-5 Chen, G., Ding, Y., & Shen, X. (2017). Sweet KNN: An Efficient KNN on GPU through Reconciliation between Redundancy Removal and Regularity. 2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), pp. 621–632. https://doi.org/10.1109/icde.2017.116 Zhu, Q., Wu, B., Shen, X., Shen, K., Shen, L., & Wang, Z. (2017). Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions. Frontiers of Computer Science, 11(1), 130–146. https://doi.org/10.1007/s11704-016-5468-8 Zheng, Z., Oh, C., Zhai, J., Shen, X., Yi, Y., & Chen, W. (2017). Versapipe. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-50 '17. Presented at the the 50th Annual IEEE/ACM International Symposium. https://doi.org/10.1145/3123939.3123978 Chen, G., Shen, X., & Zhou, H. (2016). A Software Framework for Efficient Preemptive Scheduling on GPU (Technical Report No. TR-2016-1). North Carolina State University. Chen, G., & Shen, X. (2016). Coherence-Free Multiview. Proceedings of the 2016 International Conference on Supercomputing - ICS '16. Presented at the the 2016 International Conference. https://doi.org/10.1145/2925426.2926277 Zhou, M., Wu, B., Shen, X., Gao, Y., & Yiu, G. (2016). Examining and Reducing the Influence of Sampling Errors on Feedback-Driven Optimizations. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 13(1). https://doi.org/10.1145/2851502 Ning, L., & Shen, X. (2016). LCD: A Fast Contrastive Divergence Based Training Algorithm for Restricted Boltzmann Machine” (No. TR-2016-3). Raleigh, NC: North Carolina State University. Shen, X., Mueller, F., & Tuck, J. (Eds.). 
(2016). Languages and Compilers for Parallel Computing. In Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-319-29778-1 Chen, G., Zhou, H., Shen, X., Gahm, J., Venkat, N., Booth, S., & Marshall, J. (2016, July). OpenCL-based erasure coding on heterogeneous architectures. 2016 IEEE 27th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Vol. 7, pp. 33–40. https://doi.org/10.1109/asap.2016.7760770 Zhao, Y., Chen, G., Liao, C., & Shen, X. (2016). Towards Ontology-Based Program Analysis. In S. Krishnamurthi & B. S. Lerner (Eds.), 30th European Conference on Object-Oriented Programming (ECOOP 2016) (pp. 26:1–26:25). Dagstuhl, Germany: Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik. Zhao, Y., Liao, C., & Shen, X. (2016). Towards Ontology-Based Program Analysis (Technical Report No. TR-2016-5). North Carolina State University. Fu, W., Menzies, T., & Shen, X. (2016). Tuning for software analytics: Is it really necessary? Information and Software Technology, 76, 135–146. https://doi.org/10.1016/j.infsof.2016.04.017 Ding, Y., Ansel, J., Veeramachaneni, K., Shen, X., O'Reilly, U.-M., & Amarasinghe, S. (2015, June). Autotuning Algorithmic Choice for Input Sensitivity. ACM SIGPLAN NOTICES, Vol. 50, pp. 379–390. https://doi.org/10.1145/2813885.2737969 Chen, G., Wu, B., Li, D., & Shen, X. (2015). Enabling Portable Optimizations of Data Placement on GPU. IEEE Micro, 35(4), 16–24. https://doi.org/10.1109/mm.2015.53 Wu, B., Chen, G., Li, D., Shen, X., & Vetter, J. (2015). Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations. Proceedings of the 29th ACM on International Conference on Supercomputing - ICS '15. Presented at the the 29th ACM. https://doi.org/10.1145/2751205.2751213 Chen, G., & Shen, X. (2015). Free launch. Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48. Presented at the the 48th International Symposium. 
https://doi.org/10.1145/2830772.2830818 Zhao, Z., & Shen, X. (2015, April). On-the-Fly Principled Speculation for FSM Parallelization. ACM SIGPLAN NOTICES, Vol. 50, pp. 619–630. https://doi.org/10.1145/2775054.2694369 Ding, Y., Shen, X., Musuvathi, M., & Mytkowicz, T. (2015). TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems. In C. Li & V. Markl (Eds.), 41st International Conference on Very Large Data Bases (VLDB 2015) : proceedings of the VLDB Endowment, volume 8, number 1-13, Kohala Coast, Hawaii, USA, 31 August-4 September 2015. Stanford, CA: VLDB Endowment. Ding, Y., Shen, X., Musuvathi, M., & Mytkowicz, T. (2015). TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems” (Technical Report No. TR-2015-3). North Carolina State University. Zhu, Q., Wu, B., Shen, X., Shen, L., & Wang, Z. (2015). Understanding Co-run Degradations on Integrated Heterogeneous Processors. In Languages and Compilers for Parallel Computing (pp. 82–97). https://doi.org/10.1007/978-3-319-17473-0_6 Zhu, Q., Wu, B., Shen, X. P., Shen, L., & Wang, Z. Y. (2015). Understanding co-run degradations on integrated heterogeneous processors. Languages and compilers for parallel computing (lcpc 2014), 8967, 82–97. Ding, Y., Zhao, Y., Shen, X., Musuvathi, M., & Mytkowicz, T. (2015). Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup. Proceedings of the 32nd International Conference on Machine Learning, 37, 579–587. Lille, France. Ding, Y., Shen, X., Musuvathi, M., & Mytkowicz, T. (2015). Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup (Technical Report No. TR-2015-2). North Carolina State University. Zhao, Z., Wu, B., Zhou, M., Ding, Y., Sun, J., Shen, X., & Wu, Y. (2014, October). Call Sequence Prediction through Probabilistic Calling Automata. ACM SIGPLAN NOTICES, Vol. 49, pp. 745–762. https://doi.org/10.1145/2714064.2660221 Zhao, Z., Wu, B., & Shen, X. (2014). 
Challenging the "embarrassingly sequential". Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14. Presented at the the 19th international conference. https://doi.org/10.1145/2541940.2541989 Ding, Y., Zhou, M., Zhao, Z., Eisenstat, S., & Shen, X. (2014). Finding the limit. Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14. Presented at the the 19th international conference. https://doi.org/10.1145/2541940.2541945 Wang, W., Wang, Z., Wu, C., Yew, P.-C., Shen, X., Yuan, X., … Guan, Y. (2014). Localization of concurrency bugs using shared memory access pairs. Proceedings of the 29th ACM/IEEE international conference on Automated software engineering - ASE '14. Presented at the the 29th ACM/IEEE international conference. https://doi.org/10.1145/2642937.2642972 Chen, G., Wu, B., Li, D., & Shen, X. (2014). PORPLE: An Extensible Optimizer for Portable Data Placement on GPU. 2014 47TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), pp. 88–100. https://doi.org/10.1109/micro.2014.20 Zhao, Z., Zhou, M., & Shen, X. (2014). SatScore. Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing - UbiComp '14 Adjunct. Presented at the the 2014 ACM International Joint Conference. https://doi.org/10.1145/2632048.2632080 Zhou, M. Z., Shen, X., Gao, Y. Q., & Yiu, G. (2014). Space-efficient multi-versioning for input-adaptive feedback-driven program optimizations. ACM SIGPLAN Notices, 49(10), 763–776. https://doi.org/10.1145/2714064.2660229 Wu, B., Zhao, Z., Zhang, E. Z., Jiang, Y., & Shen, X. (2013). Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU. ACM SIGPLAN Notices, 48(8), 57. https://doi.org/10.1145/2517327.2442523 Wang, B., Wu, B., Li, D., Shen, X., Yu, W., Jiao, Y., & Vetter, J. (2013). 
Exploring Hybrid Memory for GPU Energy Efficiency through Software-Hardware Co-Design. Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. Presented at the PACT, Edinburgh, Scotland. https://doi.org/10.1109/pact.2013.6618807 Guo, Z., & Shen, X. (2013). Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation. In Languages and Compilers for Parallel Computing (pp. 171–184). https://doi.org/10.1007/978-3-642-36036-7_12 Zhao, Z., Bebenita, M., Herman, D., Sun, J., & Shen, X. (2013). HPar. ACM Transactions on Architecture and Code Optimization, 10(4), 1–25. https://doi.org/10.1145/2541228.2555301 Tian, K., Jiang, Y., Shen, X., & Mao, W. (2013). Optimal Co-Scheduling to Minimize Makespan on Chip Multiprocessors. In Job Scheduling Strategies for Parallel Processing (pp. 114–133). https://doi.org/10.1007/978-3-642-35867-8_7 Zhou, M., Wu, B., Ding, Y., & Shen, X. (2013, February). Profmig: A framework for flexible migration of program profiles across software versions. Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). https://doi.org/10.1109/cgo.2013.6494984 Wu, B., Zhou, M., Shen, X., Gao, Y., Silvera, R., & Yiu, G. (2013). Simple Profile Rectifications Go a Long Way. In ECOOP 2013 – Object-Oriented Programming (pp. 654–678). https://doi.org/10.1007/978-3-642-39038-8_27 Shen, X., Liu, Y., Zhang, E. Z., & Bhamidipati, P. (2012). An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations. International Journal of Parallel Programming, 41(6), 855–869. https://doi.org/10.1007/S10766-012-0236-3 Wu, B., Zhao, Z., Shen, X., Jiang, Y., Gao, Y., & Silvera, R. (2012). Exploiting inter-sequence correlations for program behavior prediction. Proceedings of the ACM international conference on Object oriented programming systems languages and applications - OOPSLA '12. Presented at the the ACM international conference. 
https://doi.org/10.1145/2384616.2384678 Guo, Z., Wu, B., & Shen, X. (2012). One stone two birds. Proceedings of the 26th ACM international conference on Supercomputing - ICS '12. Presented at the the 26th ACM international conference. https://doi.org/10.1145/2304576.2304583 Zhang, E. Z., Jiang, Y., & Shen, X. (2012). The Significance of CMP Cache Sharing on Contemporary Multithreaded Applications. IEEE Transactions on Parallel and Distributed Systems, 23(2), 367–374. https://doi.org/10.1109/TPDS.2011.130 Tian, K., Zhang, E., & Shen, X. (2011). A step towards transparent integration of input-consciousness into dynamic program optimizations. Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications - OOPSLA '11. Presented at the the 2011 ACM international conference. https://doi.org/10.1145/2048066.2048103 Jiang, Y., Zhang, E. Z., Shen, X., Gao, Y., & Archambault, R. (2011). Array Regrouping on CMP with Non-uniform Cache Sharing. In Languages and Compilers for Parallel Computing (pp. 92–105). https://doi.org/10.1007/978-3-642-19595-2_7 Guo, Z., Zhang, E. Z., & Shen, X. (2011). Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU. 2011 International Conference on Parallel Architectures and Compilation Techniques. Presented at the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT). https://doi.org/10.1109/pact.2011.62 Wu, B., Zhang, E. Z., & Shen, X. (2011). Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control. 2011 International Conference on Parallel Architectures and Compilation Techniques. Presented at the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT). https://doi.org/10.1109/pact.2011.56 Zhang, E. Z., Jiang, Y., Guo, Z., Tian, K., & Shen, X. (2011). On-the-fly elimination of dynamic irregularities for GPU computing. 
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems - ASPLOS '11. Presented at the sixteenth international conference. https://doi.org/10.1145/1950365.1950408 Jiang, Y., Tian, K., Shen, X., Zhang, J., Chen, J., & Tripathi, R. (2011). The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions. IEEE Transactions on Parallel and Distributed Systems, 22(7), 1192–1205. https://doi.org/10.1109/TPDS.2010.193 Tian, K., Jiang, Y., Zhang, E. Z., & Shen, X. (2010). An input-centric paradigm for program dynamic optimizations. Proceedings of the ACM international conference on Object oriented programming systems languages and applications - OOPSLA '10. Presented at the ACM international conference. https://doi.org/10.1145/1869459.1869471 Jiang, Y., Tian, K., & Shen, X. (2010). Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors. In High Performance Embedded Architectures and Compilers (pp. 201–215). https://doi.org/10.1007/978-3-642-11515-8_16 Zhang, E. Z., Jiang, Y., & Shen, X. (2010). Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10. Presented at the 15th ACM SIGPLAN symposium. https://doi.org/10.1145/1693453.1693482 Albert, C., Paloski, A., Shen, X., Walter, E. J., & Zhang, S. (2010). Experiences in Porting the Hubbard Model in Computational Materials Science to GPU (Technical Report No. WM-CS-2010-04). Computer Science Department, The College of William and Mary. Jiang, Y., Zhang, E. Z., Tian, K., Mao, F., Gethers, M., Shen, X., & Gao, Y. (2010). Exploiting statistical correlations for proactive prediction of program behaviors. Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization - CGO '10.
Presented at the 8th annual IEEE/ACM international symposium. https://doi.org/10.1145/1772954.1772989 Kowalski, A., & Shen, X. (2010). Implementing the Dslash Operator in OpenCL (Technical Report No. WM-CS-2010-03). Computer Science Department, The College of William and Mary. Jiang, Y., Zhang, E. Z., Tian, K., & Shen, X. (2010). Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors? In Lecture Notes in Computer Science (pp. 264–282). https://doi.org/10.1007/978-3-642-11970-5_15 Mao, F., & Shen, X. (2010). LU Decomposition on Cell Broadband Engine: An Empirical Study to Exploit Heterogeneous Chip Multiprocessors. In Lecture Notes in Computer Science (pp. 61–75). https://doi.org/10.1007/978-3-642-15672-4_7 Zhang, E. Z., Jiang, Y., Guo, Z., & Shen, X. (2010). Streamlining GPU applications on the fly. Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10. Presented at the 24th ACM International Conference. https://doi.org/10.1145/1810085.1810104 Zhang, E. Z., Jiang, Y., & Shen, X. (2009). A Systematic Measurement of the Influence of Non-Uniform Cache Sharing on the Performance of Modern Multithreaded Programs (Technical Report No. WM-CS-2009-04). Computer Science Department, The College of William and Mary. Liu, Y., Zhang, E. Z., & Shen, X. (2009). A cross-input adaptive framework for GPU program optimizations. 2009 IEEE International Symposium on Parallel & Distributed Processing. Presented at the Distributed Processing (IPDPS). https://doi.org/10.1109/ipdps.2009.5160988 Tian, K., Jiang, Y., & Shen, X. (2009). A study on optimally co-scheduling jobs of different lengths on chip multiprocessors. Proceedings of the 6th ACM conference on Computing frontiers - CF '09. Presented at the 6th ACM conference. https://doi.org/10.1145/1531743.1531752 Shen, X., & Jiang, Y. (2009). Co-Run Locality Prediction for Proactive Shared-Cache Management (Technical Report No. WM-CS-2009-03).
Computer Science Department, The College of William and Mary. Mao, F., & Shen, X. (2009). Cross-Input Learning and Discriminative Prediction in Evolvable Virtual Machines. 2009 International Symposium on Code Generation and Optimization. Presented at the 2009 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO). https://doi.org/10.1109/cgo.2009.10 Mao, F., Zhang, E. Z., & Shen, X. (2009). Influence of program inputs on the selection of garbage collectors. Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual execution environments - VEE '09. Presented at the 2009 ACM SIGPLAN/SIGOPS international conference. https://doi.org/10.1145/1508293.1508307 Shen, X., Jiang, Y., Zhang, E. Z., Tan, K., Mao, F., & Gethers, M. (2009). Program Seminal Behaviors: Automating Input Characterization for Large-Scope Proactive Behavior Prediction (Technical Report No. WM-CS-2009-07). Computer Science Department, The College of William and Mary. Zhong, Y., Shen, X., & Ding, C. (2009). Program locality analysis using reuse distance. ACM Transactions on Programming Languages and Systems, 31(6), 1–39. https://doi.org/10.1145/1552309.1552310 Jiang, Y., & Shen, X. (2009). Speculation with Little Wasting: Saving Cost in Software Speculation Through Transparent Learning (No. WM-CS-2009-08). Williamsburg, VA: Computer Science Department, The College of William and Mary. Jiang, Y., Mao, F., & Shen, X. (2009). Speculation with Little Wasting: Saving Cost in Software Speculation through Transparent Learning. 2009 15th International Conference on Parallel and Distributed Systems. Presented at the 2009 15th International Conference on Parallel and Distributed Systems. https://doi.org/10.1109/ICPADS.2009.130 Zhang, E. Z., Jiang, Y., Guo, Z., & Shen, X. (2009). Streamlining GPU Applications On the Fly – Thread Divergence Elimination through Runtime Thread-Data Remapping (No. WM-CS-2009-08).
Williamsburg, VA: Computer Science Department, The College of William and Mary. Shen, X., Mao, F., Tian, K., & Zhang, E. Z. (2009). The study and handling of program inputs in the selection of garbage collectors. ACM SIGOPS Operating Systems Review, 43(3), 48. https://doi.org/10.1145/1618525.1618531 Liu, Y., Zhang, E. Z., & Shen, X. (2008). A Cross-Input Adaptive Framework for GPU Program Optimization (No. WM-CS-2008-09). Williamsburg, VA: Computer Science Department, The College of William and Mary. Jiang, Y., & Shen, X. (2008). Adaptive Software Speculation for Enhancing the Cost-Efficiency of Behavior-Oriented Parallelization. 2008 37th International Conference on Parallel Processing. Presented at the 2008 37th International Conference on Parallel Processing (ICPP). https://doi.org/10.1109/icpp.2008.50 Jiang, Y., & Shen, X. (2008, April). Adaptive speculation in behavior-oriented parallelization. 2008 IEEE International Symposium on Parallel and Distributed Processing. https://doi.org/10.1109/ipdps.2008.4536403 Jiang, Y., Shen, X., Chen, J., & Tripathi, R. (2008). Analysis and approximation of optimal co-scheduling on chip multiprocessors. Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08. Presented at the 17th international conference. https://doi.org/10.1145/1454115.1454146 Mao, F., & Shen, X. (2008). Cross-Input Learning and Discriminative Prediction in Evolvable Virtual Machines (No. WM-CS-2008-06). Williamsburg, VA: Computer Science Department, The College of William and Mary. Jiang, Y., & Shen, X. (2008). Exploration of the Influence of Program Inputs on CMP Co-scheduling. In Lecture Notes in Computer Science (pp. 263–273). https://doi.org/10.1007/978-3-540-85451-7_29 Mao, F., & Shen, X. (2008). LU Decomposition on Cell Broadband Engine (Technical Report No. WM-CS-2008-08). Computer Science Department, The College of William and Mary. Shen, X., & Shaw, J. (2008).
Scalable Implementation of Efficient Locality Approximation. In Languages and Compilers for Parallel Computing (pp. 202–216). https://doi.org/10.1007/978-3-540-89740-8_14 Shen, X. (2007). A Hybrid Framework Bridging Locality Analysis and Cache-Aware Scheduling for CMPs (Technical Report No. WM-CS-2007-01). Computer Science Dept., The College of William and Mary. Shen, X., Jiang, Y., & Mao, F. (2007). CAPS: Contention-Aware Proactive Scheduling for CMPs (Technical Report No. WM-CS-2007-09). Computer Science Department, The College of William and Mary. Shen, X., Shaw, J., Meeker, B., & Ding, C. (2007). Locality approximation using time. Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '07. Presented at the 34th annual ACM SIGPLAN-SIGACT symposium. https://doi.org/10.1145/1190216.1190227 Zhong, Y., Dropsho, S. G., Shen, X., Studer, A., & Ding, C. (2007). Miss Rate Prediction Across Program Inputs and Cache Configurations. IEEE Transactions on Computers, 56(3), 328–343. https://doi.org/10.1109/tc.2007.50 Shen, X., & Mao, F. (2007). Modeling Relations Between Inputs and Dynamic Behavior for General Programs (No. WM-CS-2007-07). Williamsburg, VA: Computer Science Department, The College of William and Mary. Shen, X., Zhong, Y., & Ding, C. (2007). Predicting locality phases for dynamic memory optimization. Journal of Parallel and Distributed Computing, 67(7), 783–796. https://doi.org/10.1016/j.jpdc.2007.01.010 Ding, C., Shen, X., Kelsey, K., Tice, C., Huang, R., & Zhang, C. (2007). Software behavior oriented parallelization. Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation - PLDI '07. Presented at the 2007 ACM SIGPLAN conference. https://doi.org/10.1145/1250734.1250760 Jiang, Y., & Shen, X. (2007). Study of the Effects of Program Inputs on Co-Scheduling (Technical Report No. WM-CS-2007-13). Computer Science Department, The College of William and Mary.
Bai, T., Shen, X., Zhang, C., Scherer, W. N., Ding, C., & Scott, M. L. (2006). A Key-Based Adaptive Transactional Memory Executor (No. TR909). Rochester, NY: Computer Science Dept., University of Rochester. Shen, X., Shaw, J., & Meeker, B. (2006). Accurate Approximation of Locality from Time Distance Histograms (Technical Report No. TR902). Computer Science Dept., University of Rochester. Ding, C., Shen, X., Kelsey, K., Tice, C., Huang, R., & Zhang, C. (2006). Behavior-Oriented Parallelization (Technical Report No. TR904). Computer Science Dept., University of Rochester. Shen, X., Shaw, J., Meeker, B., & Ding, C. (2006). Locality Approximation Using Time (Technical Report No. TR901). Computer Science Dept., University of Rochester. Zhang, C., Kelsey, K., Shen, X., Ding, C., Hertz, M., & Ogihara, M. (2006). Program-level adaptive memory management. Proceedings of the 2006 international symposium on Memory management - ISMM '06. Presented at the 2006 international symposium. https://doi.org/10.1145/1133956.1133979 Zhang, C., Kelsey, K., Shen, X., Ding, C., Hertz, M., & Ogihara, M. (2006). Waste Not, Want Not: Adaptive Garbage Collection in a Shared Environment (Technical Report No. TR908). Computer Science Dept., University of Rochester. Ding, C., Zhang, C., Shen, X., & Ogihara, M. (2005). Gated memory control for memory monitoring, leak detection and garbage collection. Proceedings of the 2005 workshop on Memory system performance - MSP '05. Presented at the 2005 workshop. https://doi.org/10.1145/1111583.1111593 Shen, X., Gao, Y., Ding, C., & Archambault, R. (2005). Lightweight reference affinity analysis. Proceedings of the 19th annual international conference on Supercomputing - ICS '05. Presented at the 19th annual international conference. https://doi.org/10.1145/1088149.1088167 Shen, X., & Ding, C. (2005). Parallelization of Utility Programs Based on Behavior Phase Analysis (No. TR876).
Rochester, NY: Computer Science Dept., University of Rochester. Shen, X., Zhong, Y., & Ding, C. (2005). Phase-Based Miss Rate Prediction Across Program Inputs. In Lecture Notes in Computer Science (pp. 42–55). https://doi.org/10.1007/11532378_5 Zhong, Y., Shen, X., & Ding, C. (2004). A Hierarchical Model of Reference Affinity. In Languages and Compilers for Parallel Computing (pp. 48–63). https://doi.org/10.1007/978-3-540-24644-2_4 Shen, X., & Ding, C. (2004). Adaptive data partition for sorting using probability distribution. International Conference on Parallel Processing, 2004. ICPP 2004. Presented at the International Conference on Parallel Processing, 2004. ICPP 2004. https://doi.org/10.1109/icpp.2004.1327928 Zhong, Y., Orlovich, M., Shen, X., & Ding, C. (2004). Array regrouping and structure splitting using whole-program reference affinity. Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation - PLDI '04, 255. https://doi.org/10.1145/996841.996872 Shen, X., Ding, C., Dwarkadas, S., & Scott, M. L. (2004). Characterizing Phases in Service-Oriented Applications (Technical Report No. TR848). Computer Science Dept., University of Rochester. Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757–1771. https://doi.org/10.1016/j.patcog.2004.03.009 Shen, X., Zhong, Y., & Ding, C. (2004). Locality phase prediction. Proceedings of the 11th international conference on Architectural support for programming languages and operating systems - ASPLOS-XI. Presented at the 11th international conference. https://doi.org/10.1145/1024393.1024414 Shen, X., Boutell, M., Luo, J., & Brown, C. (2004). Multi-label Machine Learning and Its Application to Semantic Scene Classification. Proceedings of Storage and Retrieval Methods and Applications for Multimedia 2004, 5307, 188–199. https://doi.org/10.1117/12.523428 Shen, X., Zhong, Y., & Ding, C. (2003).
Adaptive Data Partitioning using Probability Distribution (Technical Report No. TR823). Computer Science Dept., University of Rochester. Boutell, M., Shen, X., Luo, J., & Brown, C. (2003). Multi-label Semantic Scene Classification (Technical Report No. TR813). Dept. of Computer Science, University of Rochester. Shen, X., Zhong, Y., & Ding, C. (2003). Predicting Hierarchical Phases in Program Data Behavior (Technical Report No. TR824). Computer Science Dept., University of Rochester. Shen, X., Zhong, Y., & Ding, C. (2003). Regression-Based Multi-Model Prediction of Data Reuse Signature. Proceedings of the Fourth Annual Symposium of the Los Alamos Computer Science Institute, 243–251. Santa Fe, New Mexico, USA: Los Alamos Computer Science Institute. Ferguson, G., Allen, J., Blaylock, N., Byron, D., Chambers, N., Dzikovska, M., … Swift, M. (2002). The Medication Advisor Project: Preliminary Report (Technical Report No. 776). Dept. of Computer Science, University of Rochester. Shen, X., & Xu, B. (2001). Study and Auto-Detection of Stress Based on Tonal Pitch Range in Mandarin. Proceedings of Seventh European Conference on Speech Communication and Technology, 123–126. Aalborg, Denmark. Shen, X., & Xu, B. (2001). The Study Of The Effect Of Training Set On Statistical Language Modeling. Proceedings of Seventh European Conference on Speech Communication and Technology, 721–724. Aalborg, Denmark. Shen, X., & Xu, B. (2000). A CART-Based Hierarchical Stochastic Model for Prosodic Phrasing in Chinese. Proceedings of International Symposium on Chinese Spoken Language Processing 2000, 105–108. Beijing, China.