Works (194)

Updated: April 22nd, 2024 05:00

2024 conference paper

WiseGraph

Huang, K., Zhai, J., Zheng, L., Wang, H., Jin, Y., Zhang, Q., … Shen, X. (2024, April 22).

By: K. Huang, J. Zhai, L. Zheng, H. Wang, Y. Jin, Q. Zhang, R. Zhang, Z. Zheng, Y. Yi, X. Shen

Source: ORCID
Added: April 22, 2024

2023 journal article

Accelerating matrix-centric graph processing on GPUs through bit-level optimizations

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 177, 53–67.

author keywords: GraphBLAS; Bit manipulation; GPU; Sparse matrix; Deep reinforcement learning
Sources: Web Of Science, ORCID
Added: April 11, 2023

2023 journal article

Automated Translation of Functional Big Data Queries to SQL

PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 7(OOPSLA).

author keywords: program synthesis; source-to-source compiler; query optimization
TL;DR: Results show that (1) most RDD queries can be translated to SQL, (2) the tool is very effective at automating this translation, and (3) performing this translation offers significant performance benefits. (via Semantic Scholar)
Sources: ORCID, Web Of Science
Added: April 8, 2023

2023 article

BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs

PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023, pp. 264–276.

By: J. Chen n, H. Sung n, X. Shen n, S. Choudhury* & A. Li*

author keywords: graph neural networks; binarized GNN; bit manipulation; GPU; sparse matrix
TL;DR: This work redesigns the binary GNN inference backend from the efficiency perspective by proposing a series of abstractions and techniques to map binary GNNs and their computations to best fit the nature of bit manipulations on GPUs. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: January 29, 2024

2023 journal article

CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression

Proceedings of the ACM on Management of Data.

TL;DR: CompressGraph is developed, an efficient rule-based graph analytics engine that leverages data redundancy in graphs to achieve both performance boost and space reduction for common graph applications. (via Semantic Scholar)
Source: ORCID
Added: February 2, 2024

2023 journal article

Expanding the Edge: Enabling Efficient Winograd CNN Inference With Deep Reuse on Edge Device

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 35(10), 10181–10196.

author keywords: CNN; deep reuse; inference; winograd
TL;DR: A new inference method, called DREW, is proposed, which combines deep reuse with Winograd for further accelerating CNNs; it reduces the number of convolution operations to 10% of the original, achieving up to 60% energy-efficiency gains over the original Winograd inference. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: October 23, 2023

2023 article

Reconciling Selective Logging and Hardware Persistent Memory Transaction

2023 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA, pp. 664–676.

By: C. Ye*, Y. Xu n, X. Shen n, Y. Sha*, X. Liao*, H. Jin*, Y. Solihin*

TL;DR: An ISA extension is presented that enables selective logging for hardware persistent memory transactions for the first time and outperforms the state-of-the-art hardware counterpart by 1.8× on average. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: June 5, 2023

2023 article

SpecPMT: Speculative Logging for Resolving Crash Consistency Overhead of Persistent Memory

PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, VOL 2, ASPLOS 2023, pp. 762–777.

By: C. Ye*, Y. Xu n, X. Shen n, Y. Sha*, X. Liao*, H. Jin*, Y. Solihin*

author keywords: persistent memory; transaction; logging; microarchitecture
TL;DR: This paper introduces speculative logging, a new method that forgoes most memory fences and reduces data persistence overhead by logging data values early, which enables a novel persistent transaction model, speculatively persistent memory transactions (SpecPMT). (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: November 6, 2023

2023 journal article

Survey: Exploiting Data Redundancy for Optimization of Deep Learning

ACM COMPUTING SURVEYS, 55(10).

By: J. Chen n, W. Niu*, B. Ren*, Y. Wang* & X. Shen n

author keywords: Data redundancy; representation redundancy; deep neural network; convolutional neural network; transformer
TL;DR: This article surveys hundreds of recent papers on data redundancy, introduces a novel taxonomy to put the various techniques into a single categorization framework, and offers a comprehensive description of the main methods used for exploiting data redundancy in improving multiple kinds of DNNs on data. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: March 6, 2023

2022 article

Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU

2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022), pp. 515–525.

By: J. Chen n, H. Sung n, X. Shen n, N. Tallent*, K. Barker* & A. Li*

TL;DR: A two-level representation named Bit-Block Compressed Sparse Row (B2SR) is proposed and a series of optimizations to the graph operations on B2SR by leveraging the intrinsics of modern GPUs are presented. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: September 29, 2022

2022 article

Brief Industry Paper: Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card

2022 IEEE 28TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS), pp. 297–300.

By: H. Sung n, Y. Xu n, J. Guan*, W. Niu*, B. Ren*, Y. Wang*, S. Liu, X. Shen n

TL;DR: It is shown that it is feasible to enable full level-4 autonomous driving workloads on a single off-the-shelf card (Jetson AGX Xavier) for less than $1k, a lower cost than state-of-the-art systems, while meeting all the latency requirements. (via Semantic Scholar)
UN Sustainable Development Goal Categories
9. Industry, Innovation and Infrastructure (OpenAlex)
Sources: Web Of Science, ORCID
Added: April 17, 2023

2022 article

DREW: Efficient Winograd CNN Inference with Deep Reuse

PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), pp. 1807–1816.

author keywords: data reuse; deep reuse; Winograd; Web systems
TL;DR: A new inference method, called DREW, is proposed, which combines deep reuse with Winograd for further accelerating CNNs, and can detect the similarities from the complex minimal filtering patterns by clustering and reduce the online clustering cost in a reasonable range. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: October 31, 2022

2022 journal article

Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse

ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 27(5).

By: N. Cicek*, X. Shen n & O. Ozturk*

author keywords: Reuse; deep neural networks; gemm; accelerator
TL;DR: This article presents an in-depth exploration of architectural support for reuse-centric CNN, addresses some major limitations of the state-of-the-art design and proposes a novel hardware accelerator that improves neuron vector similarity detection and reduces the energy consumption of reuse-focused CNN inference. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Sources: Web Of Science, ORCID
Added: October 17, 2022

2022 journal article

Exploring Data Analytics Without Decompression on Embedded GPU Systems

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 33(7), 1553–1568.

By: Z. Pan*, F. Zhang*, Y. Zhou*, J. Zhai*, X. Shen n, O. Mutlu*, X. Du*

author keywords: Graphics processing units; Embedded systems; Data analysis; Parallel processing; Instruction sets; Optimization; Random access memory; TADOC; embedded GPU systems; compression; data analytics
TL;DR: G-TADOC is proposed, a novel data analytics method for efficient text analytics directly on compression on embedded GPU systems that involves special optimizations for embedded GPUs, such as utilizing the CPU-GPU shared unified memory. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Sources: Web Of Science, ORCID
Added: November 23, 2021

2022 article

FFCCD: Fence-Free Crash-Consistent Concurrent Defragmentation for Persistent Memory

PROCEEDINGS OF THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22), pp. 274–288.

By: Y. Xu n, C. Ye*, Y. Solihin* & X. Shen n

author keywords: Non-volatile memory; Persistent memory; Memory management; Garbage collection; Defragmentation
TL;DR: Fence-Free Crash-consistent Concurrent Defragmentation (FFCCD) introduces architecture support for concurrent defragmentation that enables a fence-free design and fast read barrier, reducing two major overheads of defragmenting persistent memory. (via Semantic Scholar)
UN Sustainable Development Goal Categories
11. Sustainable Cities and Communities (Web of Science; OpenAlex)
Sources: Web Of Science, ORCID
Added: September 26, 2022

2022 article

GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs

2022 55TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), pp. 512–529.

By: W. Niu*, J. Guan*, X. Shen n, Y. Wang*, G. Agrawal* & B. Ren*

author keywords: VLIW instruction packing; compiler optimization; deep neural network; mobile devices
TL;DR: Evaluation results show that GCD2 outperforms two product-level state-of-the-art end-to-end DNN execution frameworks that support mobile DSPs (TFLite and Qualcomm SNPE) with up to 6.0× speedup and outperforms three established compilers (Halide, TVM, and RAKE), while its implementation enables two major DNNs to execute on a mobile DSP for the first time. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: December 12, 2022

2022 journal article

General Reuse-Centric CNN Accelerator

IEEE TRANSACTIONS ON COMPUTERS, 71(4), 880–891.

By: N. Cicek*, L. Ning n, O. Ozturk* & X. Shen n

author keywords: Neurons; Hardware; Convolution; Engines; Software; Acceleration; IEEE Senior Members; CNN; reuse-centric; accelerator
TL;DR: The first general reuse-centric accelerator for CNN inferences is introduced, able to discover similarities among arbitrary patches within an image or across independent images, and translate them into computation time and energy savings. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Sources: ORCID, Web Of Science
Added: March 11, 2022

2022 article

IDE Augmented with Human-Learning Inspired Natural Language Programming

2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2022), pp. 110–114.

By: M. Young n, Z. Nan n & X. Shen n

author keywords: Program synthesis; natural language programming; code editor
TL;DR: An NLU-driven approach that forgoes the need for large numbers of labeled training examples, inspired by how humans learn programming, and draws on a novel graph-based mapping algorithm is proposed. (via Semantic Scholar)
UN Sustainable Development Goal Categories
4. Quality Education (OpenAlex)
Sources: Web Of Science, ORCID
Added: September 19, 2022

2022 article

Interactive NLU-Powered Ontology-Based Workflow Synthesis for FAIR Support of HPC

2022 IEEE/ACM INTERNATIONAL WORKSHOP ON HPC USER SUPPORT TOOLS (HUST), pp. 29–40.

By: Z. Nan n, M. Dave n, X. Shen n, C. Liao*, T. Vanderbruggen*, P. Lin*, M. Emani*

author keywords: Ontology; Workflow; Synthesis; HPC; FAIR; NLP
TL;DR: INPOWS allows the use of Natural Language for queries, maximizes the robustness in handling concepts and language ambiguities through an interactive ontology-based design, and achieves superior extensibility by adopting a synthesis algorithm powered by Natural Language Understanding. (via Semantic Scholar)
UN Sustainable Development Goal Categories
4. Quality Education (Web of Science)
Sources: Web Of Science, ORCID
Added: May 9, 2023

2022 journal article

POCLib: A High-Performance Framework for Enabling Near Orthogonal Processing on Compression

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 33(2), 459–475.

author keywords: Task analysis; Indexing; Data analysis; Technological innovation; Dictionaries; Data structures; Standards; Near orthogonal processing on compression; direct processing on compressed data; TADOC; orthogonal POC
TL;DR: This work proposes a novel concept, orthogonal processing on compression (orthogonal POC), which means that text analytics can be efficiently supported directly on compressed data, regardless of the type of the data processing, and yields a unified high-performance library, called POCLib. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: August 16, 2021

2022 journal article

Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 19(2).

author keywords: Persistent memory; garbage collector; memory management
TL;DR: The article proposes the concept of movement-oblivious addressing (MOA), and develops and compares three novel solutions to materialize the concept for solving the addressability problem. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: April 18, 2022

2022 journal article

Sequential Model Optimization for Software Effort Estimation

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 48(6), 1994–2009.

By: T. Xia n, R. Shu n, X. Shen n & T. Menzies n

author keywords: Estimation; Software; Tools; Optimization; Data models; Task analysis; Mathematical model; Effort estimation; COCOMO; hyperparameter tuning; regression trees; sequential model optimization
TL;DR: This paper applies a configuration technique called “ROME” (Rapid Optimizing Methods for Estimation), which uses sequential model-based optimization (SMO) to find what configuration settings of effort estimation techniques work best for a particular data set. (via Semantic Scholar)
Sources: ORCID, Web Of Science
Added: June 15, 2022

2022 article

Temporal Exposure Reduction Protection for Persistent Memory

2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), pp. 908–924.

By: Y. Xu n, C. Ye*, X. Shen n & Y. Solihin*

author keywords: Memory Security; Persistent Memory; Memory Exposure Reduction; Hardware-Software Co-Design
TL;DR: This paper develops temporal exposure reduction protection (TERP) as a framework for enforcing memory safety and proposes programming system and architecture solutions for the key challenges for the adoption of TERP, which draws on novel supports in both compilers and hardware to efficiently meet the exposure time target. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: August 29, 2022

2021 journal article

A Machine Learning Based Ensemble Forecasting Optimization Algorithm for Preseason Prediction of Atlantic Hurricane Activity

ATMOSPHERE, 12(4).

By: X. Sun n, L. Xie n, S. Shah n & X. Shen n

author keywords: hurricane prediction; machine learning; ensemble model
TL;DR: The results show that neither SAE nor ML-OE was able to improve the forecasts of the response variables when all models show consistent bias, and that increasing the number of ensemble members does not necessarily lead to better ensemble forecasts. (via Semantic Scholar)
UN Sustainable Development Goal Categories
13. Climate Action (Web of Science)
Sources: Web Of Science, ORCID
Added: April 21, 2021

2021 journal article

An Automatic Synthesizer of Advising Tools for High Performance Computing

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 32(2), 330–341.

By: H. Guan*, X. Shen n & H. Krim n

author keywords: Tools; Optimization; Programming; Syntactics; Semantics; Guidelines; Natural language processing; Performance tools; natural language processing; code optimization
TL;DR: Egeria is built based on a distinctive multi-layered design that leverages natural language processing (NLP) techniques and extends them with HPC-specific knowledge and considerations and can retrieve relevant optimization knowledge for optimization questions. (via Semantic Scholar)
UN Sustainable Development Goal Categories
4. Quality Education (OpenAlex)
Sources: Web Of Science, ORCID
Added: September 28, 2020

2021 article

Brief Industry Paper: Towards Real-Time 3D Object Detection for Autonomous Vehicles with Pruning Search

2021 IEEE 27TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS 2021), pp. 425–428.

By: P. Zhao*, W. Niu*, G. Yuan*, Y. Cai*, H. Sung n, S. Liu, S. Liu*, X. Shen n ...

author keywords: 3D object detection; real-time; point cloud
TL;DR: It is demonstrated in experiments that for the first time, the pruning search framework can achieve real-time 3D object detection on mobile with state-of-the-art detection performance. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: November 29, 2021

2021 journal article

CoCoPIE: Enabling Real-Time AI on Off-the-Shelf Mobile Devices via Compression-Compilation Co-Design

COMMUNICATIONS OF THE ACM, 64(6), 62–68.

TL;DR: A new framework allows intelligence to be placed on mainstream end devices, such as smartphones and tablets, without needing special hardware. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: June 21, 2021

2021 journal article

Coarsening Optimization for Differentiable Programming

PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 5(OOPSLA).

By: X. Shen n, G. Zhang n, I. Dea*, S. Andow*, E. Arroyo-Fang*, N. Gafter*, J. George*, M. Grueter* ...

author keywords: differentiable programming; compiler; program optimizations; SSA; Calculus
TL;DR: This work introduces phi-calculus, a novel method to allow symbolic reasoning and differentiation of computations that involve branches and loops; it avoids "expression swell" in symbolic differentiation and balances reuse and coarsening through the design of reuse-centric segment-of-interest identification. (via Semantic Scholar)
Sources: ORCID, Web Of Science
Added: January 2, 2022

2021 article

G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression

2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), pp. 1679–1690.

By: F. Zhang*, Z. Pan*, Y. Zhou*, J. Zhai*, X. Shen n, O. Mutlu, X. Du*

author keywords: TADOC; GPU; parallelism; analytics on compressed data
TL;DR: G-TADOC is described, the first framework that provides GPU-based text analytics directly on compression, effectively enabling efficient text analytics on GPUs without decompressing the input data and develops a novel fine-grained thread-level workload scheduling strategy for GPU threads. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: September 20, 2021

2021 article

HPC Ontology: Towards a Unified Ontology for Managing Training Datasets and AI Models for High-Performance Computing

PROCEEDINGS OF THE WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2021), pp. 69–80.

author keywords: Ontology; HPC; FAIR; datasets; AI models
TL;DR: This paper presents ongoing work on designing an ontology for high-performance computing (named HPC ontology) to make training datasets and AI models FAIR; the ontology provides controlled vocabularies, explicit semantics, and formal knowledge representations. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: March 21, 2022

2021 article

HPCFAIR: Enabling FAIR AI for HPC Applications

PROCEEDINGS OF THE WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2021), pp. 58–68.

author keywords: HPC; FAIR; AI models; datasets; neural networks
TL;DR: HPCFAIR is proposed, a modular, extensible framework to enable AI models to be Findable, Accessible, Interoperable and Reproducible (FAIR), which provides users with a structured approach to search, load, save, and reuse the models in their codes. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: March 21, 2022

2021 article

Hardware-Based Address-Centric Acceleration of Key-Value Store

2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), pp. 736–748.

By: C. Ye*, Y. Xu n, X. Shen n, X. Liao*, H. Jin* & Y. Solihin*

TL;DR: This paper introduces an address-centric approach to speed up the addressing by creating a shortcut for the translation of a key to the physical address of the value, using a novel in-memory table, STLT, a virtual-physical address buffer, and two new instructions. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: July 26, 2021

2021 article

Recurrent Neural Networks Meet Context-Free Grammar: Two Birds with One Stone

2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), pp. 1078–1083.

By: H. Guan*, U. Chaudhary*, Y. Xu n, L. Ning n, L. Zhang* & X. Shen n

author keywords: recurrent neural networks; data compression; context free grammar; tokenization
TL;DR: This work introduces CFG-guided compressed learning, an approach that creatively integrates Context-Free Grammar (CFG) and online tokenization into RNN learning and inference for streaming inputs through a hierarchical compression algorithm. (via Semantic Scholar)
UN Sustainable Development Goal Categories
4. Quality Education (OpenAlex)
Sources: Web Of Science, ORCID
Added: May 2, 2022

2021 journal article

Reuse-centric k-means configuration

INFORMATION SYSTEMS, 100.

By: L. Zhang*, H. Guan*, Y. Ding*, X. Shen n & H. Krim n

author keywords: K-means; Algorithm configuration; Computation reuse
Sources: Web Of Science, ORCID
Added: June 10, 2021

2021 article

Revisit the Scalability of Deep Auto-Regressive Models for Graph Generation

2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN).

By: S. Yang n, X. Shen n & S. Lim*

TL;DR: It is concluded that the perceived “inherent” scalability limitation is a misperception; with the right design and implementation, deep auto-regressive graph generation can be applied to graphs much larger than the device memory. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: January 10, 2022

2021 article

Seeds of SEED: New Security Challenges for Persistent Memory

2021 INTERNATIONAL SYMPOSIUM ON SECURE AND PRIVATE EXECUTION ENVIRONMENT DESIGN (SEED 2021), pp. 83–88.

By: N. Ul Mustafa*, Y. Xu n, X. Shen n & Y. Solihin*

author keywords: Persistent memory objects; Security attacks; PMO vulnerability
TL;DR: The security implications of using PMOs are discussed, highlighting sample PMO-based attacks and potential strategies to defend against them, as well as threat vulnerabilities that are either new or increased in intensity under the PMO programming model. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Sources: Web Of Science, ORCID
Added: June 20, 2022

2021 journal article

Simpler Hyperparameter Optimization for Software Analytics: Why, How, When

IEEE Transactions on Software Engineering, 48(8), 1–1.

By: A. Agrawal*, X. Yang n, R. Agrawal n, R. Yedida n, X. Shen n & T. Menzies n

author keywords: Software analytics; hyperparameter optimization; defect prediction; bad smell detection; issue close time; bug reports
TL;DR: The simple DODGE works best for data sets with low “intrinsic dimensionality” and very poorly for higher-dimensional data; nearly all the SE data seen here was intrinsically low-dimensional, indicating that DODGE is applicable for many SE analytics tasks. (via Semantic Scholar)
Sources: ORCID, Web Of Science, Crossref
Added: June 12, 2021

2021 article

Supporting Legacy Libraries on Non-Volatile Memory: A User-Transparent Approach

2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), pp. 443–455.

By: C. Ye*, Y. Xu n, X. Shen n, X. Liao*, H. Jin* & Y. Solihin*

TL;DR: A new concept that allows programmers to reference a persistent object in the same way as they reference a normal (volatile) object is presented, and compiler and simple architecture support for keeping performance overheads very low is described. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: October 26, 2021

2021 journal article

TADOC: Text analytics directly on compression

VLDB JOURNAL, 30(2), 163–188.

author keywords: Text analytics; Document analytics; Compression; Sequitur
TL;DR: A series of guidelines and technical solutions that effectively address challenges of text analytics directly on compression, including the adoption of a hierarchical compression method and a set of novel algorithms and data structure designs are presented. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: October 5, 2020

2021 article

Toward Efficient Interactions between Python and Native Libraries

PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21), pp. 1117–1128.

author keywords: Python; profiling; PMU; debug register
TL;DR: PieProf, a lightweight profiler, is developed to pinpoint interaction inefficiencies in Python applications and associate the inefficiencies with high-level Python code to provide a holistic view; it guides the optimization of 17 real-world applications. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: March 7, 2022

2021 journal article

UDF to SQL Translation through Compositional Lazy Inductive Synthesis

PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 5(OOPSLA).

By: G. Zhang n, Y. Xu n, X. Shen n & I. Dillig*

author keywords: program synthesis; source-to-source compiler; query optimization
TL;DR: A new technique for translating SQL queries with UDFs to pure SQL expressions using a novel compositional strategy that decomposes the synthesis task into simpler sub-problems and scales significantly better than traditional CEGIS. (via Semantic Scholar)
Sources: ORCID, Web Of Science
Added: January 2, 2022

2020 journal article

DIAC: An Inter-app Conflicts Detector for Open IoT Systems

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 19(6).

By: X. Li*, L. Zhang n & X. Shen n

author keywords: IoT; compiler; conflicts detection
TL;DR: An innovative solution policy for solving various detected conflicts is developed and an efficient conflict detection algorithm is developed that implements a compiler and runtime software system that integrates all the proposed techniques together into a comprehensive solution. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: January 4, 2021

2020 conference paper

Enabling Efficient Random Access to Hierarchically-Compressed Data

2020 IEEE 36th International Conference on Data Engineering (ICDE), 1069–1080.

TL;DR: A set of techniques are presented that successfully eliminate the limitation of direct data processing for random accesses, and for the first time, establish the feasibility of effectively handling both data traversal operations and random data accesses on hierarchically-compressed data. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: January 4, 2021

2020 journal article

Enabling Runtime SpMV Format Selection through an Overhead Conscious Method

IEEE Transactions on Parallel and Distributed Systems, 31(1), 80–93.

By: W. Zhou n, Y. Zhao n, X. Shen n & W. Chen*

author keywords: SpMV; high performance computing; program optimization; sparse matrix format; prediction model
TL;DR: This work shows that the runtime overhead makes the predictions from previous solutions frequently sub-optimal and sometimes inferior regarding the end-to-end time, and proposes a new paradigm for SpMV storage selection, an overhead-conscious method. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: January 25, 2020

2020 article

GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU

PACT '20: PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, pp. 43–54.

author keywords: Programming Framework; GPU; Optimizations
TL;DR: GOPipe is presented, a granularity-oblivious programming framework for efficient pipelined stencil executions on GPU that outperforms the state-of-the-art system by 1.39X on average with a much better programming productivity. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: December 13, 2021

2020 article

HARP: Holistic Analysis for Refactoring Python-Based Analytics Programs

2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020), pp. 506–517.

By: W. Zhou n, Y. Zhao*, G. Zhang n & X. Shen n

author keywords: machine learning program; computation graph; dynamic language; program analysis
TL;DR: HARP enables holistic analysis that spans across computation graphs and their hosting Python code and achieves it through a set of novel techniques: analytics-conscious speculative analysis to circumvent Python complexities, a unified representation (augmented computation graphs) to capture all dimensions of knowledge related to the holistic analysis, and a conditioned feedback mechanism to allow risk-controlled aggressive analysis. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: June 21, 2021

2020 article

Hardware-Based Domain Virtualization for Intra-Process Isolation of Persistent Memory Objects

2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), pp. 680–692.

By: Y. Xu n, C. Ye*, Y. Solihin* & X. Shen n

author keywords: Persistent Memory Objects; Memory Protection Keys; Intra-process Isolation
TL;DR: This paper presents two novel architecture supports, which provide 11–52× higher efficiency while offering the first known domain-based protection for PMOs. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: March 8, 2021

2020 article

MERR: Improving Security of Persistent Memory Objects via Efficient Memory Exposure Reduction and Randomization

TWENTY-FIFTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXV), pp. 987–1000.

By: Y. Xu n, Y. Solihin* & X. Shen n

author keywords: persistent memory objects; memory exposure reduction; runtime randomization
TL;DR: The paper discusses the complexities the technique brings to permission controls and hardware implementations and provides solutions; it shows that the new technique reduces memory exposure time by 60% with a 5% time overhead and allows much more frequent address randomizations, offering significant potential for enhancing memory security. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: July 6, 2020

2020 article

Special Issue: Graph Computing

Jin, H., Shen, X., Lovas, R., & Liao, X. (2020, February 10). CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, Vol. 32.

By: H. Jin*, X. Shen n, R. Lovas* & X. Liao*

TL;DR: The proposed special issue of Concurrency and Computation: Practice and Experience contains revised and extended versions of selected best papers with respect to graph computing at the 21st IEEE International Conference on Parallel and Distributed Systems (ICPADS’16), which was held at Wuhan, China, on December 13-16, 2016. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: August 19, 2019

2019 article

Adaptive Deep Reuse: Accelerating CNN Training on the Fly

2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), pp. 1538–1549.

By: L. Ning n, H. Guan n & X. Shen n

author keywords: CNN; neuron vector; similarity; training; adaptive; deep reuse
Sources: Web Of Science, ORCID
Added: August 19, 2019

2019 conference paper

Deep reuse

Proceedings of the ACM International Conference on Supercomputing - ICS '19. Presented at the ACM International Conference.

By: L. Ning n & X. Shen n

Event: the ACM International Conference

author keywords: Deep neural networks; Program Optimizations; GPU
TL;DR: This paper empirically reveals the massive similarities among neuron vectors in activation maps, both within CNN inferences on an input and across inputs, and gives an in-depth study on how to effectively turn the similarities into beneficial computation reuse to speed up CNN inferences. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 25, 2020

2019 conference paper

HiWayLib

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '19. Presented at the Twenty-Fourth International Conference.

Event: the Twenty-Fourth International Conference

author keywords: pipeline communication; CPU-GPU system; contention relief; end detection; lazy copy
TL;DR: This work identifies three key issues, namely, slow and error-prone detection of the end of pipeline processing, intensive queue contentions on GPU, and cumbersome inter-device data movements, and integrates all together to form a unified library named HiWayLib. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 25, 2020

2019 journal article

How to "DODGE" Complex Software Analytics

IEEE Transactions on Software Engineering, 47(10), 1–1.

By: A. Agrawal*, W. Fu, D. Chen*, X. Shen n & T. Menzies n

author keywords: Tuning; Text mining; Software; Task analysis; Optimization; Software engineering; Tools; Software analytics; hyperparameter optimization; defect prediction; text mining
TL;DR: By ignoring redundant tunings, DODGE, a tuning tool, runs orders of magnitude faster, while also generating learners with more accurate predictions than seen in prior state-of-the-art approaches. (via Semantic Scholar)
Sources: Web Of Science, ORCID, Crossref
Added: January 25, 2020

2019 conference paper

IA-graph based inter-app conflicts detection in open IoT systems

Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems - LCTES 2019. Presented at the 20th ACM SIGPLAN/SIGBED International Conference.

By: X. Li*, L. Zhang n & X. Shen n

Event: the 20th ACM SIGPLAN/SIGBED International Conference

TL;DR: This paper provides a new set of definitions and categorizations of the conflicts to more precisely characterize the nature of the problem, and employs a graph representation for formally representing IoT controls and inter-app interplays. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 25, 2020

2019 conference paper

In-Place Zero-Space Memory Protection for CNN

In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Information Processing Systems Proceedings.

By: H. Guan, L. Ning, Z. Lin, X. Shen, H. Zhou & S. Lim

Ed(s): H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox & R. Garnett

Source: NC State University Libraries
Added: January 25, 2020

2019 article

POSTER: GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU

PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), pp. 431–432.

author keywords: GPU; Pipelined Execution; Data Dependence
Sources: Web Of Science, ORCID
Added: December 11, 2020

2019 conference paper

Streamline Density Peak Clustering for Practical Adoptions

Proceedings of the 28th ACM International Conference on Information and Knowledge Management - CIKM '19. Presented at the 28th ACM International Conference.

By: S. Yang n, X. Shen n & M. Chi n

Event: the 28th ACM International Conference

author keywords: density clustering; algorithm optimization; hyperparameter tuning
TL;DR: Streamlined Density Peak Clustering (SDPC) offers an efficient and scalable drop-in replacement of DPC for data clustering, and preserves the original semantics of DPC. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 25, 2020

2019 conference paper

Wootz: a compiler-based framework for fast CNN pruning via composability

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2019. Presented at the 40th ACM SIGPLAN Conference.

By: H. Guan n, X. Shen n & S. Lim*

Event: the 40th ACM SIGPLAN Conference

author keywords: CNN; network pruning; compiler; composability
TL;DR: A compiler-based framework named Wootz is developed, which, for an arbitrary CNN, automatically generates code that builds a Teacher-Student scheme to materialize composability-based pruning, and a compression-based algorithm is designed to efficiently identify the set of CNN layers to pre-train for maximizing their reuse benefits in CNN pruning. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 25, 2020

2018 conference paper

Bridging the Gap between Deep Learning and Sparse Matrix Format Selection

ACM SIGPLAN NOTICES, 53(1), 94–108.

By: Y. Zhao n, J. Li*, C. Liao* & X. Shen n

TL;DR: This work describes how to effectively bridge the gap between deep learning and the special needs of the pillar HPC problem through a set of techniques on matrix representations, deep learning structure, and cross-architecture model migrations. (via Semantic Scholar)
Sources: NC State University Libraries, ORCID
Added: October 16, 2018

2018 article

Editorial for the Special Issue on In-Memory Computing

Shen, X., Lovas, R., & Liao, X. (2018, October). JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, Vol. 120, pp. 322–322.

By: X. Shen n, R. Lovas* & X. Liao*

Sources: Web Of Science, ORCID
Added: October 19, 2018

2018 journal article

Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights

PROCEEDINGS OF THE VLDB ENDOWMENT, 11(11), 1522–1535.

TL;DR: This work proposes the concept of compression-based direct processing to enable direct document analytics on compressed data, and presents how the concept can be materialized on Sequitur, a compression algorithm that produces hierarchical grammar-like representations. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: December 31, 2018

2018 conference paper

Exploring Flexible Communications for Streamlining DNN Ensemble Training Pipelines

SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. Presented at the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.

By: R. Pittman n, H. Guan n, X. Shen n, S. Lim* & R. Patton*

Event: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis

TL;DR: This project investigates a series of designs to improve pipeline flexibility and adaptivity, while also increasing performance, and shows that with the new flexible communication schemes, the CPU time spent during training is reduced by 2-11X, and the implementation can achieve up to 10X speedups when CPU core limits are imposed. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 25, 2020

2018 article

FALCON: A Fast Drop-In Replacement of Citation KNN for Multiple Instance Learning

CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, pp. 67–76.

By: S. Yang n & X. Shen n

author keywords: Citation KNN; Triangle Inequality; Multiple-instance Learning
TL;DR: FALCON accelerates Citation KNN by removing unnecessary distance calculations through two novel optimizations, multi-level triangle inequality-based distance filtering and heap optimization, making it a promising drop-in replacement of Citation KNN for multiple instance learning. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: February 4, 2019

2018 article

Footprint Modeling of Cache Associativity and Granularity

PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON MEMORY SYSTEMS (MEMSYS 2018), pp. 232–242.

By: H. Luo*, G. Chen n, F. Liu*, P. Li*, C. Ding* & X. Shen n

author keywords: Partial Footprint; Mapped Footprint; Dual-grained Footprint; Joint Modeling
TL;DR: This short paper shows how the new models are more general, accurate or efficient than previous modeling solutions in either technique, and how they can be used together to model the cache implemented with both techniques, i.e. sub-block set associative cache. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: July 22, 2019

2018 report

Inter-Disciplinary Research Challenges in Computer Systems for the 2020s

National Science Foundation.

By: A. Cohen, X. Shen, J. Torrellas, J. Tuck & Y. Zhou

Source: NC State University Libraries
Added: June 17, 2022

2018 journal article

LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine

NEURAL NETWORKS, 108, 399–410.

By: L. Ning n, R. Pittman n & X. Shen n

author keywords: RBM; Contrastive Divergence; Acceleration
MeSH headings: Algorithms; Databases, Factual; Deep Learning / trends; Machine Learning / trends; Neural Networks, Computer; Time Factors
Sources: Web Of Science, ORCID
Added: December 3, 2018

2018 article

LEEM: Lean Elastic EM for Gaussian Mixture Model via Bounds-Based Filtering

2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), pp. 677–686.

By: S. Yang n & X. Shen n

author keywords: Gaussian Mixture Model; Acceleration; Expectation Maximization; Elastic EM
TL;DR: This work proposes several novel optimizations to further accelerate Elastic EM, creating Lean Elastic EM (LEEM), which brings multi-fold speedups on six datasets of various sizes and dimensions. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: May 6, 2019

2018 article

Overhead-Conscious Format Selection for SpMV-Based Applications

2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 950–959.

By: Y. Zhao n, W. Zhou n, X. Shen n & G. Yiu*

author keywords: SpMV; High Performance Computing; Program Optimizations; Sparse Matrix Format; Prediction Model
TL;DR: A two-stage lazy-and-light scheme to help control the risks in the format predictions and at the same time maximize the overall format conversion benefits is proposed, which outperforms previous techniques significantly. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: October 16, 2018

2018 journal article

Resolving the GPU responsiveness dilemma through program transformations

Frontiers of Computer Science, 12(3), 545–559.

author keywords: program transformation; GPU; integrated architecture; responsiveness
TL;DR: This study identifies the GPU responsiveness dilemma: host busy polling responds quickly, but at the expense of high energy consumption and interference with co-running CPU programs; interrupt-based notification minimizes energy and CPU interference costs, but suffers from substantial response delay. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Sources: Crossref, ORCID
Added: January 25, 2020

2018 article

Rethinking Compilers in the Rise of Machine Learning and AI

CC'18: PROCEEDINGS OF THE 27TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION, pp. 1–1.

By: X. Shen n

author keywords: Compilers; Machine Learning; AI; NLP; High-Level Program Optimizations
TL;DR: This talk will discuss how ML and AI may help break the "abstraction wall"---barriers formed by layers of abstractions in modern software---for program analysis and optimizations, and how ML may transform the way in which high-level user intentions get translated into low-level code implementations. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: February 25, 2019

2018 article

Reuse-Centric K-Means Configuration

2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), pp. 1224–1227.

By: H. Guan n, Y. Ding n, X. Shen n & H. Krim n

TL;DR: A set of novel techniques is presented, including reuse-based filtering, center reuse, and a two-phase design to capitalize on the reuse opportunities on three levels: validation, k, and feature sets, to accelerate k-means configuration. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: November 11, 2019

2018 article

Taming the "Monster": Overcoming Program Optimization Challenges on SW26010 Through Precise Performance Modeling

2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 763–773.

By: S. Xu*, Y. Xu*, W. Xue*, X. Shen n, F. Zheng, X. Huang*, G. Yang*

TL;DR: An effort for overcoming the complexities of program optimizations on SW26010, the heterogeneous many-core processor that powers Sunway TaihuLight, the world's top supercomputer at the time, is presented, showing a precise, static performance model that achieves a high accuracy and speeds up the tuning process by as much as a factor of 43 while keeping the tuning quality loss below 6%. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: October 16, 2018

2018 conference paper

Zwift

Proceedings of the 2018 International Conference on Supercomputing - ICS '18. Presented at the 2018 International Conference.

Event: the 2018 International Conference

author keywords: Compilers; Domain Specific Languages; Text Analytics
TL;DR: Zwift is presented, the first programming framework for TADOC, which consists of a Domain Specific Language, a compiler and runtime, and a utility library, and experiments show that Zwift significantly improves programming productivity, while effectively unleashing the power of TADOC. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 25, 2020

2017 conference paper

An infrastructure for HPC knowledge sharing and reuse

ACM SIGPLAN Notices, 52(8), 461–462.

By: Y. Zhao n, C. Liao* & X. Shen n

UN Sustainable Development Goal Categories
9. Industry, Innovation and Infrastructure (OpenAlex)
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2017 conference paper

Bridging the gap between memory performance and massive parallelism: The critical role of programming systems innovations (keynote)

ACM SIGPLAN Notices, 52(9), 1–1.

By: X. Shen n

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2017 article

Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems

2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 967–977.

By: Q. Zhu*, B. Wu, X. Shen n, L. Shen* & Z. Wang*

TL;DR: This paper presents the first systematic study on co-scheduling independent jobs on integrated CPU-GPU systems with power caps considered and offers several algorithms and a lightweight co-run performance and power predictive model for computing the performance bounds of the optimal co-schedules and finding appropriate schedules. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: August 6, 2018

2017 chapter

Data placement on GPUs

In Advances in GPU Research and Practice (pp. 105–123).

By: X. Shen n & B. Wu*

TL;DR: This chapter discusses the complexity of GPU memory systems and describes a software framework named PORPLE to show how to automatically resolve the complexity for a given GPU program. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: December 7, 2020

2017 conference paper

EffiSha: A software framework for enabling efficient preemptive scheduling of GPU

ACM SIGPLAN Notices, 52(8), 3–16.

By: G. Chen n, Y. Zhao n, X. Shen n & H. Zhou n

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2017 conference paper

Efficient support of position independence on non-volatile memory

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-50 '17. Presented at the 50th Annual IEEE/ACM International Symposium.

By: G. Chen*, L. Zhang n, R. Budhiraja*, X. Shen n & Y. Wu*

Event: the 50th Annual IEEE/ACM International Symposium

author keywords: Compiler; Program Optimizations; Programming Languages; NVM
TL;DR: Experiments show that the enabled representations provide much more efficient and flexible support of position independence for dynamic data structures, alleviating a major issue for effective data reuses on NVM. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 25, 2020

2017 conference paper

Egeria

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17. Presented at the International Conference for High Performance Computing, Networking, Storage and Analysis.

By: H. Guan n, X. Shen n & H. Krim n

Event: the International Conference for High Performance Computing, Networking, Storage and Analysis

author keywords: program optimization; high performance computing; natural language processing
TL;DR: This work develops a framework named Egeria, which constructs an advising tool for a given high performance computing (HPC) domain once it is provided with an optimization guide or other related documents for that domain; the tool provides a concise list of essential rules automatically extracted from the documents. (via Semantic Scholar)
UN Sustainable Development Goal Categories
4. Quality Education (OpenAlex)
Sources: Crossref, ORCID
Added: January 25, 2020

2017 journal article

GLORE: generalized loop redundancy elimination upon LER-notation

Proceedings of the ACM on Programming Languages, 1(OOPSLA), 1–28.

By: Y. Ding n & X. Shen n

author keywords: program optimization; loop redundancy elimination; operation minimization
TL;DR: GLORE shows an applicability much broader than prior methods have, and frequently lowers the computational complexities of some nested loops that are elusive to prior optimization techniques, producing significantly larger speedups. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 25, 2020

2017 conference paper

Generalizations of the theory and deployment of triangular inequality for compiler-based strength reduction

ACM SIGPLAN Notices, 52(6), 33–48.

By: Y. Ding n, L. Ning n, H. Guan n & X. Shen n

Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2017 article

LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine

2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), pp. 1015–1020.

By: L. Ning n, R. Pittman n & X. Shen n

TL;DR: Lean Contrastive Divergence (LCD) is proposed, a modified contrastive divergence learning algorithm, to accelerate RBM learning and prediction without changing the results. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: August 6, 2018

2017 journal article

Optimizing Data Placement on GPU Memory: A Portable Approach

IEEE Transactions on Computers, 66(3), 473–487.

By: G. Chen n, X. Shen n, B. Wu* & D. Li*

author keywords: GPU; memory performance; cache; compiler; data placement; hardware specification language
TL;DR: This article provides a comprehensive description of this method, and presents several extensions that significantly improve the scalability of PORPLE, which include a novel algorithm design for efficiently searching for the best data placements, the use of active profiling for reducing the online-profiling overhead, and a systematic examination of a path-based performance model. (via Semantic Scholar)
Sources: Web Of Science, ORCID, Crossref
Added: August 6, 2018

2017 chapter

Software-level task scheduling on GPUs

In Advances in GPU Research and Practice (pp. 83–103).

By: B. Wu* & X. Shen n

TL;DR: This chapter presents a compiler and runtime framework with the capability to automatically transform and optimize GPU programs to enable controllable task scheduling to the streaming multiprocessors (SMs), which addresses the complexities of the hardware scheduler and provides the scheduling capability. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: December 7, 2020

2017 article

Sweet KNN: An Efficient KNN on GPU through Reconciliation between Redundancy Removal and Regularity

2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), pp. 621–632.

By: G. Chen n, Y. Ding n & X. Shen n

TL;DR: This work gives a detailed study on how to effectively combine the strengths of both approaches to create a new KNN on GPU named Sweet KNN, the first high-performance triangular-inequality-based KNN on GPU that manages to reach a sweet point between redundancy minimization and regularity preservation for various datasets. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: August 6, 2018

2017 journal article

Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions

Frontiers of Computer Science, 11(1), 130–146.

author keywords: performance analysis; GPGPU; co-run degradation; fused processor; program transformation
TL;DR: This work investigates the performance implications of independently co-running CPU and GPU programs on these platforms, and produces a list of novel insights, including the important roles of operating system (OS) context switching and power management in determining the program performance. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Sources: Web Of Science, ORCID, Crossref
Added: August 6, 2018

2017 conference paper

Versapipe

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-50 '17. Presented at the 50th Annual IEEE/ACM International Symposium.

Event: the 50th Annual IEEE/ACM International Symposium

author keywords: GPU; Pipelined Computing
TL;DR: This paper proposes three new execution models equipped with much improved controllability, including a hybrid model that is capable of getting the strengths of all, and leads to the development of a software programming framework named VersaPipe. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 25, 2020

2016 report

A Software Framework for Efficient Preemptive Scheduling on GPU

(Technical Report No. TR-2016-1). North Carolina State University.

By: G. Chen, X. Shen & H. Zhou

Source: NC State University Libraries
Added: January 29, 2021

2016 conference paper

Coherence-Free Multiview

Proceedings of the 2016 International Conference on Supercomputing - ICS '16. Presented at the 2016 International Conference.

By: G. Chen n & X. Shen n

Event: the 2016 International Conference

Sources: Crossref, ORCID
Added: September 5, 2020

2016 journal article

Examining and Reducing the Influence of Sampling Errors on Feedback-Driven Optimizations

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 13(1).

By: M. Zhou, B. Wu*, X. Shen n, Y. Gao* & G. Yiu*

author keywords: Compiler; Profiling; Feedback-Driven Optimization (FDO); Performance; Input Sensitivity; influence of sampling errors; feedback-driven optimization
TL;DR: This article gives the first systematic study of how sampling rates affect the accuracy of collected profiles and how the accuracy correlates with the usefulness of the profile for modern FDO. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: August 6, 2018

2016 report

LCD: A Fast Contrastive Divergence Based Training Algorithm for Restricted Boltzmann Machine

(No. TR-2016-3). Raleigh, NC: North Carolina State University.

By: L. Ning & X. Shen

Source: NC State University Libraries
Added: February 20, 2021

2016 book

Languages and Compilers for Parallel Computing

In Lecture Notes in Computer Science.

Xipeng Shen

Ed(s): X. Shen, F. Mueller & J. Tuck

TL;DR: The InfiniMem framework is introduced that enables size oblivious processing of large collections of objects that do not fit in memory by making them disk-resident, and is demonstrated with 3 different probabilistic analytics algorithms, 3 different graph processing size oblivious frameworks. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: December 7, 2020

2016 article

OpenCL-based erasure coding on heterogeneous architectures

Chen, G., Zhou, H., Shen, X., Gahm, J., Venkat, N., Booth, S., & Marshall, J. (2016, July). 2016 IEEE 27th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Vol. 7, pp. 33–40.

By: G. Chen n, H. Zhou n, X. Shen n, J. Gahm*, N. Venkat*, S. Booth*, J. Marshall*

TL;DR: This work exploits state-of-art heterogeneous architectures, including GPUs, APUs, and FPGAs, to accelerate erasure coding using the OpenCL framework and proposes code optimizations for each target architecture given their different hardware characteristics. (via Semantic Scholar)
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2016 conference paper

Towards Ontology-Based Program Analysis

In S. Krishnamurthi & B. S. Lerner (Eds.), 30th European Conference on Object-Oriented Programming (ECOOP 2016) (pp. 26:1–26:25). Dagstuhl, Germany: Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik.

By: Y. Zhao, G. Chen, C. Liao & X. Shen

Ed(s): S. Krishnamurthi & B. Lerner

Source: NC State University Libraries
Added: January 29, 2021

2016 report

Towards Ontology-Based Program Analysis

(Technical Report No. TR-2016-5). North Carolina State University.

By: Y. Zhao, C. Liao & X. Shen

Source: NC State University Libraries
Added: June 17, 2022

2016 journal article

Tuning for software analytics: Is it really necessary?

Information and Software Technology, 76, 135–146.

By: W. Fu n, T. Menzies n & X. Shen n

author keywords: Defect prediction; CART; Random forest; Differential evolution; Search-based software engineering
TL;DR: This paper finds that it is no longer enough to just run a data miner and present the result without conducting a tuning optimization study, and that standard methods in software analytics need to change. (via Semantic Scholar)
Sources: Web Of Science, ORCID, Crossref
Added: August 6, 2018

2015 article

Autotuning Algorithmic Choice for Input Sensitivity

Ding, Y., Ansel, J., Veeramachaneni, K., Shen, X., O'Reilly, U.-M., & Amarasinghe, S. (2015, June). ACM SIGPLAN NOTICES, Vol. 50, pp. 379–390.

By: Y. Ding n, J. Ansel*, K. Veeramachaneni*, X. Shen n, U. O'Reilly* & S. Amarasinghe*

author keywords: Algorithms; Languages; Performance; Petabricks; Autotuning; Algorithmic Optimization; Input Adaptive; Input Sensitivity; Two-level Input Learning
Sources: Web Of Science, ORCID
Added: August 6, 2018

2015 journal article

Enabling Portable Optimizations of Data Placement on GPU

IEEE Micro, 35(4), 16–24.

By: G. Chen n, B. Wu*, D. Li* & X. Shen n

TL;DR: Porple offers a solution that, for the first time, makes it possible to automatically enhance data placement across a GPU, and shows that Porple consistently finds optimal or near-optimal placement, yielding up to 2.93 times speedups compared to programmers' decisions. (via Semantic Scholar)
Sources: Web Of Science, ORCID, Crossref
Added: August 6, 2018

2015 conference paper

Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations

Proceedings of the 29th ACM on International Conference on Supercomputing - ICS '15. Presented at the 29th ACM.

By: B. Wu*, G. Chen n, D. Li*, X. Shen n & J. Vetter*

Event: the 29th ACM

author keywords: GPGPU; Scheduling; Compiler Transformation; Data Affinity; Program Co-Run
TL;DR: It is shown that some simple optimization techniques can enhance co-runs of multiple kernels and improve data locality of irregular applications, producing 20-33% average increase in performance, system throughput, and average turnaround time. (via Semantic Scholar)
UN Sustainable Development Goal Categories
8. Decent Work and Economic Growth (OpenAlex)
Sources: Crossref, ORCID
Added: September 5, 2020

2015 conference paper

Free launch

Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48. Presented at the 48th International Symposium.

By: G. Chen n & X. Shen n

Event: the 48th International Symposium

author keywords: GPU; Dynamic Parallelism; Optimization; Thread Reuse Compiler; Runtime Adaptation
TL;DR: This work proposes free launch, a new software approach to overcoming the shortcomings of both methods for exploiting dynamic parallelism on GPU, which employs a novel compiler-based code transformation named subkernel launch removal to replace the subkernel launches with the reuse of parent threads. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2015 article

On-the-Fly Principled Speculation for FSM Parallelization

Zhao, Z., & Shen, X. (2015, April). ACM SIGPLAN NOTICES, Vol. 50, pp. 619–630.

By: Z. Zhao* & X. Shen n

author keywords: Languages; Performance; Finite State Machine; FSM; DFA; Speculative Parallelization; Multicore; Online Profiling
Sources: Web Of Science, ORCID
Added: August 6, 2018

2015 conference paper

TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems

In C. Li & V. Markl (Eds.), 41st International Conference on Very Large Data Bases (VLDB 2015) : proceedings of the VLDB Endowment, volume 8, number 1-13, Kohala Coast, Hawaii, USA, 31 August-4 September 2015. Stanford, CA: VLDB Endowment.

By: Y. Ding, X. Shen, M. Musuvathi & T. Mytkowicz

Ed(s): C. Li & V. Markl

Event: 41st International Conference on Very Large Data Bases at Kohala Coast, Hawaii

Source: NC State University Libraries
Added: January 30, 2021

2015 report

TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems

(Technical Report No. TR-2015-3). North Carolina State University.

By: Y. Ding, X. Shen, M. Musuvathi & T. Mytkowicz

Source: NC State University Libraries
Added: January 30, 2021

2015 chapter

Understanding Co-run Degradations on Integrated Heterogeneous Processors

In Languages and Compilers for Parallel Computing (pp. 82–97).

author keywords: Heterogeneous architecture; Performance analysis; CPU and memory contention; Optimization; GPGPU
TL;DR: Co-runs of independent applications on systems with heterogeneous processors are common, but the influence of co-runners on such systems is poorly understood. (via Semantic Scholar)
Source: Crossref
Added: September 4, 2020

2015 conference paper

Understanding co-run degradations on integrated heterogeneous processors

Languages and compilers for parallel computing (lcpc 2014), 8967, 82–97.

By: Q. Zhu, B. Wu, X. Shen, L. Shen & Z. Wang

Source: NC State University Libraries
Added: August 6, 2018

2015 conference paper

Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup

Proceedings of the 32nd International Conference on Machine Learning, 37, 579–587. Lille, France.

By: Y. Ding, Y. Zhao, X. Shen, M. Musuvathi & T. Mytkowicz

Event: The 32nd International Conference on Machine Learning at Lille, France on July 6-11, 2015

Source: NC State University Libraries
Added: January 30, 2021

2015 report

Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup

(Technical Report No. TR-2015-2). North Carolina State University.

By: Y. Ding, X. Shen, M. Musuvathi & T. Mytkowicz

Source: NC State University Libraries
Added: January 30, 2021

2014 article

Call Sequence Prediction through Probabilistic Calling Automata

Zhao, Z., Wu, B., Zhou, M., Ding, Y., Sun, J., Shen, X., & Wu, Y. (2014, October). ACM SIGPLAN NOTICES, Vol. 49, pp. 745–762.

By: Z. Zhao*, B. Wu*, M. Zhou*, Y. Ding n, J. Sun*, X. Shen n, Y. Wu*

author keywords: Languages; Performance; Function call; Call sequence prediction; Probabilistic calling automata; Dynamic optimizations; Just-in-time compilation; Parallel compilation
TL;DR: A new way to enable call sequence prediction is presented, which exploits program structures through Probabilistic Calling Automata (PCA), a new program representation that captures both the inherent ensuing relations among function calls, and the probabilistic nature of execution paths. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: August 6, 2018

2014 conference paper

Challenging the "embarrassingly sequential"

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14. Presented at the 19th international conference.

By: Z. Zhao*, B. Wu* & X. Shen*

Event: the 19th international conference

author keywords: Languages; Performance; FSM; Speculative Parallelization; Lookback; DFA; Multicore; Partial Commit
TL;DR: This paper offers the first disciplined way to exploit application-specific information to inform speculations for parallelization, and presents a probabilistic model that captures the relations between speculative executions and the properties of the target FSM and its inputs. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2014 conference paper

Finding the limit

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14. Presented at the 19th international conference.

By: Y. Ding*, M. Zhou*, Z. Zhao*, S. Eisenstat* & X. Shen*

Event: the 19th international conference

author keywords: Performance; JIT; Compilation Scheduling; Compilation Order; NP-completeness; Heuristic Algorithm; Runtime System
TL;DR: This study proves the strong NP-completeness of the problem, proposes a heuristic algorithm that yields near optimal schedules, examines the potential of two current scheduling schemes empirically, and explores the relations with JIT designs. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2014 conference paper

Localization of concurrency bugs using shared memory access pairs

Proceedings of the 29th ACM/IEEE international conference on Automated software engineering - ASE '14. Presented at the 29th ACM/IEEE international conference.

Event: the 29th ACM/IEEE international conference

TL;DR: Evaluation results on 16 common concurrency bugs show that all buggy shared memory accesses that trigger these bugs can be precisely localized by LOCON with only one failed run captured. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2014 article

PORPLE: An Extensible Optimizer for Portable Data Placement on GPU

2014 47TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), pp. 88–100.

By: G. Chen n, B. Wu*, D. Li* & X. Shen n

author keywords: GPU; cache; compiler; data placement; hardware specification language
TL;DR: Experiments show that PORPLE is able to consistently find optimal or near-optimal placement despite the large differences among GPU architectures and program inputs, yielding up to 2.08X speedups on a set of regular and irregular GPU benchmarks. (via Semantic Scholar)
Sources: Web Of Science, ORCID
Added: August 6, 2018

2014 conference paper

SatScore

Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing - UbiComp '14 Adjunct. Presented at the 2014 ACM International Joint Conference.

By: Z. Zhao*, M. Zhou* & X. Shen n

Event: the 2014 ACM International Joint Conference

author keywords: Smartphone; Launch responsiveness; User study; Measurement Pitfalls
TL;DR: This paper presents the promise of solving the dilemma through a SatScore model and demonstrates some new opportunities for responsiveness enhancement enabled by the SatScore model. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2014 journal article

Space-efficient multi-versioning for input-adaptive feedback-driven program optimizations

ACM SIGPLAN Notices, 49(10), 763–776.

By: M. Zhou*, X. Shen n, Y. Gao* & G. Yiu*

TL;DR: This study proves selecting the best set of versions under a space constraint is NP-complete and proposes a heuristic algorithm named CHoGS which yields near optimal results in quadratic time. (via Semantic Scholar)
Sources: NC State University Libraries, ORCID
Added: August 6, 2018

2013 journal article

Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU

ACM SIGPLAN Notices, 48(8), 57.

By: B. Wu*, Z. Zhao*, E. Zhang*, Y. Jiang* & X. Shen*

author keywords: Performance; Experimentation; GPGPU; Memory coalescing; Computational complexity; Thread-data remapping; Runtime optimizations; Data transformation
Sources: Crossref, ORCID
Added: December 7, 2020

2013 conference paper

Exploring Hybrid Memory for GPU Energy Efficiency through Software-Hardware Co-Design

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. Presented at the PACT, Edinburgh, Scotland.

By: B. Wang*, B. Wu*, D. Li*, X. Shen*, W. Yu*, Y. Jiao*, J. Vetter*

Event: PACT at Edinburgh, Scotland on September 7-11, 2013

TL;DR: The co-design approach helps tap into the full potential of hybrid memory for GPU without requiring dramatic hardware changes over previous designs, yielding average energy savings of 6% and 49% compared to pure DRAM and pure PCM, respectively, while keeping performance loss below 2%. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Sources: ORCID, NC State University Libraries
Added: December 31, 2019

2013 chapter

Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation

In Languages and Compilers for Parallel Computing (pp. 171–184).

By: Z. Guo* & X. Shen*

TL;DR: It is shown that careful dependence analysis may allow a fine-grained treatment of synchronizations and reveal redundant computation at the instruction-instance level, and that, compared to existing translations, the new approach can yield speedups by integer factors. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 10, 2020

2013 journal article

HPar

ACM Transactions on Architecture and Code Optimization, 10(4), 1–25.

By: Z. Zhao*, M. Bebenita*, D. Herman*, J. Sun* & X. Shen*

TL;DR: This work develops, to the best of the authors' knowledge, the first pipelining and data-level parallel HTML parsers, demonstrates the feasibility of efficient parallel HTML parsing for the first time, and offers a set of novel insights into parallel HTML parsing. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2013 chapter

Optimal Co-Scheduling to Minimize Makespan on Chip Multiprocessors

In Job Scheduling Strategies for Parallel Processing (pp. 114–133).

By: K. Tian*, Y. Jiang*, X. Shen* & W. Mao*

UN Sustainable Development Goal Categories
8. Decent Work and Economic Growth (OpenAlex)
Sources: Crossref, ORCID
Added: September 10, 2020

2013 article

ProfMig: A framework for flexible migration of program profiles across software versions

Zhou, M., Wu, B., Ding, Y., & Shen, X. (2013, February). Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

By: M. Zhou*, B. Wu*, Y. Ding* & X. Shen*

TL;DR: This paper begins a systematic exploration in cross-version program profile migration, which tries to effectively reuse the valid part of the behavior profiles of an old version of a software for a new version, and introduces ProfMig, a framework for flexible migrations of various profiles. (via Semantic Scholar)
UN Sustainable Development Goal Categories
10. Reduced Inequalities (OpenAlex)
Source: ORCID
Added: December 31, 2019

2013 chapter

Simple Profile Rectifications Go a Long Way

In ECOOP 2013 – Object-Oriented Programming (pp. 654–678).

By: B. Wu*, M. Zhou*, X. Shen*, Y. Gao*, R. Silvera* & G. Yiu*

TL;DR: Experiments show that the simple approach enhances the effectiveness of sampled profile-based FDO dramatically, increasing the average FDO speedup from 1.16X to 1.3X, around 92% of what full profiles can yield. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2012 journal article

An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations

International Journal of Parallel Programming, 41(6), 855–869.

By: X. Shen*, Y. Liu*, E. Zhang* & P. Bhamidipati*

author keywords: GPU; Program Optimizations; Empirical Search; CUDA; G-ADAPT; Cross-input Adaptation
TL;DR: G-ADAPT+ is a framework to address the influence of program inputs on GPU program performance by constructing cross-input predictive models for automatically predicting the (near-)optimal configurations for an arbitrary input to a GPU program. (via Semantic Scholar)
UN Sustainable Development Goal Categories
9. Industry, Innovation and Infrastructure (OpenAlex)
Sources: Crossref, ORCID
Added: December 7, 2020

2012 conference paper

Exploiting inter-sequence correlations for program behavior prediction

Proceedings of the ACM international conference on Object oriented programming systems languages and applications - OOPSLA '12. Presented at the ACM international conference.

By: B. Wu*, Z. Zhao*, X. Shen*, Y. Jiang*, Y. Gao* & R. Silvera*

Event: the ACM international conference

TL;DR: This paper revisits the design philosophy and systematically explores a second source of clues: statistical correlations between the behavior sequences of different program entities, creating the first taxonomy of program behavior sequence patterns. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2012 conference paper

One stone two birds

Proceedings of the 26th ACM international conference on Supercomputing - ICS '12. Presented at the 26th ACM international conference.

By: Z. Guo*, B. Wu* & X. Shen*

Event: the 26th ACM international conference

TL;DR: A thread-level dependence analysis is presented, which leads to a code generator with three novel features: an instance-level instruction scheduler for synchronization relaxation, a graph pattern recognition scheme for code shape optimization, and a fine-grained analysis for thread-level partial redundancy removal. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2012 journal article

The Significance of CMP Cache Sharing on Contemporary Multithreaded Applications

IEEE Transactions on Parallel and Distributed Systems, 23(2), 367–374.

By: E. Zhang*, Y. Jiang* & X. Shen*

author keywords: Shared cache; thread scheduling; parallel program optimizations; chip multiprocessors
TL;DR: A systematic measurement of the influence of cache sharing on modern Chip Multiprocessors finds that the main reason is the mismatch between the software design (and compilation) of multithreaded applications and CMP architectures. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2011 conference paper

A step towards transparent integration of input-consciousness into dynamic program optimizations

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications - OOPSLA '11. Presented at the 2011 ACM international conference.

By: K. Tian*, E. Zhang* & X. Shen*

Event: the 2011 ACM international conference

TL;DR: Experiments on a number of Java programs demonstrate the effectiveness of the techniques in enabling input-consciousness for dynamic optimizations, revealing the feasibility and potential benefits of the new optimization paradigm in some basic settings. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2011 chapter

Array Regrouping on CMP with Non-uniform Cache Sharing

In Languages and Compilers for Parallel Computing (pp. 92–105).

By: Y. Jiang*, E. Zhang*, X. Shen*, Y. Gao* & R. Archambault*

TL;DR: This work proposes cache-sharing-aware reference affinity analysis for identifying data affinity in multithreading applications that consists of affinity-guided thread scheduling and hierarchical reference-vector merging, handles cache sharing among both hyperthreads and cores, and offers hints for array regrouping and the avoidance of false sharing. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 10, 2020

2011 conference paper

Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU

2011 International Conference on Parallel Architectures and Compilation Techniques. Presented at the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT).

By: Z. Guo*, E. Zhang* & X. Shen*

Event: 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

TL;DR: A systematic dependence analysis specially designed for handling implicit synchronizations in SPMD-threaded programs is described, unveiling the relations between inter-thread data dependences and correct treatment of synchronizations, and a dependence-based solution to the problem is presented. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2011 conference paper

Enhancing Data Locality for Dynamic Simulations through Asynchronous Data Transformations and Adaptive Control

2011 International Conference on Parallel Architectures and Compilation Techniques. Presented at the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT).

By: B. Wu*, E. Zhang* & X. Shen*

Event: 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

TL;DR: This work examines the implications that modern heterogeneous Chip Multiprocessors (CMP) architecture imposes on the optimization paradigm, and develops three techniques to enhance the optimizations, including an asynchronous data transformation algorithm, named TLayout, designed specially to take advantage of modern throughput-oriented processors. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2011 conference paper

On-the-fly elimination of dynamic irregularities for GPU computing

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems - ASPLOS '11. Presented at the sixteenth international conference.

By: E. Zhang*, Y. Jiang*, Z. Guo*, K. Tian* & X. Shen*

Event: the sixteenth international conference

TL;DR: G-Streamline is presented as a unified software solution to dynamic irregularities in GPU computing, which treats both types of irregularities at the same time in a holistic fashion, maximizing the whole-program performance by resolving conflicts among optimizations. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2011 journal article

The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions

IEEE Transactions on Parallel and Distributed Systems, 22(7), 1192–1205.

By: Y. Jiang*, K. Tian*, X. Shen*, J. Zhang n, J. Chen* & R. Tripathi*

author keywords: Co-scheduling; shared cache; CMP scheduling; cache contention; perfect matching; integer programming
TL;DR: The paper uncovers the computational complexity of the determination of optimal job co-schedules, proving its NP-completeness, and introduces a set of algorithms, based on graph theory and Integer/Linear Programming, for computing optimal co-schedules or their lower bounds in scenarios with or without job migrations. (via Semantic Scholar)
UN Sustainable Development Goal Categories
8. Decent Work and Economic Growth (OpenAlex)
Sources: Crossref, ORCID
Added: September 6, 2020

2010 conference paper

An input-centric paradigm for program dynamic optimizations

Proceedings of the ACM international conference on Object oriented programming systems languages and applications - OOPSLA '10. Presented at the ACM international conference.

By: K. Tian*, Y. Jiang*, E. Zhang* & X. Shen*

Event: the ACM international conference

Sources: Crossref, ORCID
Added: September 5, 2020

2010 chapter

Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors

In High Performance Embedded Architectures and Compilers (pp. 201–215).

By: Y. Jiang*, K. Tian* & X. Shen*

TL;DR: A lightweight locality model is developed that enables efficient, proactive prediction of the performance of co-running processes, offering the potential for an integration in online scheduling systems. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2010 conference paper

Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?

Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10. Presented at the 15th ACM SIGPLAN symposium.

By: E. Zhang*, Y. Jiang* & X. Shen*

Event: the 15th ACM SIGPLAN symposium

author keywords: Shared Cache; Thread Scheduling; Parallel Program Optimizations; Chip Multiprocessors
TL;DR: A systematic measurement of the influence of CMP cache sharing on two kinds of commodity CMP machines, using a recently released CMP benchmark suite, PARSEC, with a number of potentially important factors on program, OS, and architecture levels considered shows some surprising results. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2010 report

Experiences in Porting the Hubbard Model in Computational Materials Science to GPU

(Technical Report No. WM-CS-2010-04). Computer Science Department, The College of William and Mary.

By: C. Albert, A. Paloski, X. Shen, E. Walter & S. Zhang

Source: NC State University Libraries
Added: January 30, 2021

2010 conference paper

Exploiting statistical correlations for proactive prediction of program behaviors

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization - CGO '10. Presented at the 8th annual IEEE/ACM international symposium.

By: Y. Jiang*, E. Zhang*, K. Tian*, F. Mao*, M. Gethers*, X. Shen*, Y. Gao*

Event: the 8th annual IEEE/ ACM international symposium

TL;DR: A regression based framework is proposed to automatically identify a small set of behaviors that can lead to accurate prediction of other behaviors in a program, called seminal behaviors, and constructs predictive models that map from seminal behaviors to other behaviors, enabling proactive and cross-input adaptive prediction of program behaviors. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2010 report

Implementing the Dslash Operator in OpenCL

(Technical Report No. WM-CS-2010-03). Computer Science Department, The College of William and Mary.

By: A. Kowalski & X. Shen

Source: NC State University Libraries
Added: January 30, 2021

2010 chapter

Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?

In Lecture Notes in Computer Science (pp. 264–282).

By: Y. Jiang*, E. Zhang*, K. Tian* & X. Shen*

TL;DR: The concept of concurrent reuse distance is introduced, a direct extension of the traditional concept of reuse distance with data references by all co-running threads (or jobs) considered, and the special challenges facing the collection and application of concurrent reuse distance on CMP platforms are revealed. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2010 chapter

LU Decomposition on Cell Broadband Engine: An Empirical Study to Exploit Heterogeneous Chip Multiprocessors

In Lecture Notes in Computer Science (pp. 61–75).

By: F. Mao* & X. Shen*

TL;DR: This paper presents the experience of optimizing LU decomposition, one of the commonly used algebra kernels in scientific computing, on Cell Broadband Engine, and offers some insights into optimizations on heterogeneous multi-core processors. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2010 conference paper

Streamlining GPU applications on the fly

Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10. Presented at the 24th ACM International Conference.

By: E. Zhang*, Y. Jiang*, Z. Guo* & X. Shen*

Event: the 24th ACM International Conference

TL;DR: A systematic investigation in the employment of runtime thread-data remapping for solving the problem of thread divergences on runtime values is presented, and a solution is offered by proposing a CPU-GPU pipelining scheme and a label-assign-move (LAM) algorithm to virtually hide all the remapping overhead. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Sources: Crossref, ORCID
Added: September 5, 2020

2009 report

A Systematic Measurement of the Influence of Non-Uniform Cache Sharing on the Performance of Modern Multithreaded Programs

(Technical Report No. WM-CS-2009-04). Computer Science Department, The College of William and Mary.

By: E. Zhang, Y. Jiang & X. Shen

Source: NC State University Libraries
Added: January 30, 2021

2009 conference paper

A cross-input adaptive framework for GPU program optimizations

2009 IEEE International Symposium on Parallel & Distributed Processing. Presented at the 2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

By: Y. Liu*, E. Zhang* & X. Shen*

Event: 2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

TL;DR: An input-adaptive optimization framework, namely G-ADAPT, is developed to address the influence of program inputs by constructing cross-input predictive models for automatically predicting the (near-)optimal configurations for an arbitrary input to a GPU program. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: December 7, 2020

2009 conference paper

A study on optimally co-scheduling jobs of different lengths on chip multiprocessors

Proceedings of the 6th ACM conference on Computing frontiers - CF '09. Presented at the 6th ACM conference.

By: K. Tian*, Y. Jiang* & X. Shen*

Event: the 6th ACM conference

TL;DR: This work proposes an A*-based approach to accelerating the search for optimal schedules by as much as several orders of magnitude, and designs and evaluates two approximation algorithms to effectively approximate the optimal schedules with good accuracy and high scalability. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: December 7, 2020

2009 report

Co-Run Locality Prediction for Proactive Shared-Cache Management

(Technical Report No. WM-CS-2009-03). Computer Science Department, The College of William and Mary.

By: X. Shen & Y. Jiang

Source: NC State University Libraries
Added: January 30, 2021

2009 conference paper

Cross-Input Learning and Discriminative Prediction in Evolvable Virtual Machines

2009 International Symposium on Code Generation and Optimization. Presented at the 2009 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

By: F. Mao* & X. Shen*

Event: 2009 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

author keywords: Cross-Input Learning; Java Virtual Machine; Evolvable Computing; Adaptive Optimization; Input-Centric Optimization; Discriminative Prediction
TL;DR: This work develops a set of techniques that make a virtual machine evolve across production runs, and employs an enriched extensible specification language to resolve the complexities in program inputs. (via Semantic Scholar)
UN Sustainable Development Goal Categories
10. Reduced Inequalities (OpenAlex)
Sources: Crossref, ORCID
Added: September 6, 2020

2009 conference paper

Influence of program inputs on the selection of garbage collectors

Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual execution environments - VEE '09. Presented at the 2009 ACM SIGPLAN/SIGOPS international conference.

By: F. Mao*, E. Zhang* & X. Shen*

Event: the 2009 ACM SIGPLAN/SIGOPS international conference

TL;DR: The predictability of the minimum possible heap size is demonstrated, indicating the potential feasibility of the input-specific selection of garbage collectors. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 5, 2021

2009 report

Program Seminal Behaviors: Automating Input Characterization for Large-Scope Proactive Behavior Prediction

(Technical Report No. WM-CS-2009-07). Computer Science Department, The College of William and Mary.

By: X. Shen, Y. Jiang, E. Zhang, K. Tan, F. Mao & M. Gethers

Source: NC State University Libraries
Added: January 30, 2021

2009 journal article

Program locality analysis using reuse distance

ACM Transactions on Programming Languages and Systems, 31(6), 1–39.

By: Y. Zhong*, X. Shen* & C. Ding*

author keywords: Measurement; Languages; Algorithms; Program locality; reuse distance; stack distance; training-based analysis
TL;DR: Two techniques are presented, among the first to enable quantitative analysis of whole-program locality in general sequential code, that predict how the locality of a program changes with its input. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2009 report

Speculation with Little Wasting: Saving Cost in Software Speculation Through Transparent Learning

(No. WM-CS-2009-08). Williamsburg, VA: Computer Science Department, The College of William and Mary.

By: Y. Jiang & X. Shen

Source: NC State University Libraries
Added: February 20, 2021

2009 conference paper

Speculation with Little Wasting: Saving Cost in Software Speculation through Transparent Learning

2009 15th International Conference on Parallel and Distributed Systems. Presented at the 2009 15th International Conference on Parallel and Distributed Systems.

By: Y. Jiang*, F. Mao* & X. Shen*

Event: 2009 15th International Conference on Parallel and Distributed Systems

TL;DR: Transparent statistical learning is proposed to make speculation cross-input adaptive by learning across iterations and executions, permitting arbitrary depth of speculations, applicable to both loop-level and function-level parallelism. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2009 report

Streamlining GPU Applications On the Fly – Thread Divergence Elimination through Runtime Thread-Data Remapping

(No. WM-CS-2009-08). Williamsburg, VA: Computer Science Department, The College of William and Mary.

By: E. Zhang, Y. Jiang, Z. Guo & X. Shen

Source: NC State University Libraries
Added: February 20, 2021

2009 journal article

The study and handling of program inputs in the selection of garbage collectors

ACM SIGOPS Operating Systems Review, 43(3), 48.

By: X. Shen*, F. Mao*, K. Tian* & E. Zhang*

TL;DR: Experimental results demonstrate that with regression and classification techniques, it is possible to predict the best garbage collector (along with the minimum possible heap size) with reasonable accuracy given an arbitrary input to an application. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2008 report

A Cross-Input Adaptive Framework for GPU Program Optimization

(No. WM-CS-2008-09). Williamsburg, VA: Computer Science Department, The College of William and Mary.

By: Y. Liu, E. Zhang & X. Shen

Source: NC State University Libraries
Added: February 20, 2021

2008 conference paper

Adaptive Software Speculation for Enhancing the Cost-Efficiency of Behavior-Oriented Parallelization

2008 37th International Conference on Parallel Processing. Presented at the 2008 37th International Conference on Parallel Processing (ICPP).

By: Y. Jiang* & X. Shen*

Event: 2008 37th International Conference on Parallel Processing (ICPP)

Sources: Crossref, ORCID
Added: January 5, 2021

2008 article

Adaptive speculation in behavior-oriented parallelization

Jiang, Y., & Shen, X. (2008, April). 2008 IEEE International Symposium on Parallel and Distributed Processing.

By: Y. Jiang* & X. Shen*

TL;DR: Adaptive speculation is proposed to predict the profitability of a speculation and dynamically enable or disable the speculation of a region and enhance the usability of behavior-oriented parallelization by allowing users to label potential parallel regions more flexibly. (via Semantic Scholar)
Source: ORCID
Added: December 31, 2019

2008 conference paper

Analysis and approximation of optimal co-scheduling on chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08. Presented at the 17th international conference.

By: Y. Jiang*, X. Shen*, J. Chen* & R. Tripathi*

Event: the 17th international conference

author keywords: co-scheduling; CMP scheduling; cache contention; perfect matching
TL;DR: This paper presents a theoretical analysis of the complexity of co-scheduling, proving its NP-completeness and designs and evaluates a sequence of approximation algorithms, among which, the hierarchical matching algorithm produces near-optimal schedules and shows good scalability. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: January 5, 2021

2008 report

Cross-Input Learning and Discriminative Prediction in Evolvable Virtual Machines

(No. WM-CS-2008-06). Williamsburg, VA: Computer Science Department, The College of William and Mary.

By: F. Mao & X. Shen

Source: NC State University Libraries
Added: February 20, 2021

2008 chapter

Exploration of the Influence of Program Inputs on CMP Co-scheduling

In Lecture Notes in Computer Science (pp. 263–273).

By: Y. Jiang* & X. Shen*

TL;DR: It is shown that the ability to adapt to program inputs is important for a co-scheduler to work effectively on Chip Multiprocessors, and the potential of the predictive models in guiding contention-aware co-scheduling is demonstrated. (via Semantic Scholar)
UN Sustainable Development Goal Categories
8. Decent Work and Economic Growth (OpenAlex)
Sources: Crossref, ORCID
Added: September 6, 2020

2008 report

LU Decomposition on Cell Broadband Engine

(Technical Report No. WM-CS-2008-08). Computer Science Department, The College of William and Mary.

By: F. Mao & X. Shen

Source: NC State University Libraries
Added: January 30, 2021

2008 chapter

Scalable Implementation of Efficient Locality Approximation

In Languages and Compilers for Parallel Computing (pp. 202–216).

By: X. Shen* & J. Shaw

TL;DR: An algorithm that approximates reuse distance on arbitrary scales is described; a portable scheme that employs memory controller to accelerate the measure of time distance is explained; and the algorithm and proof of a trace generator that can facilitate various locality studies are uncovered. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 10, 2020

2007 report

A Hybrid Framework Bridging Locality Analysis and Cache-Aware Scheduling for CMPs

(Technical Report No. WM-CS-2007-01). Computer Science Dept., The College of William and Mary.

By: X. Shen

Source: NC State University Libraries
Added: January 30, 2021

2007 report

CAPS: Contention-Aware Proactive Scheduling for CMPs

(Technical Report No. WM-CS-2007-09). Computer Science Department, The College of William and Mary.

By: X. Shen, Y. Jiang & F. Mao

Source: NC State University Libraries
Added: January 30, 2021

2007 conference paper

Locality approximation using time

Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '07. Presented at the 34th annual ACM SIGPLAN-SIGACT symposium.

By: X. Shen*, J. Shaw, B. Meeker* & C. Ding*

Event: the 34th annual ACM SIGPLAN-SIGACT symposium

TL;DR: This work proposes a statistical model that converts cheaply obtained time distance to the more costly reuse distance, and reduces measuring time by a factor of 17, and approximates cache line reuses with over 99% accuracy and the cache miss rate with less than 0.4% average error. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2007 journal article

Miss Rate Prediction Across Program Inputs and Cache Configurations

IEEE Transactions on Computers, 56(3), 328–343.

By: Y. Zhong*, S. Dropsho*, X. Shen*, A. Studer* & C. Ding*

author keywords: cache memories; modeling techniques; performance analysis and design aids; compilers; optimization
TL;DR: An interactive visualization tool is presented that uses a three-dimensional plot to show miss rate changes across program data sizes and cache sizes; its uses include evaluating compiler transformations and assisting machine and benchmark-set design. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2007 report

Modeling Relations Between Inputs and Dynamic Behavior for General Programs

(No. WM-CS-2007-07). Williamsburg, VA: Computer Science Department, The College of William and Mary.

By: X. Shen & F. Mao

Source: NC State University Libraries
Added: February 20, 2021

2007 journal article

Predicting locality phases for dynamic memory optimization

Journal of Parallel and Distributed Computing, 67(7), 783–796.

By: X. Shen*, Y. Zhong* & C. Ding*

author keywords: program phase prediction; phase hierarchy; locality analysis and optimization; reconfigurable architecture; dynamic optimization
TL;DR: The accuracy and the granularity of phase and phase-sequence prediction as well as its uses in dynamic data packing, memory remapping, and cache resizing are shown. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2007 conference paper

Software behavior oriented parallelization

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation - PLDI '07. Presented at the 2007 ACM SIGPLAN conference.

By: C. Ding*, X. Shen*, K. Kelsey*, C. Tice*, R. Huang* & C. Zhang*

Event: the 2007 ACM SIGPLAN conference

author keywords: speculative parallelization; program behavior
TL;DR: The main goal of the paper is to demonstrate that the general protection can be made cost effective by three novel techniques: programmable speculation, critical-path minimization, and value-based correctness checking. (via Semantic Scholar)
UN Sustainable Development Goal Categories
4. Quality Education (OpenAlex)
Sources: Crossref, ORCID
Added: September 6, 2020

2007 report

Study of the Effects of Program Inputs on Co-Scheduling

(Technical Report No. WM-CS-2007-13). Computer Science Department, The College of William and Mary.

By: Y. Jiang & X. Shen

Source: NC State University Libraries
Added: January 30, 2021

2006 report

A Key-Based Adaptive Transactional Memory Executor

(No. TR909). Rochester, NY: Computer Science Dept., University of Rochester.

By: T. Bai, X. Shen, C. Zhang, W. Scherer, C. Ding & M. Scott

Source: NC State University Libraries
Added: February 20, 2021

2006 report

Accurate Approximation of Locality from Time Distance Histograms

(Technical Report No. TR902). Computer Science Dept., University of Rochester.

By: X. Shen, J. Shaw & B. Meeker

Source: NC State University Libraries
Added: January 30, 2021

2006 report

Behavior-Oriented Parallelization

(Technical Report No. TR904). Computer Science Dept., University of Rochester.

By: C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang & C. Zhang

Source: NC State University Libraries
Added: January 30, 2021

2006 report

Locality Approximation Using Time

(Technical Report No. TR901). Computer Science Dept., University of Rochester.

By: X. Shen, J. Shaw, B. Meeker & C. Ding

Source: NC State University Libraries
Added: January 30, 2021

2006 conference paper

Program-level adaptive memory management

Proceedings of the 2006 international symposium on Memory management - ISMM '06. Presented at the 2006 international symposium.

By: C. Zhang*, K. Kelsey*, X. Shen*, C. Ding*, M. Hertz* & M. Ogihara*

Event: the 2006 international symposium

TL;DR: This work demonstrates the presence of an optimal heap size for a number of applications and introduces a scheme which adaptively finds this good heap size by adapting itself dynamically, independent of the underlying main memory size, code optimizations, and garbage collection algorithm. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2006 report

Waste Not, Want Not: Adaptive Garbage Collection in a Shared Environment

(Technical Report No. TR908). Computer Science Dept., University of Rochester.

By: C. Zhang, K. Kelsey, X. Shen, C. Ding, M. Hertz & M. Ogihara

Source: NC State University Libraries
Added: January 30, 2021

2005 conference paper

Gated memory control for memory monitoring, leak detection and garbage collection

Proceedings of the 2005 workshop on Memory system performance - MSP '05. Presented at the 2005 workshop.

By: C. Ding*, C. Zhang*, X. Shen* & M. Ogihara*

Event: the 2005 workshop

TL;DR: A new approach is described that uses phase boundaries as the gates to monitor and control the memory usage, and uses phase-level patterns to predict the trend of the program's memory demand, identify and control memory leaks, and improve the efficiency of garbage collection. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2005 conference paper

Lightweight reference affinity analysis

Proceedings of the 19th annual international conference on Supercomputing - ICS '05. Presented at the 19th annual international conference.

By: X. Shen*, Y. Gao*, C. Ding* & R. Archambault*

Event: the 19th annual international conference

TL;DR: This paper implemented a prototype of both the compiler and the profiling analysis in the IBM® compiler, evaluated array regrouping on the entire set of SPEC CPU2000 FORTRAN benchmarks, and compared different analysis methods. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2005 report

Parallelization of Utility Programs Based on Behavior Phase Analysis

(No. TR876). Rochester, NY: Computer Science Dept., University of Rochester.

By: X. Shen & C. Ding

Source: NC State University Libraries
Added: February 20, 2021

2005 chapter

Phase-Based Miss Rate Prediction Across Program Inputs

In Lecture Notes in Computer Science (pp. 42–55).

By: X. Shen*, Y. Zhong* & C. Ding*

TL;DR: A method that divides a program into phases that have a regular locality pattern and predicts the reuse signature and then the cache miss rate of each phase for all inputs, which is over 98% accurate for a set of floating-point programs. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 10, 2020

2004 chapter

A Hierarchical Model of Reference Affinity

In Languages and Compilers for Parallel Computing (pp. 48–63).

By: Y. Zhong*, X. Shen* & C. Ding*

TL;DR: This paper proposes a new model of reference affinity that considers the distance between data accesses in addition to the frequency, and presents a statistical clustering method that identifies affinity groups among structure fields and data arrays by analyzing training runs of a program. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 10, 2020

2004 conference paper

Adaptive data partition for sorting using probability distribution

International Conference on Parallel Processing, 2004. ICPP 2004. Presented at the International Conference on Parallel Processing, 2004. ICPP 2004.

By: X. Shen* & C. Ding*

Event: International Conference on Parallel Processing, 2004. ICPP 2004.

TL;DR: A new partition method for sorting based on probability distribution is presented, an idea first studied by Janus and Lamagna in the early 1980s on a mainframe computer, together with an efficient implementation on modern, cache-based machines. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2004 conference paper

Array regrouping and structure splitting using whole-program reference affinity

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation - PLDI '04, 255.

By: Y. Zhong*, M. Orlovich*, X. Shen* & C. Ding*

Event: the ACM SIGPLAN 2004 conference

TL;DR: A model of reference affinity is defined, which measures how close a group of data are accessed together in a reference trace, and it is proved that the model gives a hierarchical partition of program data. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2004 report

Characterizing Phases in Service-Oriented Applications

(Technical Report No. TR848). Computer Science Dept., University of Rochester.

By: X. Shen, C. Ding, S. Dwarkadas & M. Scott

Source: NC State University Libraries
Added: January 30, 2021

2004 journal article

Learning multi-label scene classification

Pattern Recognition, 37(9), 1757–1771.

By: M. Boutell*, J. Luo*, X. Shen* & C. Brown*

author keywords: image understanding; semantic scene classification; multi-label classification; multi-label training; multi-label evaluation; image organization; cross-training; Jaccard similarity
TL;DR: A framework to handle semantic scene classification, where a natural scene may contain multiple objects such that the scene can be described by multiple class labels, is presented and appears to generalize to other classification problems of the same nature. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 6, 2020

2004 conference paper

Locality phase prediction

Proceedings of the 11th international conference on Architectural support for programming languages and operating systems - ASPLOS-XI. Presented at the 11th international conference.

By: X. Shen*, Y. Zhong* & C. Ding*

Event: the 11th international conference

TL;DR: Compared with existing methods based on program code and execution intervals, locality phase prediction is unique because it uses locality profiles, and it marks phase boundaries in program code. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: September 5, 2020

2004 conference paper

Multi-label Machine Learning and Its Application to Semantic Scene Classification

Proceedings of Storage and Retrieval Methods and Applications for Multimedia 2004, 5307, 188–199.

By: X. Shen*, M. Boutell*, J. Luo* & C. Brown*

Event: IS&T/SPIE’s Sixteenth Annual Symposium on Electronic Imaging at San Jose, CA

TL;DR: A framework to handle semantic scene classification, where a natural scene may contain multiple objects such that the scene can be described by multiple class labels, is presented and appears to generalize to other classification problems of the same nature. (via Semantic Scholar)
Sources: NC State University Libraries, ORCID
Added: February 6, 2021

2003 report

Adaptive Data Partitioning using Probability Distribution

(Technical Report No. TR823). Computer Science Dept., University of Rochester.

By: X. Shen, Y. Zhong & C. Ding

Source: NC State University Libraries
Added: January 30, 2021

2003 report

Multi-label Semantic Scene Classification

(Technical Report No. TR813). Dept. of Computer Science, University of Rochester.

By: M. Boutell, X. Shen, J. Luo & C. Brown

Source: NC State University Libraries
Added: January 30, 2021

2003 report

Predicting Hierarchical Phases in Program Data Behavior

(Technical Report No. TR824). Computer Science Dept., University of Rochester.

By: X. Shen, Y. Zhong & C. Ding

Source: NC State University Libraries
Added: January 30, 2021

2003 conference paper

Regression-Based Multi-Model Prediction of Data Reuse Signature

Proceedings of the Fourth Annual Symposium of the Los Alamos Computer Science Institute, 243–251. Santa Fe, New Mexico, USA: Los Alamos Computer Science Institute.

By: X. Shen, Y. Zhong & C. Ding

Event: Symposium of the Los Alamos Computer Science Institute at Santa Fe, NM

Source: NC State University Libraries
Added: February 6, 2021

2002 report

The Medication Advisor Project: Preliminary Report

(Technical Report No. 776). Dept. of Computer Science, University of Rochester.

By: G. Ferguson, J. Allen, N. Blaylock, D. Byron, N. Chambers, M. Dzikovska, L. Galescu, X. Shen, R. Swier, M. Swift

Source: NC State University Libraries
Added: January 30, 2021

2001 conference paper

Study and Auto-Detection of Stress Based on Tonal Pitch Range in Mandarin

Proceedings of Seventh European Conference on Speech Communication and Technology, 123–126. Aalborg, Denmark.

By: X. Shen & B. Xu

Event: Conference on Speech Communication and Technology at Aalborg, Denmark

Source: NC State University Libraries
Added: February 6, 2021

2001 conference paper

The Study Of The Effect Of Training Set On Statistical Language Modeling

Proceedings of Seventh European Conference on Speech Communication and Technology, 721–724. Aalborg, Denmark.

By: X. Shen & B. Xu

Event: Conference on Speech Communication and Technology at Aalborg, Denmark

Source: NC State University Libraries
Added: February 6, 2021

2000 conference paper

A CART-Based Hierarchical Stochastic Model for Prosodic Phrasing in Chinese

Proceedings of International Symposium on Chinese Spoken Language Processing 2000, 105–108. Beijing, China.

By: X. Shen & B. Xu

Event: International Symposium on Chinese Spoken Language Processing at Beijing, China

Source: NC State University Libraries
Added: February 6, 2021

Citation Index includes data from a number of different sources. If you have questions about the sources of data in the Citation Index or need a set of data which is free to re-distribute, please contact us.

Certain data included herein are derived from the Web of Science© and InCites© (2024) of Clarivate Analytics. All rights reserved. You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.