Works (35)

Updated: April 5th, 2024 10:48

2023 article

A Code Transformation to Improve the Efficiency of OpenCL Code on FPGA through Pipes

PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2023, CF 2023, pp. 101–111.

By: M. Zarch n & M. Becchi n

author keywords: OpenCL; FPGA; high-level synthesis; compiler techniques; pipes; performance optimization
TL;DR: A code transformation is proposed to improve the performance of OpenCL codes running on FPGA by using pipes to separate the memory accesses and core computation within OpenCL kernels; it can result in higher utilization of the available global memory bandwidth and increased instruction concurrency. (via Semantic Scholar)
Source: Web Of Science
Added: March 4, 2024

2023 article

Evaluating Asynchronous Parallel I/O on HPC Systems

2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, pp. 211–221.

By: J. Ravi n, S. Byna*, Q. Koziol*, H. Tang* & M. Becchi n

author keywords: Performance Evaluation; Modeling; Asynchronous; I/O; Parallel I/O
TL;DR: A systematic study of various factors affecting the performance and efficacy of asynchronous I/O in HPC systems is performed, a performance model is developed to estimate the aggregate I/O bandwidth achievable by iterative applications using synchronous and asynchronous I/O based on past observations, and the performance of the recently developed asynchronous I/O feature of a parallel I/O library (HDF5) is evaluated using benchmarks and real-world science applications. (via Semantic Scholar)
Source: Web Of Science
Added: September 5, 2023

2023 article

GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations

2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, pp. 757–767.

By: M. Shah n, X. Yu*, S. Di*, D. Lykov*, Y. Alexeev*, M. Becchi n, F. Cappello*

author keywords: compression; quantum computing; GPU
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Web Of Science
Added: September 5, 2023

2023 article

High-Level Synthesis of Irregular Applications: A Case Study on Influence Maximization

PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2023, CF 2023, pp. 12–22.

By: R. Neff n, M. Minutoli*, A. Tumeo* & M. Becchi n

author keywords: High-Level Synthesis; Graph Algorithms; Influence Maximization
TL;DR: This work analyzes the challenges and benefits of using a commercial state-of-the-art HLS tool and its available optimizations to accelerate graph sampling and discusses future opportunities for improvement in hardware, HLS tools, and hardware/software co-design methodology to better support complex irregular applications such as IMM. (via Semantic Scholar)
Source: Web Of Science
Added: March 4, 2024

2023 article

Lightweight Huffman Coding for Efficient GPU Compression

PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023, pp. 99–110.

By: M. Shah n, X. Yu*, S. Di*, M. Becchi n & F. Cappello*

author keywords: compression; Huffman coding; GPU
TL;DR: This paper designs a scheme to improve the performance of cuSZ, a GPU-based lossy compressor, and creates a dictionary of pre-computed codebooks such that during compression, a codebook is selected from the dictionary instead of computing a custom codebook. (via Semantic Scholar)
Source: Web Of Science
Added: January 29, 2024

2023 article

Runway: In-transit Data Compression on Heterogeneous HPC Systems

2023 IEEE/ACM 23RD INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING, CCGRID, pp. 229–239.

By: J. Ravi n, S. Byna* & M. Becchi n

author keywords: Object Data Management; In-transit Computation; Heterogeneous Resources
TL;DR: This paper introduces Runway, a runtime framework that enables computation on in-transit data with an object storage abstraction that is designed to be extensible to execute user-defined functions at runtime. (via Semantic Scholar)
Source: Web Of Science
Added: August 21, 2023

2023 article

Runway: In-transit Data Compression on Heterogeneous HPC Systems

2023 IEEE/ACM 23RD INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING WORKSHOPS, CCGRIDW, pp. 340–342.

By: J. Ravi n, S. Byna* & M. Becchi n

author keywords: Object Data Management; In-transit Computation; Heterogeneous Resources
Source: Web Of Science
Added: September 18, 2023

2022 article

A GPU-accelerated Data Transformation Framework Rooted in Pushdown Transducers

2022 IEEE 29TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC, pp. 215–225.

By: T. Nguyen n & M. Becchi n

author keywords: Finite state transducers; Pushdown transducers; Data transformation; GPU acceleration
TL;DR: This work defines an extended pushdown transducer abstraction (effPDT) that allows expressing a wide range of data transformations in a memory-efficient fashion, and is thus amenable to GPU deployment, and extends it to also support finite state transducers (FSTs). (via Semantic Scholar)
Source: Web Of Science
Added: June 12, 2023

2022 article

Accelerating Random Forest Classification on GPU and FPGA

51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022.

By: M. Shah n, R. Neff n, H. Wu n, M. Minutoli*, A. Tumeo* & M. Becchi n

author keywords: random forest classification; GPU; FPGA
TL;DR: This work proposes a hierarchical memory layout suitable to the GPU/FPGA memory hierarchy, and designs three RF classification code variants based on that layout, and investigates GPU- and FPGA-specific considerations for these kernels. (via Semantic Scholar)
UN Sustainable Development Goal Categories
1. No Poverty (Web of Science)
Source: Web Of Science
Added: October 30, 2023

2021 article

Exploring Thread Coarsening on FPGA

2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), pp. 436–441.

By: M. Zarch n, R. Neff n & M. Becchi n

author keywords: OpenCL; FPGA; high-level synthesis; compiler techniques; thread-coarsening; performance optimization
Source: Web Of Science
Added: May 2, 2022

2021 article

PILOT: a Runtime System to Manage Multi-tenant GPU Unified Memory Footprint

2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), pp. 442–447.

By: J. Ravi n, T. Nguyen n, H. Zhou n & M. Becchi n

TL;DR: Three methods are proposed to transparently mitigate memory interference through kernel preemption and scheduling policies, which would enable new OS-managed scheduling policies to be implemented for GPU kernels to dynamically handle resource contention and offer consistent performance. (via Semantic Scholar)
Sources: Web Of Science, NC State University Libraries
Added: May 2, 2022

2020 article

A Loop-aware Autotuner for High-Precision Floating-point Applications

2020 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS), pp. 285–295.

By: R. Gu n, P. Beata n & M. Becchi n

author keywords: autotuner; mixed-precision; floating-point
TL;DR: This work proposes an auto-tuner for applications requiring high-precision floating-point arithmetic to deliver a prescribed level of accuracy, and generates a mixed precision program that trades off performance and accuracy by selectively using different precisions for different variables and operations. (via Semantic Scholar)
Source: Web Of Science
Added: May 24, 2021

2020 article

Evaluating Thread Coarsening and Low-cost Synchronization on Intel Xeon Phi

2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020, pp. 1018–1029.

By: H. Wu n & M. Becchi n

author keywords: SIMT; manycore processors; Intel Xeon Phi; thread coarsening; synchronization
TL;DR: This work explores thread coarsening as a way to remap the work to the available cores and vector lanes, and proposes low-overhead synchronization primitives, such as atomic operations and barriers, which transparently apply to threads mapped to the same or different VPUs and x86 cores. (via Semantic Scholar)
Source: Web Of Science
Added: June 10, 2021

2020 article

GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting

2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020, pp. 274–284.

By: X. Yu*, F. Wei*, X. Ou*, M. Becchi n, T. Bicer* & D. Yao*

author keywords: GPU; static program analysis; data-flow analysis; Android security; mobile application vetting; worklist algorithm; application-specific optimization
TL;DR: This paper proposes GDroid, a GPU-based worklist algorithm implementation with multiple fine-grained optimizations tailored to common characteristics of Android applications, and shows that the proposed optimizations are beneficial to performance and GDroid can achieve up to 128X speedups against a plain GPU implementation. (via Semantic Scholar)
Source: Web Of Science
Added: June 10, 2021

2020 article

GPU-FPtuner: Mixed-precision Auto-tuning for Floating-point Applications on GPU

2020 IEEE 27TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2020), pp. 294–304.

By: R. Gu n & M. Becchi n

author keywords: GPU; floating-point arithmetic; mixed-precision arithmetic; accuracy; performance; autotuning
TL;DR: A mixed precision autotuner for GPU applications that rely on floating-point arithmetic that takes into account code patterns prone to error propagation and GPU-specific considerations to generate a tuning plan that balances performance and accuracy. (via Semantic Scholar)
UN Sustainable Development Goal Categories
13. Climate Action (OpenAlex)
Source: Web Of Science
Added: August 2, 2021

2020 article

Optimizing Complex OpenCL Code for FPGA: A Case Study on Finite Automata Traversal

2020 IEEE 26TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), pp. 518–527.

By: M. Nourian n, M. Zarch n & M. Becchi n

author keywords: OpenCL; FPGA; high-level synthesis; automata processing; NFA; performance optimization
TL;DR: This paper considers an OpenCL NFA traversal kernel optimized for GPU but exhibiting FPGA-friendly characteristics, namely: limited memory requirements, lack of synchronization, and SIMD execution, and explores a set of structural code changes, custom and best-practice optimizations to retarget this code to FPGAs. (via Semantic Scholar)
Source: Web Of Science
Added: July 19, 2021

2019 article

A Comparative Study of Parallel Programming Frameworks for Distributed GPU Applications

CF '19 - PROCEEDINGS OF THE 16TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS, pp. 268–273.

By: R. Gu n & M. Becchi n

author keywords: Parallel computing; Distributed applications; Homogeneous cluster
TL;DR: This work considers several popular parallel programming frameworks for distributed applications and analyzes their memory model, execution model, synchronization model and GPU support, and compares their programmability, performance, scalability, and load-balancing capability on a homogeneous computing cluster equipped with GPUs. (via Semantic Scholar)
Source: Web Of Science
Added: July 29, 2019

2019 article

Editorial: Special Issue on Computing Frontiers

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, Vol. 91, pp. 273–273 (March 2019).

By: F. Palumbo* & M. Becchi n

Source: Web Of Science
Added: March 18, 2019

2019 journal article

Evaluating High Performance Pattern Matching on the Automata Processor

IEEE TRANSACTIONS ON COMPUTERS, 68(8), 1201–1212.

By: I. Roy, A. Srivastava*, M. Grimm*, M. Nourian n, M. Becchi n & S. Aluru*

author keywords: Finite automata; regular expressions; automata processor; FPGAs; intrusion detection; protein motifs
TL;DR: The acceleration of applications that identify all the occurrences of thousands of string-patterns in an input data-stream using the Automata Processor is studied, finding that the performance derived by using the resources of a single AP-board, which houses 32 AP-chips, is comparable to that of the resources of five to six large FPGAs. (via Semantic Scholar)
Source: Web Of Science
Added: July 29, 2019

2018 article

A Compiler Framework for Fixed-topology Non-deterministic Finite Automata on SIMD Platforms

2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), pp. 507–516.

By: M. Nourian, H. Wu & M. Becchi

author keywords: Automata Processing; NFAs; SIMD; GPUs; Intel Xeon Phi platforms
Source: Web Of Science
Added: April 22, 2019

2018 article

Compiling SIMT Programs on Multi- and Many-core Processors with Wide Vector Units: A Case Study with CUDA

2018 IEEE 25TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), pp. 123–132.

By: H. Wu n, J. Ravi n & M. Becchi n

author keywords: Xeon Phi; hybrid MIMD/SIMD systems; CUDA; SIMT; vectorization
TL;DR: A set of compiler techniques are proposed to transform programs written using a SIMT programming model (a subset of CUDA C) into code that leverages both the x86 cores and the vector units of a hybrid MIMD/SIMD architecture, thus providing programmability, high system utilization and performance. (via Semantic Scholar)
Source: Web Of Science
Added: June 17, 2019

2017 journal article

A Principled Approach to Secure Multi-core Processor Design with ReWire

ACM Transactions on Embedded Computing Systems, 16(2), 1–25.

By: A. Procter*, W. Harrison*, I. Graves*, M. Becchi* & G. Allwein*

author keywords: Equational reasoning; monads; hardware security; reconfigurable computing
TL;DR: This case study comprises the development of secure single- and dual-core variants of a single processor based on a common semantic specification of the ISA, and demonstrates both ReWire’s expressiveness as a programming language and its power as a framework for formal, high-level reasoning about hardware systems. (via Semantic Scholar)
Source: Crossref
Added: July 20, 2019

2017 article

An Analytical Study of Recursive Tree Traversal Patterns on Multi- and Many-core Platforms

2017 IEEE 23RD INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), pp. 586–595.

By: H. Wu n & M. Becchi n

author keywords: recursive tree traversal; many-core processors; parallelism; GPU
TL;DR: The analysis shows that there is not a single code variant and platform that achieves the best performance on all tree traversal patterns, and it provides guidelines on the selection of the implementation most suited to a given tree traversal pattern and input dataset. (via Semantic Scholar)
Source: Web Of Science
Added: November 19, 2018

2017 conference paper

Understanding the performance-accuracy tradeoffs of floating-point arithmetic on GPUs

Proceedings of the 2017 IEEE International Symposium on Workload Characterization (IISWC), 207–218.

By: S. Surineni*, R. Gu n, H. Nguyen* & M. Becchi n

TL;DR: Analysis of the use of different floating-point precisions on GPU using a variety of synthetic and real-world benchmark applications provides insights to guide users to the selection of the arithmetic precision leading to a good performance/accuracy tradeoff depending on the arithmetic operations and mathematical functions used in their program and the degree of multithreading of the code. (via Semantic Scholar)
Source: NC State University Libraries
Added: August 6, 2018

2016 journal article

Picking Pesky Parameters: Optimizing Regular Expression Matching in Practice

IEEE Transactions on Parallel and Distributed Systems, 27(5), 1430–1442.

By: X. Chen*, B. Jones*, M. Becchi* & T. Wolf*

author keywords: Network security; deep packet inspection; deterministic finite automaton; non-deterministic finite automaton; regular expressions; design space exploration
TL;DR: This work explores the performance, area requirements, and power consumption of implementations targeting multi-core processors and FPGAs using rule sets of practical size and complexity and presents specific guidelines for determining optimal configurations based on a simple evaluation of the rule set. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Crossref
Added: July 20, 2019

2015 chapter

Hardware Synthesis from Functional Embedded Domain-Specific Languages: A Case Study in Regular Expression Compilation

In Lecture Notes in Computer Science (pp. 41–52).

By: I. Graves*, A. Procter*, W. Harrison*, M. Becchi* & G. Allwein*

TL;DR: A general methodology based on domain specific languages embedded in the functional language Haskell is proposed to bridge the gap between high level abstractions that support programmer productivity and the need for high performance in FPGA circuit implementations. (via Semantic Scholar)
Source: Crossref
Added: February 24, 2020

2014 journal article

Large-Scale Pairwise Alignments on GPU Clusters: Exploring the Implementation Space

Journal of Signal Processing Systems, 77(1-2), 131–149.

By: H. Truong*, D. Li*, K. Sajjapongse*, G. Conant* & M. Becchi*

author keywords: Heterogeneous system; Sequence alignment; GPU
TL;DR: This work presents four GPU implementations for large-scale pairwise sequence alignment, and suggests that LazyRScan-mNW is the preferred solution for applications that require performing the trace-back operation only on a subset of the considered sequence pairs. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: July 20, 2019

2014 journal article

Revisiting State Blow-Up: Automatically Building Augmented-FA While Preserving Functional Equivalence

IEEE Journal on Selected Areas in Communications, 32(10), 1822–1833.

By: X. Yu*, B. Lin* & M. Becchi*

author keywords: Deep packet inspection; finite automata; regular expression matching
TL;DR: This work proposes JFA, a finite automaton that uses state variables to avoid state explosion, and is functionally equivalent to the corresponding DFA, and provides optimization techniques to both limit the amount of state variables required and provide a lower bound for the JFA traversal time. (via Semantic Scholar)
Source: Crossref
Added: July 20, 2019

2013 journal article

A-DFA

ACM Transactions on Architecture and Code Optimization, 10(1), 1–26.

By: M. Becchi* & P. Crowley*

author keywords: Algorithms; Design; Performance; Security; Deep packet inspection; regular expressions; deterministic finite automata; memory compression
TL;DR: Amortized time’s DFAs is introduced, a general compression technique that results in at most N(k + 1)/k state traversals when processing a string of length N, k being a positive integer, and achieves comparable levels of compression with lower provable bounds on memory bandwidth. (via Semantic Scholar)
Source: Crossref
Added: July 20, 2019

2013 journal article

Diet Alters Both the Structure and Taxonomy of the Ovine Gut Microbial Ecosystem

DNA Research, 21(2), 115–125.

By: M. Ellison*, G. Conant*, R. Cockrum*, K. Austin*, H. Truong*, M. Becchi*, W. Lamberson*, K. Cammack*

author keywords: Ovis aries; microbiome; 16S subunit
MeSH headings: Animals; Bacteria / classification; Bacteria / genetics; DNA, Ribosomal / genetics; Diet; Ecosystem; Metagenome; Rumen / microbiology; Sequence Analysis, DNA; Sheep / microbiology
TL;DR: Differences in taxonomic distributions appear to be grounded in an underlying common input of new microbial individuals into the rumen environment, with common organisms from one feed group being present in the other, but at much lower abundance. (via Semantic Scholar)
Sources: Crossref, ORCID
Added: July 20, 2019

2013 chapter

Efficient GPU Implementation of the Integral Histogram

In Computer Vision - ACCV 2012 Workshops (pp. 266–278).

By: M. Poostchi*, K. Palaniappan*, F. Bunyak*, M. Becchi* & G. Seetharaman*

TL;DR: A proposed Graphics Processing Unit (GPU) implementation uses parallel prefix sums on row and column histograms in a cross-weave scan with high GPU utilization and communication-aware data transfer between CPU and GPU memories. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Crossref
Added: August 28, 2020

2013 journal article

Scheduling concurrent applications on a cluster of CPU–GPU nodes

Future Generation Computer Systems, 29(8), 2262–2271.

By: V. Ravi*, M. Becchi*, W. Jiang*, G. Agrawal* & S. Chakradhar

author keywords: Scheduling; CPU-GPU systems
Source: Crossref
Added: July 20, 2019

2012 journal article

A Massively Parallel, Energy Efficient Programmable Accelerator for Learning and Classification

ACM Transactions on Architecture and Code Optimization, 9(1), 1–30.

By: A. Majumdar, S. Cadambi, M. Becchi*, S. Chakradhar & H. Graf

author keywords: Design; Performance; Accelerator-based computing; parallel computing; heterogeneous computing; machine learning; architecture
TL;DR: The MAPLE architecture is described, its design space is explored with a simulator, how to automatically map application kernels to the hardware is illustrated, and its performance improvement and energy benefits over classic server-based implementations are presented. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Crossref
Added: July 20, 2019

2012 journal article

Accelerating large-scale protein structure alignments with graphics processing units

BMC Research Notes, 5(1), 116.

MeSH headings: Algorithms; Computational Biology / methods; Computer Graphics; Databases, Protein; Proteins / chemistry; Sequence Alignment; Software; Structural Homology, Protein
Source: Crossref
Added: July 20, 2019

2008 journal article

Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures

The Journal of Instruction-Level Parallelism (JILP), 10.

By: M. Becchi & P. Crowley

Source: NC State University Libraries
Added: July 28, 2019
