2021 article
Exploring Thread Coarsening on FPGA
2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), pp. 436–441.
2021 article
PILOT: a Runtime System to Manage Multi-tenant GPU Unified Memory Footprint
2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), pp. 442–447.
2020 article
A Loop-aware Autotuner for High-Precision Floating-point Applications
2020 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS), pp. 285–295.
2020 article
Evaluating Thread Coarsening and Low-cost Synchronization on Intel Xeon Phi
2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2020), pp. 1018–1029.
2020 article
GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting
2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2020), pp. 274–284.
2020 article
GPU-FPtuner: Mixed-precision Auto-tuning for Floating-point Applications on GPU
2020 IEEE 27TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2020), pp. 294–304.
2020 article
Optimizing Complex OpenCL Code for FPGA: A Case Study on Finite Automata Traversal
2020 IEEE 26TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), pp. 518–527.
2019 article
A Comparative Study of Parallel Programming Frameworks for Distributed GPU Applications
CF '19 - PROCEEDINGS OF THE 16TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS, pp. 268–273.
2019 article
Editorial: Special Issue on Computing Frontiers
Palumbo, F., & Becchi, M. (2019, March). JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, Vol. 91, pp. 273–273.
2019 journal article
Evaluating High Performance Pattern Matching on the Automata Processor
IEEE TRANSACTIONS ON COMPUTERS, 68(8), 1201–1212.
2018 article
A Compiler Framework for Fixed-topology Non-deterministic Finite Automata on SIMD Platforms
2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), pp. 507–516.
2018 article
Compiling SIMT Programs on Multi- and Many-core Processors with Wide Vector Units: A Case Study with CUDA
2018 IEEE 25TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), pp. 123–132.
2017 journal article
A Principled Approach to Secure Multi-core Processor Design with ReWire
ACM Transactions on Embedded Computing Systems, 16(2), 1–25.
2017 article
An Analytical Study of Recursive Tree Traversal Patterns on Multi- and Many-core Platforms
2017 IEEE 23RD INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), pp. 586–595.
2017 conference paper
Understanding the Performance-Accuracy Tradeoffs of Floating-Point Arithmetic on GPUs
Proceedings of the 2017 IEEE International Symposium on Workload Characterization (IISWC), 207–218.
2016 journal article
Picking Pesky Parameters: Optimizing Regular Expression Matching in Practice
IEEE Transactions on Parallel and Distributed Systems, 27(5), 1430–1442.
2015 chapter
Hardware Synthesis from Functional Embedded Domain-Specific Languages: A Case Study in Regular Expression Compilation
In Lecture Notes in Computer Science (pp. 41–52).
2014 journal article
Large-Scale Pairwise Alignments on GPU Clusters: Exploring the Implementation Space
Journal of Signal Processing Systems, 77(1-2), 131–149.
2014 journal article
Revisiting State Blow-Up: Automatically Building Augmented-FA While Preserving Functional Equivalence
IEEE Journal on Selected Areas in Communications, 32(10), 1822–1833.
2013 journal article
A-DFA
ACM Transactions on Architecture and Code Optimization, 10(1), 1–26.
2013 journal article
Diet Alters Both the Structure and Taxonomy of the Ovine Gut Microbial Ecosystem
DNA Research, 21(2), 115–125.
2013 chapter
Efficient GPU Implementation of the Integral Histogram
In Computer Vision - ACCV 2012 Workshops (pp. 266–278).
2013 journal article
Scheduling Concurrent Applications on a Cluster of CPU–GPU Nodes
Future Generation Computer Systems, 29(8), 2262–2271.
2012 journal article
A Massively Parallel, Energy Efficient Programmable Accelerator for Learning and Classification
ACM Transactions on Architecture and Code Optimization, 9(1), 1–30.
2012 journal article
Accelerating Large-Scale Protein Structure Alignments with Graphics Processing Units
BMC Research Notes, 5(1), 116.
2008 journal article
Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures
The Journal of Instruction-Level Parallelism (JILP), 10.