Frank Mueller

Works (118)

Updated: April 5th, 2024 02:08

2022 article

CLAIRE: Enabling Continual Learning for Real-time Autonomous Driving with a Dual-head Architecture

2022 IEEE 25TH INTERNATIONAL SYMPOSIUM ON REAL-TIME DISTRIBUTED COMPUTING (ISORC 2022), pp. 51–60.

By: H. Zhang n & F. Mueller n

author keywords: Autonomous Systems; On-board Continual Learning; Real-Time Deep Learning Inference
TL;DR: A novel lightweight dual-head detection network architecture is proposed to overcome forgetting and to support fast on-board continual learning on small sets of new images, and the feasibility of continual learning methods for autonomous driving is assessed. (via Semantic Scholar)
Source: Web Of Science
Added: October 17, 2022

2022 article

Combining Hard and Soft Constraints in Quantum Constraint-Satisfaction Systems

SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS.

By: E. Wilson n, F. Mueller n & S. Pakin*

author keywords: circuit-model quantum computing; quantum annealing; programming models
TL;DR: This enhanced version of NchooseK enables problems to be expressed in a more concise, less error-prone manner than if these problems were encoded manually for quantum execution, and includes an empirical evaluation of performance, scalability, and fidelity on both a large IBM Q system and a large D-Wave system. (via Semantic Scholar)
Source: Web Of Science
Added: June 12, 2023

2022 article

Guest editorial: Special issue on the 2020 IEEE symposium on real-time distributed computing (ISORC)

Cucinotta, T., Mueller, F., & Simmhan, Y. (2022, March). JOURNAL OF SYSTEMS ARCHITECTURE, Vol. 124.

By: T. Cucinotta*, F. Mueller n & Y. Simmhan*

Source: Web Of Science
Added: May 2, 2022

2022 article

P-ckpt: Coordinated Prioritized Checkpointing

2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022), pp. 436–446.

By: S. Behera n, L. Wan*, F. Mueller n, M. Wolf* & S. Klasky*

author keywords: Fault Tolerance; High-Performance Computing; Failure Prediction; I/O subsystem; Checkpoint/Restart; Live Migration; Burst Buffers
TL;DR: A novel checkpointing technique that aims to maintain the performance efficiency of failure-aware C/R models even when failures are predicted with a small lead time, and creates the hybrid p-ckpt model by integrating Live Migration because of its cost-effectiveness and to reduce checkpoint frequency. (via Semantic Scholar)
Source: Web Of Science
Added: September 29, 2022

2022 article

T-SYS: Timed-Based System Security for Real-Time Kernels

2022 13TH ACM/IEEE INTERNATIONAL CONFERENCE ON CYBER-PHYSICAL SYSTEMS (ICCPS 2022), pp. 247–258.

By: B. McDonald n & F. Mueller n

author keywords: Real-time systems; security; worst-case execution time
TL;DR: T-SYS, a timed-system method of detecting intrusions into real-time systems via timing anomalies, is contributed and its effectiveness in terms of detecting attacks as they intrude a system is assessed. (via Semantic Scholar)
Source: Web Of Science
Added: September 19, 2022

2021 journal article

Hummingbird: efficient performance prediction for executing genomic applications in the cloud

BIOINFORMATICS, 37(17), 2537–2543.

By: A. Bahmani*, Z. Xing*, V. Krishnan*, U. Ray n, F. Mueller n, A. Alavi*, P. Tsao, M. Snyder*, C. Pan

TL;DR: Hummingbird is introduced, a tool for predicting performance of computing instances with varying memory and CPU on multiple cloud platforms; it accurately predicted the fastest, the cheapest, and the most cost-efficient compute instances in an economic manner. (via Semantic Scholar)
Source: Web Of Science
Added: October 4, 2021

2021 article

Mapping Constraint Problems onto Quantum Gate and Annealing Devices

PROCEEDINGS OF SECOND INTERNATIONAL WORKSHOP ON QUANTUM COMPUTING SOFTWARE (QCS 2021), pp. 110–117.

By: E. Wilson n, F. Mueller n & S. Pakin*

author keywords: circuit-model quantum computing; quantum annealing; programming models
TL;DR: This work presents NchooseK, a unified programming model for constraint satisfaction problems that can be mapped to both quantum circuit and annealing devices through Quadratic Unconstrained Binary Optimization (QUBO) problems. (via Semantic Scholar)
Source: Web Of Science
Added: February 28, 2022

2021 journal article

NUMA-aware memory coloring for multicore real-time systems

JOURNAL OF SYSTEMS ARCHITECTURE, 118.

By: X. Pan n & F. Mueller n

author keywords: Memory access; NUMA; Real-time predictability
TL;DR: This work contributes a controller/node-aware memory coloring (CAMC) allocator inside the Linux kernel for the entire address space to reduce access conflicts and latencies by isolating tasks from one another. (via Semantic Scholar)
Source: Web Of Science
Added: September 7, 2021

2021 article

Quantum Annealing Stencils with Applications to Fuel Loading of a Nuclear Reactor

2021 IEEE INTERNATIONAL CONFERENCE ON QUANTUM COMPUTING AND ENGINEERING (QCE 2021) / QUANTUM WEEK 2021, pp. 265–275.

By: J. Fustero n, S. Palmtag n & F. Mueller n

author keywords: quantum annealing; noisy intermediate-scale quantum computing; topology graph embeddings
TL;DR: Applying the technique to the problem of determining an effective fuel loading pattern for nuclear reactors shows that densely mapped quantum stencils result in higher fidelity solutions of optimization problems than the sparser default solutions. (via Semantic Scholar)
Sources: Web Of Science, NC State University Libraries
Added: February 7, 2022

2021 article

Systemic Assessment of Node Failures in HPC Production Platforms

2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 267–276.

By: A. Das n, F. Mueller n & B. Rountree*

author keywords: Root Cause; Node Failures; Holistic Analysis
TL;DR: It is shown that external environmental influence is not strongly correlated with node failures in terms of the root cause, and lead time enhancements are feasible for nodes showing fail slow characteristics. (via Semantic Scholar)
Source: Web Of Science
Added: October 4, 2021

2021 article

T-Pack: Timed Network Security for Real Time Systems

2021 IEEE 24TH INTERNATIONAL SYMPOSIUM ON REAL-TIME DISTRIBUTED COMPUTING (ISORC 2021), pp. 20–28.

By: S. Mittal n & F. Mueller n

TL;DR: This work proposes to detect intrusion based on time dilation induced by time delays within the network potentially resulting in system malfunctioning due to missed deadlines, and introduces a new method of timed packet protection, T-Pack, which analyzes end-to-end transmission times of packets and detects a compromised system or network based on deviation of observed time from the expected time on end nodes. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: September 26, 2022

2020 article

Aarohi: Making Real-Time Node Failure Prediction Feasible

2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2020), pp. 1092–1101.

By: A. Das n, F. Mueller n & B. Rountree*

author keywords: Online Prediction; HPC; Node Failures; Parsing
TL;DR: This work tackles online anomaly prediction in computing systems by exploiting context free grammar-based rapid event analysis and presents the framework Aarohi, which describes an effective way to predict failures online. (via Semantic Scholar)
Source: Web Of Science
Added: June 10, 2021

2020 journal article

BarrierFinder: recognizing ad hoc barriers

EMPIRICAL SOFTWARE ENGINEERING, 25(6), 4676–4706.

By: T. Wang n, X. Yu n, Z. Qiu n, G. Jin n & F. Mueller n

author keywords: Ad hoc synchronizations; Barriers; Program slicing; Symbolic execution; Temporal invariants
Source: Web Of Science
Added: September 21, 2020

2020 article

Just-in-time Quantum Circuit Transpilation Reduces Noise

IEEE INTERNATIONAL CONFERENCE ON QUANTUM COMPUTING AND ENGINEERING (QCE20), pp. 345–355.

By: E. Wilson n, S. Singh n & F. Mueller n

author keywords: quantum computing; errors; dynamic compilation
TL;DR: Experiments indicate that the accuracy of circuit results improves by 3–304% on average and up to 400% with on-the-fly circuit mappings based on error measurements just prior to application execution, which is improved over IBM's default mappings. (via Semantic Scholar)
Source: Web Of Science
Added: June 10, 2021

2020 conference paper

Symbiotic HW Cache and SW DTLB Prefetching for DRAM/NVM Hybrid Memory

2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 1–8.

By: O. Patil n, F. Mueller n, L. Ionkov*, J. Lee* & M. Lang*

Event: 2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) at Nice, France

TL;DR: It is hypothesized that HW and SW prefetching can complement each other in placing data in caches and the Data Translation Look-aside Buffer (DTLB) prior to their references, and by doing so adaptively, highly varying access latencies in a DRAM/NVM hybrid memory system are taken into account. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Crossref
Added: February 1, 2021

2020 article

VCFC: Structural and Semantic Compression and Indexing of Genetic Variant Data

2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, pp. 200–203.

By: K. Ferriter n, F. Mueller n, A. Bahmani* & C. Pan

TL;DR: The evaluation over genomic datasets indicates compression at a comparable size for the data representation while resulting in a speedup of ≈2X in indexed queries compared to the industry standard, which underlines that the representation could replace existing standards, resulting in reduced computational cost at equivalent storage size. (via Semantic Scholar)
Source: Web Of Science
Added: August 16, 2021

2019 article

Automatically Translating Quantum Programs from a Subset of Common Gates to an Adiabatic Representation

REVERSIBLE COMPUTATION (RC 2019), Vol. 11497, pp. 146–161.

By: M. Regan n, B. Eastwood n, M. Nagabhiru n & F. Mueller n

author keywords: Quantum computation; Quantum annealing; Quantum gate circuits; Adiabatic computation
TL;DR: Adiabatic computing with two degrees of freedom of 2-local Hamiltonians has been theoretically shown to be equivalent to the gate model of universal quantum computing, but today's quantum annealers, namely D-Wave’s 2000Q platform, only provide a 2-local Ising Hamiltonian abstraction with a single degree of freedom. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: November 25, 2019

2019 article

BARRIERFINDER: Recognizing Ad Hoc Barriers

2019 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2019), pp. 323–327.

By: T. Wang n, X. Yu n, Z. Qiu n, G. Jin n & F. Mueller n

author keywords: ad hoc synchronization; barrier; symbolic execution; interprocedural program slicing; Cloud9; LLVM
TL;DR: A framework to automatically identify complex ad hoc synchronizations in full and infer their synchronization relationships and a tool called BarrierFinder, which features various techniques, including program slicing and bounded symbolic execution, to efficiently explore the interleaving space of ad hoc synchronizations within multi-threaded programs and collect execution traces. (via Semantic Scholar)
Source: Web Of Science
Added: April 14, 2020

2019 article

End-to-End Resilience for HPC Applications

HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2019, Vol. 11501, pp. 271–290.

By: A. Rezaei n, H. Khetawat n, O. Patil n, F. Mueller n, P. Hargrove* & E. Roman*

author keywords: Resilience; Silent data corruption; Pragma programming
TL;DR: The live vulnerability factor (LVF) is introduced, a new metric that quantifies any lack of end-to-end protection for a given data structure that lifts the data protection burden from application programmers allowing them to focus solely on algorithms and performance while resilience is specified and subsequently embedded into the code through the compiler/library and supported by the runtime system. (via Semantic Scholar)
Source: Web Of Science
Added: November 18, 2019

2019 article

FuncyTuner: Auto-tuning Scientific Applications With Per-loop Compilation

PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019).

By: T. Wang n, N. Jain*, D. Beckingsale*, D. Boehme*, F. Mueller n & T. Gamblin*

author keywords: per-loop; fine-grained; auto-tuning; ICC; compiler; optimization; profile; OperalP; HPC; scientific simulation
TL;DR: It is demonstrated that a naïve greedy approach to per-region compilation often degrades performance in comparison to the O3 baseline, and a novel per-loop compilation framework, FuncyTuner, is contributed, which employs lightweight profiling to collect per-loop timing information, and then utilizes a space-focusing technique to construct a performant executable. (via Semantic Scholar)
Source: Web Of Science
Added: October 28, 2019

2019 article

Implementing NChooseK on IBM Q Quantum Computer Systems

REVERSIBLE COMPUTATION (RC 2019), Vol. 11497, pp. 209–223.

By: H. Khetawat n, A. Atrey n, G. Li n, F. Mueller n & S. Pakin*

author keywords: IBM Q; Quantum computing; NChooseK
TL;DR: This work implements a code generator that, given arbitrary parameters for N and K, generates code suitable for execution on IBM Q quantum hardware and assesses the performance of the code generator, limitations in the size of circuit depth and number of gates, and proposed optimizations. (via Semantic Scholar)
Source: Web Of Science
Added: November 25, 2019

2019 conference paper

Performance characterization of a DRAM-NVM hybrid memory architecture for HPC applications using intel optane DC persistent memory modules

Proceedings of the International Symposium on Memory Systems - MEMSYS '19. Presented at the International Symposium.

By: O. Patil n, L. Ionkov*, J. Lee*, F. Mueller n & M. Lang*

Event: the International Symposium

author keywords: NVM; Persistent Memory; Intel Optane DC; Memory Allocation; Hybrid Memory; NUMA; SICM
TL;DR: It is found that Optane-only executions are slower in terms of execution time than DRAM-only and Memory-mode executions by a minimum of 2 to 16% for VPIC and maximum of 6x for LULESH, which means HPC mini-apps can now scale up their problem size given such a memory system. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Crossref
Added: March 2, 2020

2019 article

Programming Quantum Computers: A Primer with IBM Q and D-Wave Exercises

PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), pp. 451–451.

By: F. Mueller n, G. Byrd n & P. Dreher n

author keywords: quantum computing
TL;DR: This tutorial provides a hands-on introduction to quantum computing and will feature the three pillars, architectures, programming, and algorithms/applications of quantum computing. (via Semantic Scholar)
Sources: Web Of Science, NC State University Libraries
Added: December 11, 2020

2019 article

The Colored Refresh Server for DRAM

2019 IEEE 22ND INTERNATIONAL SYMPOSIUM ON REAL-TIME DISTRIBUTED COMPUTING (ISORC 2019), pp. 27–34.

By: X. Pan n & F. Mueller n

TL;DR: This work contributes the “Colored Refresh Server” (CRS), a uniprocessor scheduling paradigm that partitions DRAM in two distinctly colored groups such that refreshes of one color occur in parallel to the execution of real-time tasks of the other color. (via Semantic Scholar)
Source: Web Of Science
Added: November 4, 2019

2019 article

The Colored Refresh Server for DRAM

2019 IEEE 40TH REAL-TIME SYSTEMS SYMPOSIUM (RTSS 2019), pp. 146–153.

By: X. Pan n & F. Mueller n

TL;DR: Experimental results confirm that refresh overhead is completely hidden and memory throughput enhanced, and the CRS, a uniprocessor scheduling paradigm that partitions DRAM in two distinctly colored groups such that refreshes of one color occur in parallel to the execution of real-time tasks of the other color, is contributed. (via Semantic Scholar)
Source: Web Of Science
Added: September 28, 2020

2019 article

Uncore Power Scavenger: A Runtime for Uncore Power Conservation on HPC Systems

PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS.

By: N. Gholkar n, F. Mueller n & B. Rountree*

author keywords: high-performance computing; power awareness
TL;DR: This work proposes Uncore Power Scavenger, a runtime system that dynamically detects phase changes and automatically sets the best uncore frequency for every phase to save power without significant impact on performance and achieves up to 20% speedup and proportional energy savings compared to Intel's RAPL with equivalent power usage. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Web Of Science
Added: July 20, 2020

2018 article

A Failure Recovery Protocol for Software-Defined Real-Time Networks

Qian, T., & Mueller, F. (2018, November). IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, Vol. 37, pp. 2222–2232.

By: T. Qian n & F. Mueller n

author keywords: Bounded network failure recovery; distributed real-time systems; software-defined networking
TL;DR: This paper develops a dynamic failure recovery policy and a protocol to address the second aspect of the network stability, and derives new real-time forwarding paths without compromising the capability of network devices to guarantee deadlines of concurrent real-time transmissions. (via Semantic Scholar)
UN Sustainable Development Goal Categories
9. Industry, Innovation and Infrastructure (OpenAlex)
Source: Web Of Science
Added: November 12, 2018

2018 article

Chameleon: Online Clustering of MPI Program Traces

2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 1102–1112.

By: A. Bahmani n & F. Mueller n

author keywords: High-Performance Computing; Message Passing; Tracing; Clustering Algorithms
TL;DR: This work considers parallel applications using the SPMD (single program multiple data) paradigm that relies on iterative kernels and contributes an online, fast, and scalable signature-based clustering algorithm called Chameleon, which combines low overhead at the clustering level with a lower time complexity of log(P) than prior work. (via Semantic Scholar)
Source: Web Of Science
Added: October 16, 2018

2018 article

CloneHadoop: Process Cloning to Reduce Hadoop's Long Tail

2018 IEEE/ACM 5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING APPLICATIONS AND TECHNOLOGIES (BDCAT), pp. 11–20.

By: S. Kukreti n & F. Mueller n

author keywords: Mapreduce; Stragglers; Process Cloning
TL;DR: A novel speculation approach via process cloning to avoid redundant computations transparent to users and an integration of cloning and recovery into Apache Hadoop with optimizations to alleviate resource bottlenecks is promoted. (via Semantic Scholar)
Source: Web Of Science
Added: March 4, 2019

2018 article

Co-Scheduling on Fused CPU-GPU Architectures With Shared Last Level Caches

Damschen, M., Mueller, F., & Henkel, J. (2018, November). IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, Vol. 37, pp. 2337–2347.

By: M. Damschen*, F. Mueller n & J. Henkel*

author keywords: Heterogeneous computing; integrated architecture; performance tuning; scheduling
TL;DR: It is shown, however, that in most cases it is not beneficial to split the work of a kernel between CPU and GPU compared to exclusively running it on the most suitable single compute device. (via Semantic Scholar)
Source: Web Of Science
Added: November 12, 2018

2018 article

Controller-Aware Memory Coloring for Multicore Real-Time Systems

33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, pp. 584–592.

By: X. Pan n & F. Mueller n

author keywords: memory access; NUMA; real-time predictability
TL;DR: This work contributes a controller/node-aware memory coloring (CAMC) allocator inside the Linux kernel for the entire address space to reduce access conflicts and latencies by isolating tasks from one another. (via Semantic Scholar)
Source: Web Of Science
Added: January 28, 2019

2018 article

Desh: Deep Learning for System Health Prediction of Lead Times to Failure in HPC

HPDC '18: PROCEEDINGS OF THE 27TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, pp. 40–51.

By: A. Das n, F. Mueller n, C. Siegel & A. Vishnu*

author keywords: LSTM; Failure Prediction; Log Mining; HPC; Node Failures; Lead Times; Anomaly Detection; Deep Learning
TL;DR: This work aims to predict node failures that occur in supercomputing systems via long short-term memory (LSTM) networks that exploit recurrent neural networks (RNNs), and identifies failure indicators with enhanced training and classification for generic applicability to logs from operating systems and software components without the need to modify them. (via Semantic Scholar)
Source: Web Of Science
Added: April 2, 2019

2018 journal article

KeyValueServe†: Design and performance analysis of a multi-tenant data grid as a cloud service

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 30(14).

By: A. Das n, A. Iyengar* & F. Mueller n

author keywords: cloud computing; data-grid; in-memory; key-value store; multi-tenancy; NoSQL; performance; quality of service
TL;DR: This paper presents KeyValueServe, a low overhead cloud service with features aiding resource management that can efficiently provide services to tenants without degrading performance, and indicates that a Hazelcast cluster can get congested with multiple concurrent connections when processing client requests, resulting in poor performance. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2018 article

PShifter: Feedback-based Dynamic Power Shifting within HPC Jobs for Performance

HPDC '18: PROCEEDINGS OF THE 27TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, pp. 106–117.

By: N. Gholkar n, F. Mueller n, B. Rountree* & A. Marathe*

TL;DR: To the best of the authors' knowledge, PShifter is the first approach to transparently and automatically apply power capping non-uniformly across processors of a job in a dynamic manner adapting to phase changes. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Web Of Science
Added: April 2, 2019

2018 article

Work-In-Progress: Making Machine Learning Real-Time Predictable

2018 39TH IEEE REAL-TIME SYSTEMS SYMPOSIUM (RTSS 2018), pp. 157–160.

By: H. Xu n & F. Mueller n

author keywords: Edge Computing; Real-time Predictability; Keras; Caffe
TL;DR: This work identifies the subset of ML problems appropriate for edge devices by investigating if they result in real-time predictable services for a set of widely used ML libraries, and enhances the Caffe library to make it more suitable for real-time predictability. (via Semantic Scholar)
Source: Web Of Science
Added: March 18, 2019

2017 journal article

DINO: Divergent node cloning for sustained redundancy in HPC

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 109, 350–362.

By: A. Rezaei n, F. Mueller n, P. Hargrove* & E. Roman*

author keywords: Fault tolerance; High performance computing; Node cloning; Redundant computing
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2017 article

ScalaIOExtrap: Elastic I/O Tracing and Extrapolation

2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 585–594.

TL;DR: An I/O tracing framework with a mathematical model to analyze trace data and extrapolate it to larger number of nodes, a replay engine for the extrapolated trace file to verify its accuracy and a combination of synthetic benchmarks on all platforms is contributed. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2017 journal article

Scalable communication event tracing via clustering

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 109, 230–244.

By: A. Bahmani n & F. Mueller n

author keywords: Clustering algorithms; Programming techniques; Concurrent programming; Performance measurement
TL;DR: An adaptive clustering algorithm for large-scale applications called ACURDION is devised that traces the MPI communication of code with O(log P) time complexity and improves trace scalability and automation over prior approaches. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2016 article

A Power-aware Cost Model for HPC Procurement

2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), pp. 1110–1113.

By: N. Gholkar n, F. Mueller n & B. Rountree*

TL;DR: This work contributes a procurement model to aid in the design of a capability system that achieves maximum performance while considering manufacturing variations, and appropriately partitions a single, compound system budget into the CAPEX (infrastructure cost) and the OPEX (operating power cost). (via Semantic Scholar)
UN Sustainable Development Goal Categories
9. Industry, Innovation and Infrastructure (Web of Science; OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2016 conference paper

A resilient software infrastructure for wide-area measurement systems

2016 IEEE Power and Energy Society General Meeting (PESGM).

By: T. Qian*, H. Xu*, J. Zhang n, A. Chakrabortty n, F. Mueller* & Y. Xin*

TL;DR: This work designs and implements a software infrastructure to estimate power grid oscillation modes based on real-time data collected from Phasor Measurement Units (PMUs), and deploys a distributed algorithm on the basis of the Prony algorithm and the Alternating Directions Method of Multipliers (ADMM). (via Semantic Scholar)
UN Sustainable Development Goal Categories
9. Industry, Innovation and Infrastructure (OpenAlex)
Source: NC State University Libraries
Added: August 6, 2018

2016 article

Benchmark Generation and Simulation at Extreme Scale

2016 IEEE/ACM 20TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS (DS-RT), pp. 9–18.

By: M. Lagadapati n, F. Mueller n & C. Engelmann*

TL;DR: This work focuses on extreme-scale simulation of HPC applications and their communication behavior via lightweight parallel discrete event simulation for performance estimation and evaluation and promotes the generation of a benchmark from traces. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2016 chapter

Distributed Job Allocation for Large-Scale Manycores

In Lecture Notes in Computer Science (pp. 404–425).

By: S. Ramachandran n & F. Mueller n

TL;DR: Results show sparse job allocations to incur lower overhead for active cancellation while sequencer-based atomic broadcast has less overhead for denser allocations. (via Semantic Scholar)
UN Sustainable Development Goal Categories
8. Decent Work and Economic Growth (OpenAlex)
Source: Crossref
Added: February 24, 2020

2016 conference paper

Distributed job allocation for large-scale manycores

High performance computing, 9697, 404–425.

By: S. Ramachandran & F. Mueller

Source: NC State University Libraries
Added: August 6, 2018

2016 chapter

Efficient and Predictable Group Communication for Manycore NoCs

In Lecture Notes in Computer Science (pp. 383–403).

By: K. Yagna n, O. Patil n & F. Mueller n

TL;DR: This research presents a parallel NoC architecture for manycore embedded processors that provides native core-to-core communication that can be exploited via message passing to address system scalability and predictability challenges. (via Semantic Scholar)
Source: Crossref
Added: February 24, 2020

2016 conference paper

Efficient and predictable group communication for manycore NoCs

High performance computing, 9697, 383–403.

By: K. Yagna, O. Patil & F. Mueller

Source: NC State University Libraries
Added: August 6, 2018

2016 journal article

Efficient clustering for ultra-scale application tracing

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 98, 25–39.

By: A. Bahmani n & F. Mueller n

author keywords: Clustering algorithms; Programming techniques; Concurrent programming; Performance measurement
TL;DR: This work contributes a fast, scalable, signature-based clustering algorithm that clusters processes exhibiting similar execution behavior that combines low overhead at the clustering level with log(P) time complexity, and it splits the merge process to make tracing suitable for extreme-scale computing. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2016 journal article

Exploiting data representation for fault tolerance

Journal of Computational Science, 14, 51–60.

By: J. Elliott n, M. Hoemmen* & F. Mueller n

author keywords: Algorithm-based fault tolerance; Resilient algorithms; Numerical methods
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Sources: Crossref, NC State University Libraries
Added: August 6, 2018

2016 article

FlipSphere: A Software-based DRAM Error Detection and Correction Library for HPC

2016 IEEE/ACM 20TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS (DS-RT), pp. 19–28.

By: D. Fiala n, F. Mueller n & K. Ferreira*

TL;DR: FlipSphere is introduced, a tunable, transparent silent data corruption detection and correction library for HPC applications that is first in its class to use hardware accelerators, such as the Intel Xeon Phi MIC, to increase application resilience. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2016 conference paper

Hybrid MPI/OpenMP programming on the Tilera manycore architecture

2016 International Conference on High Performance Computing & Simulation (HPCS 2016), 326–333.

By: V. Chandru n & F. Mueller n

TL;DR: This work assesses the viability of different programming models for large-scale manycores using an MPI-like abstraction, the vendor's OpenMP, and a combination (hybrid) of both and finds that MPI and OpenMP both scale while the hybrid model performs inferior to the others. (via Semantic Scholar)
Source: NC State University Libraries
Added: August 6, 2018

2016 conference paper

Performance analysis of a multi-tenant in-memory data grid

Proceedings of 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), 956–959.

By: A. Das n, F. Mueller n, X. Gu n & A. Iyengar

TL;DR: This study suggests that processing an increasing number of client requests while spawning fewer threads helps improve performance, and uncovers scenarios of performance degradation followed by optimized performance via end-point multiplexing. (via Semantic Scholar)
UN Sustainable Development Goal Categories
9. Industry, Innovation and Infrastructure (OpenAlex)
Source: NC State University Libraries
Added: August 6, 2018

2016 chapter

Reducing NoC and Memory Contention for Manycores

In Architecture of Computing Systems – ARCS 2016 (pp. 293–305).

By: V. Chandru n & F. Mueller n

TL;DR: Experiments show that targeted memory allocation results in reduced execution times and NoC contention, the latter of which has not been studied before at this scale. (via Semantic Scholar)
Source: Crossref
Added: February 24, 2020

2016 conference paper

Sensitivity analysis for a quantum informed ferroelectric energy model

Proceedings of the ASME conference on smart materials adaptive.

By: L. Leon n, R. Smith n, W. Oates* & P. Miles*

UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: NC State University Libraries
Added: August 6, 2018

2016 article

SparkScore: Leveraging Apache Spark for Distributed Genomic Inference

2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), pp. 435–442.

By: A. Bahmani n, A. Sibley*, M. Parsian*, K. Owzar* & F. Mueller n

TL;DR: SparkScore, a set of distributed computational algorithms implemented in Apache Spark, is proposed to leverage the embarrassingly parallel nature of genomic resampling inference on the basis of the efficient score statistics and harnesses the fault-tolerant features of Spark. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2016 article

TintMalloc: Reducing Memory Access Divergence via Controller-Aware Coloring

2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), pp. 363–372.

By: X. Pan n, Y. Gownivaripalli n & F. Mueller n

author keywords: NUMA; caches; memory controller; page coloring
TL;DR: Experimental results with the SPEC and Parsec benchmarks show that by choosing disjoint colors per thread, locality is increased, contention is decreased, and overall SPMD execution becomes more balanced at barriers than default memory allocation under Linux as well as prior coloring approaches. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2015 journal article

A fine-grained block ILU scheme on regular structures for GPGPUs

COMPUTERS & FLUIDS, 119, 149–161.

By: L. Luo n, J. Edwards n, H. Luo n & F. Mueller n

author keywords: Block ILU; Block-sparse linear systems; Wavefront scheme; GPGPU; OpenACC; CUDA
TL;DR: A fine-grained BILU (FGBILU) scheme is presented that is particularly effective on GPGPUs; it has been implemented with both OpenACC and CUDA and tested as a block-sparse linear solver on a structured 3D grid. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2015 journal article

Architecture aware semi partitioned real-time scheduling on multicore platforms

REAL-TIME SYSTEMS, 51(3), 274–313.

By: M. Shekhar*, H. Ramaprasad*, A. Sarkar n & F. Mueller n

author keywords: Real-time scheduling; Semi-partitioned; Multi-core
TL;DR: A predictable semi-partitioned scheduling algorithm for independent hard-real-time sporadic tasks executing on homogeneous multicore platforms using cache locking and locked cache migration is presented. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2015 article

DINO: Divergent Node Cloning for Sustained Redundancy in HPC

2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, pp. 180–183.

By: A. Rezaei n & F. Mueller n

TL;DR: Experimental results indicate that DINO can recover from failures nearly instantaneously, thus retaining the redundancy level throughout job execution, and the design and implementation for repairing failed replicas in redundant MPI computing is unprecedented. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2015 article

Evaluation of Memory Access Arbitration Algorithm on Tilera's TILEPro64 platform

2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), pp. 1154–1159.

By: M. Shekhar*, H. Ramaprasad* & F. Mueller n

TL;DR: This paper implements and evaluates variants of an arbitration policy for memory access requests over a Network-on-Chip platform, namely Tilera's TilePro64 platform, and evaluates its application in real-time embedded systems. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2015 article

Hybrid EDF Packet Scheduling for Real-Time Distributed Systems

PROCEEDINGS OF THE 2015 27TH EUROMICRO CONFERENCE ON REAL-TIME SYSTEMS (ECRTS 2015), pp. 37–46.

By: T. Qian n, F. Mueller n & Y. Xin n

TL;DR: This work combines EDF scheduling with periodic message transmission tasks, and implements an EDF-based packet scheduler, which transmits packets considering event deadlines, in a real-time distributed storage system. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2015 article

Intrusion Detection for CPS Real-Time Controllers

CYBER PHYSICAL SYSTEMS APPROACH TO SMART ELECTRIC POWER GRID, pp. 329–358.

By: C. Zimmer n, B. Bhat n, F. Mueller n & S. Mohan*

TL;DR: This work presents a set of mechanisms for time-based intrusion detection, i.e., detecting the execution of unauthorized instructions in real-time CPS environments, and develops techniques to detect intrusions in a self-checking manner by the application and through the operating system scheduler, which are novel contributions to the real-time/embedded systems domain. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2015 journal article

NoCMsg: A Scalable Message-Passing Abstraction for Network-on-Chips

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 12(1).

By: C. Zimmer n & F. Mueller n

author keywords: Experimentation; Tracing; Compression; Multicore architectures; shared memory; message passing
TL;DR: This work contributes NoCMsg, a low-level message-passing abstraction over NoCs, which is specifically designed for large core counts in 2D meshes, and observes that shared memory scales up to about 16 cores on this platform, whereas message passing performs well beyond that threshold. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2015 journal article

OpenACC acceleration of an unstructured CFD solver based on a reconstructed discontinuous Galerkin method for compressible flows

International Journal for Numerical Methods in Fluids, 78(3), 123–139.

By: Y. Xia n, J. Lou n, H. Luo n, J. Edwards n & F. Mueller n

author keywords: GPU computing; OpenACC; CUDA; discontinuous Galerkin; compressible flow; Navier-Stokes equations
TL;DR: The numerical results indicate that this OpenACC‐based parallel scheme is an effective and extensible approach to port unstructured high‐order CFD solvers to GPU computing. (via Semantic Scholar)
Sources: Web Of Science, Crossref
Added: August 6, 2018

2015 conference paper

Providing task isolation via TLB coloring

21st IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2015), 3–13.

By: S. Panchamukhi n & F. Mueller n

TL;DR: This work designs and implements a new heap allocator that guarantees the TLB set, which will hold a particular page translation on a uniprocessor of a contemporary architecture, based on the concept of page coloring, a software TLB partitioning method. (via Semantic Scholar)
Source: NC State University Libraries
Added: August 6, 2018

2015 journal article

Reliable and scalable communication for the power grid

Cyber Physical Systems Approach to Smart Electric Power Grid, 195–217.

By: C. Zimmer n & F. Mueller n

TL;DR: Existence and discovery of multi-route pathways is essential in ensuring delivery of critical data in future smart power grids. (via Semantic Scholar)
UN Sustainable Development Goal Categories
9. Industry, Innovation and Infrastructure (OpenAlex)
Source: NC State University Libraries
Added: August 6, 2018

2015 journal article

Static Task Partitioning for Locked Caches in Multicore Real-Time Systems

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 14(1).

By: A. Sarkar n, F. Mueller n & H. Ramaprasad*

author keywords: Design; Experimentation; Real-time systems; multicore architectures; timing analysis
TL;DR: Overall, this work is unique in considering the challenges of future multicore architectures for real-time systems and provides key insights into task partitioning and cache-locking mechanisms for architectures with private caches. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2014 conference paper

A real-time distributed hash table

2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA).

By: T. Qian n, F. Mueller n & Y. Xin*

TL;DR: An experimental evaluation on distributed nodes shows that the proposed real-time DHT model is well suited to provide time bounds for requests following typical workload patterns and that a prioritized extension can increase the probability of meeting deadlines for subsequent requests. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: NC State University Libraries
Added: August 6, 2018

2014 conference paper

A real-time distributed storage system for multi-resolution virtual synchrophasor

2014 IEEE PES General Meeting | Conference & Exposition. Presented at the 2014 IEEE Power & Energy Society General Meeting.

By: T. Qian*, A. Chakrabortty n, F. Mueller* & Y. Xin*

Event: 2014 IEEE Power & Energy Society General Meeting

TL;DR: This work designs and implements a real-time distributed storage system to support the virtual PMU data communication, and extends the Chord algorithm so that the response time of data communication can be bounded by the storage system. (via Semantic Scholar)
UN Sustainable Development Goal Categories
9. Industry, Innovation and Infrastructure (OpenAlex)
Sources: NC State University Libraries, Crossref
Added: August 6, 2018

2014 article

Evaluating the Impact of SDC on the GMRES Iterative Solver

2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM.

By: J. Elliott n, M. Hoemmen* & F. Mueller n

TL;DR: This work derives inexpensive checks to detect the effects of an SDC in GMRES that work for a more general SDC model than presuming a bit flip, and shows that when GMRES is used as the inner solver of an inner-outer iteration, it can "run through" SDC of almost any magnitude in the computationally intensive orthogonalization phase. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2014 article

NoCMsg: Scalable NoC-Based Message Passing

2014 14TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), pp. 186–195.

By: C. Zimmer n & F. Mueller n

TL;DR: This work contributes NoCMsg, a low-level message passing abstraction over NoC that ensures deadlock free messaging for wormhole Manhattan-path routing over the NoC, and is the first head-on comparison of shared memory and advanced message passing specifically designed for NoCs on an actual hardware platform with larger core counts on a single socket. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2014 chapter

ScalaJack: Customized Scalable Tracing with In-situ Data Analysis

In Lecture Notes in Computer Science (pp. 13–25).

By: S. Ananthakrishnan n & F. Mueller n

TL;DR: This work addresses root cause diagnosis of large-scale HPC applications by combining customized tracing with support for in-situ data analysis via ScalaJack, a framework with customizable instrumentation and pluggable extension capabilities for problem-directed instrumentation and in-situ data analysis. (via Semantic Scholar)
Source: Crossref
Added: February 24, 2020

2014 chapter

Tools for Simulation and Benchmark Generation at Exascale

In Tools for High Performance Computing 2013 (pp. 19–24).

By: M. Lagadapati n, F. Mueller n & C. Engelmann*

TL;DR: This work focuses on extreme-scale simulation of millions of Message Passing Interface ranks using a lightweight parallel discrete event simulation (PDES) toolkit for performance evaluation and generates a benchmark from it and runs this benchmark within a simulation using models to reflect the performance characteristics of future-generation HPC systems. (via Semantic Scholar)
Source: Crossref
Added: February 24, 2020

2014 conference paper

Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs

IEEE International Symposium on Performance Analysis of Systems and, 231–241.

By: C. Li, Y. Yang, H. Dai, S. Yan, F. Mueller & H. Zhou

Source: NC State University Libraries
Added: August 6, 2018

2013 journal article

Autogeneration and Autotuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 24(3), 417–427.

By: Y. Zhang n & F. Mueller n

author keywords: Accelerators; GPGPU programming; stencil codes; GPU clusters
TL;DR: This proposed framework takes a most concise specification of stencil behavior from the user as a single formula, autogenerates tunable code from it, systematically searches for the best configuration and generates the code with optimal parameter configurations for different GPUs. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2013 article

Best papers, IPDPS 2011

Mueller, F. (2013, July). JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, Vol. 73, pp. 939–939.

By: F. Mueller n

TL;DR: This special issue is a follow-on of the 2011 International Parallel and Distributed Processing Symposium and gathers extended versions of three of the four best papers, whose authors chose to submit an extended version to this journal. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2013 conference paper

HiDP: A hierarchical data parallel language

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 171–181.

By: Y. Zhang & F. Mueller

Source: NC State University Libraries
Added: August 6, 2018

2012 chapter

A Tunable, Software-Based DRAM Error Detection and Correction Library for HPC

In Euro-Par 2011: Parallel Processing Workshops (pp. 251–261).

TL;DR: LIBSDC is introduced, a tunable, transparent silent data corruption detection and correction library for HPC applications that provides comprehensive SDC protection for program memory by implementing on-demand page integrity verification. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Crossref
Added: August 28, 2020

2012 article

Combining Partial Redundancy and Checkpointing for HPC

2012 IEEE 32ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), pp. 615–626.

By: J. Elliott n, K. Kharbas n, D. Fiala n, F. Mueller n, K. Ferreira* & C. Engelmann*

TL;DR: This work contributes a model and analyzes the benefit of C/R in coordination with redundancy at different degrees to minimize the total wallclock time and resource utilization of HPC applications and conducts experiments with an implementation of redundancy within the MPI layer on a cluster. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2012 conference paper

Detection and correction of silent data corruption for large-scale high-performance computing

International conference for high performance computing networking.

By: D. Fiala n, F. Mueller n, C. Engelmann*, R. Riesen, K. Ferreira* & R. Brightwell*

UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: NC State University Libraries
Added: August 6, 2018

2012 article

Low Contention Mapping of Real-Time Tasks onto a TilePro 64 Core Processor

2012 IEEE 18TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS), pp. 131–140.

By: C. Zimmer n & F. Mueller n

TL;DR: This is the first work to consider IPC for worst-case time frames to simplify analysis and to measure the impact on actual hardware for NoC-based real-time multi-core systems. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2012 journal article

Memory Trace Compression and Replay for SPMD Systems Using Extended PRSDs

COMPUTER JOURNAL, 55(2), 206–217.

By: S. Budanur n, F. Mueller n & T. Gamblin*

TL;DR: ScalaMemTrace is presented, a novel technique for collecting memory traces in a scalable manner that builds on prior trace methods with aggressive compression techniques to allow lossless representation of memory traces for dense algebraic kernels, with near-constant trace size irrespective of the problem size or the number of threads. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2012 journal article

Proactive process-level live migration and back migration in HPC environments

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 72(2), 254–267.

By: C. Wang n, F. Mueller n, C. Engelmann* & S. Scott*

author keywords: Live migration; Back migration; Fault tolerance; High-performance computing; Health monitoring
TL;DR: A novel process-level live migration mechanism supports continued execution of applications during much of process migration and provides a novel back migration approach to eliminate load imbalance or bottlenecks caused by migrated tasks. (via Semantic Scholar)
UN Sustainable Development Goal Categories
10. Reduced Inequalities (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2012 journal article

SCALAEXTRAP: Trace-Based Communication Extrapolation for SPMD Programs

ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 34(1).

By: X. Wu n & F. Mueller n

author keywords: Communication; tracing; compression; trace extrapolation
TL;DR: An innovative approach for topology extrapolation of single program, multiple data (SPMD) codes with stencil or mesh communication is devised, which has the potential to enable otherwise infeasible system simulation at the exascale level. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2012 article

ScalaBenchGen: Auto-Generation of Communication Benchmark Traces

2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 1250–1260.

By: X. Wu n, V. Deshpande n & F. Mueller n

TL;DR: ScalaTrace is utilized, a lossless and scalable framework to trace communication operations and execution time while abstracting away the computations of an MPI application, and the generated source code of benchmarks preserves both the communication patterns and the wall-clock time behavior of the original application. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2012 chapter

ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale

In Applied Parallel and Scientific Computing (pp. 410–418).

TL;DR: This work introduces intra- and inter-node compression techniques of MPI events, develops a scheme to preserve time and causality of communication events, and presents results of the implementation for BlueGene/L. (via Semantic Scholar)
Source: Crossref
Added: August 28, 2020

2012 conference paper

ScalaTrace: Tracing, analysis and modeling of HPC codes at scale

Applied Parallel and Scientific Computing, Pt II, 7134, 410–418.

By: F. Mueller, X. Wu, M. Schulz, B. Supinski & T. Gamblin

Source: NC State University Libraries
Added: August 6, 2018

2012 conference paper

Static task partitioning for locked caches in multi-core real-time systems

CASES'12: Proceedings of the 2012 ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems, 161–170.

By: A. Sarkar n, F. Mueller n & H. Ramaprasad*

TL;DR: This work is unique in considering the challenges of future multi-core architectures for real-time systems and provides key insights into task partitioning with locked caches for architectures with private caches. (via Semantic Scholar)
Source: NC State University Libraries
Added: August 6, 2018

2011 journal article

Data-intensive document clustering on graphics processing unit (GPU) clusters

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 71(2), 211–224.

By: Y. Zhang n, F. Mueller n, X. Cui* & T. Potok*

author keywords: High-performance computing; Accelerators; Data-intensive computing
TL;DR: The benefits of exploiting the computational power of graphics processing units to study two fundamental problems in document mining, namely to calculate the term frequency-inverse document frequency (TF-IDF) and cluster a large set of documents are assessed. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2011 journal article

Making DRAM refresh predictable

REAL-TIME SYSTEMS, 47(5), 430–453.

By: B. Bhat n & F. Mueller n

author keywords: Real-time systems; DRAM; Worst-case execution time; Timing analysis; DRAM refresh; Timing predictability
Source: Web Of Science
Added: August 6, 2018

2011 journal article

Predictable Task Migration for Locked Caches in Multi-Core Systems

ACM SIGPLAN NOTICES, 46(5), 131–140.

By: A. Sarkar n, F. Mueller n & H. Ramaprasad*

author keywords: Design; Experimentation; Real-Time Systems; Multi-Core Architectures; Timing Analysis; Task Migration
UN Sustainable Development Goal Categories
10. Reduced Inequalities (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2011 conference paper

Predictable task migration for locked caches in multi-core systems

LCTES 11: Proceedings of the ACM SIGPLAN/SIGBED 2011 Conference on Languages, Compilers, Tools and Theory for Embedded Systems, 131–140.

By: A. Sarkar n, F. Mueller n & H. Ramaprasad*

TL;DR: The push-assisted migration model is extended with several cache migration techniques to efficiently retain locked cache lines on a bus-based chip multi-processor architecture and deterministic migration delay bounds are provided that help the scheduler decide which migration technique(s) to utilize to relocate a single or multiple tasks. (via Semantic Scholar)
UN Sustainable Development Goal Categories
10. Reduced Inequalities (OpenAlex)
Source: NC State University Libraries
Added: August 6, 2018

2011 article

ScalaExtrap: Trace-Based Communication Extrapolation for SPMD Programs

Wu, X., & Mueller, F. (2011, August). ACM SIGPLAN NOTICES, Vol. 46, pp. 113–122.

By: X. Wu n & F. Mueller n

author keywords: High-Performance Computing; Message Passing; Tracing; Performance Prediction; Measurement; Performance
Source: Web Of Science
Added: August 6, 2018

2010 journal article

Feedback-directed page placement for ccNUMA via hardware-generated memory traces

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 70(12), 1204–1219.

By: J. Marathe n, V. Thakkar n & F. Mueller n

author keywords: Hardware performance monitoring; NUMA; Trace guided optimization; Page placement
TL;DR: Experiments show that this method, although based on lossy tracing, can efficiently and effectively improve page placement, leading to an average wall-clock execution time saving of over 20% for the tested benchmarks. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2010 journal article

Parametric Timing Analysis and Its Application to Dynamic Voltage Scaling

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 10(2).

By: S. Mohan n, F. Mueller n, M. Root*, W. Hawkins*, C. Healy*, D. Whalley*, E. Vivancos*

author keywords: Algorithms; Experimentation; Real-time systems; worst-case execution time; timing analysis; dynamic voltage scaling
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2010 journal article

Tightening the Bounds on Feasible Preemptions

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 10(2).

By: H. Ramaprasad* & F. Mueller n

author keywords: Algorithms; Experimentation; Real-time systems; preemptions; worst-case execution time; timing analysis; data caches; cache-related preemption delay
TL;DR: A method to calculate tight upper bounds on the maximum number of possible preemptions for each job of a task and, considering the worst-case placement of these preemption points, derive a much tighter bound on its WCET, showing significant improvements in the bounds derived. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2009 article

A Tunable Holistic Resiliency Approach for High-Performance Computing Systems

Scott, S. L., Engelmann, C., Vallee, G. R., Naughton, T., Tikotekar, A., Ostrouchov, G., … Varma, J. (2009, April). ACM SIGPLAN NOTICES, Vol. 44, pp. 305–306.

By: S. Scott*, C. Engelmann*, G. Vallee*, T. Naughton*, A. Tikotekar*, G. Ostrouchov*, C. Leangsuksun*, N. Naksinehaboon* ...

author keywords: Design; Measurement; Performance; Reliability
Source: Web Of Science
Added: August 6, 2018

2009 journal article

Improving the availability of supercomputer job input data using temporal replication

Computer Science - Research and Development, 23(3-4), 149–157.

By: C. Wang n, Z. Zhang n, X. Ma n, S. Vazhkudai & F. Mueller n

author keywords: Temporal replication; Batch job scheduler; Reliability; Supercomputer; Parallel file system
TL;DR: The temporal replication scheme in the popular Lustre parallel file system is implemented and results show that the scheme allows for fast online data reconstruction, with a reasonably low overall space and I/O bandwidth overhead. (via Semantic Scholar)
UN Sustainable Development Goal Categories
8. Decent Work and Economic Growth (OpenAlex)
Source: Crossref
Added: August 28, 2020

2009 article

Push-Assisted Migration of Real-Time Tasks in Multi-Core Processors

Sarkar, A., Mueller, F., Ramaprasad, H., & Mohan, S. (2009, July). ACM SIGPLAN NOTICES, Vol. 44, pp. 80–89.

By: A. Sarkar n, F. Mueller n, H. Ramaprasad* & S. Mohan*

author keywords: Design; Experimentation; Real-Time Systems; Multi-Core Architectures; Timing Analysis; Task Migration
TL;DR: This research presents a meta-modelling architecture that automates the labor-intensive, and therefore time-consuming and expensive, process of integrating multiple processors into a single system. (via Semantic Scholar)
UN Sustainable Development Goal Categories
10. Reduced Inequalities (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2009 article

ScalaTrace: Scalable compression and replay of communication traces for high-performance computing

Noeth, M., Ratn, P., Mueller, F., Schulz, M., & Supinski, B. R. (2009, August). JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, Vol. 69, pp. 696–710.

By: M. Noeth n, P. Ratn n, F. Mueller n, M. Schulz* & B. Supinski*

author keywords: High-performance computing; Scalability; Communication tracing
TL;DR: An approach is contributed that provides orders of magnitude smaller, if not near-constant size, communication traces regardless of the number of nodes while preserving structural information. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2008 journal article

Exploiting synchronous and asynchronous DVS for feedback EDF scheduling on an embedded platform

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 7(1).

By: Y. Zhu n & F. Mueller n

author keywords: algorithms; experimentation; real-time systems; scheduling; dynamic voltage scaling; feedback control
TL;DR: This work develops power-aware feedback-DVS algorithms for hard real-time systems that adapt to dynamically changing workloads and studies energy consumption for synchronous and asynchronous DVS switching on a PowerPC board. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2008 review

The worst-case execution-time problem - Overview of methods and survey of tools

ACM Transactions on Embedded Computing Systems, 7(3).

By: R. Wilhelm, J. Engblom, A. Ermedahl, N. Holsti, S. Thesing, D. Whalley, G. Bernat, C. Ferdinand ...

Source: NC State University Libraries
Added: August 6, 2018

2007 article

DVSleak: Combining leakage reduction and voltage scaling in feedback EDF scheduling

Zhu, Y., & Mueller, F. (2007, July). ACM SIGPLAN NOTICES, Vol. 42, pp. 31–40.

By: Y. Zhu n & F. Mueller n

author keywords: real-time systems; scheduling; dynamic voltage scaling; leakage; feedback control; algorithms; experimentation
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2007 article

Generalizing parametric timing analysis

Coffman, J., Healy, C., Mueller, F., & Whalley, D. (2007, July). ACM SIGPLAN NOTICES, Vol. 42, pp. 152–154.

By: J. Coffman*, C. Healy*, F. Mueller n & D. Whalley*

author keywords: verification; reliability; worst-case execution time (WCET) analysis; parametric timing analysis
Source: Web Of Science
Added: August 6, 2018

2007 journal article

METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies

ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 29(2).

By: J. Marathe n, F. Mueller n, T. Mohan*, S. McKee*, B. De Supinski* & A. Yoo*

author keywords: algorithms; languages; performance; dynamic binary rewriting; program instrumentation; data trace generation; data trace compression; cache analysis
TL;DR: This work presents METRIC, a software framework for isolating and understanding memory access bottlenecks using partial access traces, and demonstrates how this information can be used to isolate and understand memory access inefficiencies. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2007 journal article

Source-code-correlated cache coherence characterization of OpenMP benchmarks

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 18(6), 818–834.

By: J. Marathe n & F. Mueller n

author keywords: cache memories; simulation; dynamic binary rewriting; program instrumentation; SMPs; coherence protocols
TL;DR: This paper develops a coherence analysis framework based on incremental coherence simulation of actual reference traces, and shows that cache coherence traffic can be simulated with a considerable degree of accuracy for SPMD programs, as the invalidation traffic closely matches the corresponding hardware performance counters. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2006 journal article

FAST: Frequency-Aware Static Timing Analysis

ACM Transactions on Embedded Computing Systems, 5(1), 200–224.

By: K. Seth, A. Anantaraman, F. Mueller & E. Rotenberg

Source: NC State University Libraries
Added: August 6, 2018

2006 journal article

Improving WCET by applying worst-case path optimizations

REAL-TIME SYSTEMS, 34(2), 129–152.

By: W. Zhao*, W. Kreahling*, D. Whalley*, C. Healy* & F. Mueller n

author keywords: WCET; path-based optimizations; embedded systems
TL;DR: This paper describes an approach to reduce the WCET by adapting and applying optimizations designed for frequent paths to the worst-case (WC) paths in an application and uses feedback from a timing analyzer to detect the WC paths in a function. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2005 article

Feedback EDF scheduling exploiting hardware-assisted asynchronous dynamic voltage scaling

Zhu, Y. F., & Mueller, F. (2005, July). ACM SIGPLAN NOTICES, Vol. 40, pp. 203–212.

By: Y. Zhu n & F. Mueller n

UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2005 journal article

Feedback EDF scheduling of real-time tasks exploiting dynamic voltage scaling

REAL-TIME SYSTEMS, 31(1-3), 33–63.

By: Y. Zhu n & F. Mueller n

author keywords: real-time systems; scheduling; dynamic voltage scaling; feedback control
TL;DR: A feedback control model is given to describe the feedback DVS scheduler, which is used to analyze the system's stability and the ability of the algorithm to save up to 29% more energy than previous work for task sets with different dynamic workload characteristics. (via Semantic Scholar)
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2004 article

Compositional static instruction cache simulation

Patil, K., Seth, K., & Mueller, F. (2004, July). ACM SIGPLAN NOTICES, Vol. 39, pp. 136–145.

By: K. Patil, K. Seth* & F. Mueller n

author keywords: algorithms; experimentation; real-time systems; caches; scheduling; worst-case execution time
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2004 article

Enforcing safety of real-time schedules on contemporary processors using a virtual simple architecture (VISA)

25TH IEEE INTERNATIONAL REAL-TIME SYSTEMS SYMPOSIUM, PROCEEDINGS, pp. 114–125.

By: A. Anantaraman n, K. Seth*, E. Rotenberg n & F. Mueller n

TL;DR: A VISA variant is proposed that dynamically accrues the slack needed to facilitate speculation in the complex mode, eliminating the need to statically pad WCETs and thereby enabling VISA-style speculation even in highly-utilized systems. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2004 journal article

Scalable hierarchical locking for distributed systems

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 64(6), 708–724.

By: N. Desai* & F. Mueller*

author keywords: distributed mutual exclusion; middleware services; distributed resource allocation; concurrency services; hierarchical locking; peer-to-peer protocols; scalability; large-scale distributed computing; distributed agreement; distributed transactions
TL;DR: This work aims to enhance middleware services to provide scalable synchronization and to support state replication in distributed systems; it designs and implements a peer-to-peer middleware protocol for multi-mode hierarchical locking, applicable to transaction-style processing and distributed agreement. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2003 article

Communication characteristics of large-scale scientific applications for contemporary cluster architectures

Vetter, J. S., & Mueller, F. (2003, September). JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, Vol. 63, pp. 853–865.

By: J. Vetter* & F. Mueller n

TL;DR: This paper examines the explicit communication characteristics of several sophisticated scientific applications, which, by themselves, constitute a representative suite of publicly available benchmarks for large cluster architectures by focusing on the Message Passing Interface (MPI) and by using hardware counters on the microprocessor. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2003 article

FAST: Frequency-aware static timing analysis

RTSS 2003: 24TH IEEE INTERNATIONAL REAL-TIME SYSTEMS SYMPOSIUM, PROCEEDINGS, pp. 40–51.

By: K. Seth n, A. Anantaraman n, F. Mueller n & E. Rotenberg n

TL;DR: Novel techniques for tight and flexible static timing analysis particularly well-suited for dynamic scheduling schemes are contributed, including a parametric approach towards bounding the WCET statically with respect to the frequency and an improved parametric model for improving existing DVS scheduling schemes. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2003 conference paper

Virtual Simple Architecture (VISA): Exceeding the complexity limit in safe real-time systems

Computers and their applications: proceedings of the ISCA 16th International Conference, Seattle, Washington, USA, March 28-30, 2001, 350–361. Cary, NC: ISCA.

By: A. Anantaraman, K. Seth, K. Patil, E. Rotenberg & F. F. Mueller

Source: NC State University Libraries
Added: August 6, 2018

2002 article

Energy-conserving feedback EDF scheduling for embedded systems with real-time constraints

Dudani, A., Mueller, F., & Zhu, Y. F. (2002, July). ACM SIGPLAN NOTICES, Vol. 37, pp. 213–222.

By: A. Dudani n, F. Mueller n & Y. Zhu n

author keywords: real-time systems; scheduling; dynamic voltage scaling
UN Sustainable Development Goal Categories
7. Affordable and Clean Energy (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2002 journal article

Handling irreducible loops: Optimized node splitting versus DJ-Graphs

ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 24(4), 299–333.

By: S. Unger n & F. Mueller n

author keywords: algorithms; languages; code optimization; compilation; control flow graphs; instruction-level parallelism; irreducible flowgraphs; loops; node splitting; reducible flowgraphs
TL;DR: A method of optimized node splitting to transform irreducible regions of control flow into reducible regions is formally defined and its correctness is shown, which is superior to approaches previously published since it reduces the number of replicated nodes by comparison. (via Semantic Scholar)
Source: Web Of Science
Added: August 6, 2018

2001 journal article

A comparison of static analysis and evolutionary testing for the verification of timing constraints

REAL-TIME SYSTEMS, 21(3), 241–268.

By: J. Wegener* & F. Mueller n

author keywords: real-time systems; timing analysis; static timing analysis; testing; genetic algorithms; evolutionary testing
TL;DR: The results show that static analysis and evolutionary testing are complementary methods, which together provide upper and lower bounds for both worst-case and best-case execution times. (via Semantic Scholar)
UN Sustainable Development Goal Categories
16. Peace, Justice and Strong Institutions (OpenAlex)
Source: Web Of Science
Added: August 6, 2018

2001 book

High-level parallel programming models and supportive environments: 6th international workshop, HIPS 2001, San Francisco, CA, USA, April 23, 2001: proceedings

New York: Springer.

Frank Mueller

Source: NC State University Libraries
Added: August 6, 2018
