Works (5)

Updated: July 5th, 2023 15:34

2021 article

Systemic Assessment of Node Failures in HPC Production Platforms

2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), pp. 267–276.

By: A. Das, F. Mueller & B. Rountree

author keywords: Root Cause; Node Failures; Holistic Analysis
TL;DR: It is shown that external environmental influence is not strongly correlated with node failures in terms of the root cause, and lead time enhancements are feasible for nodes showing fail-slow characteristics. (via Semantic Scholar)
Source: Web of Science
Added: October 4, 2021

2020 article

Aarohi: Making Real-Time Node Failure Prediction Feasible

2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2020), pp. 1092–1101.

By: A. Das, F. Mueller & B. Rountree

author keywords: Online Prediction; HPC; Node Failures; Parsing
TL;DR: This work tackles online anomaly prediction in computing systems by exploiting rapid event analysis based on context-free grammars, and presents the framework Aarohi, which describes an effective way to predict failures online. (via Semantic Scholar)
Source: Web of Science
Added: June 10, 2021

2018 article

Desh: Deep Learning for System Health Prediction of Lead Times to Failure in HPC

HPDC '18: PROCEEDINGS OF THE 27TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, pp. 40–51.

By: A. Das, F. Mueller, C. Siegel & A. Vishnu

author keywords: LSTM; Failure Prediction; Log Mining; HPC; Node Failures; Lead Times; Anomaly Detection; Deep Learning
TL;DR: This work aims to predict node failures that occur in supercomputing systems using long short-term memory (LSTM) networks, a type of recurrent neural network (RNN), and identifies failure indicators with enhanced training and classification that apply generically to logs from operating systems and software components without requiring any modification to them. (via Semantic Scholar)
Source: Web of Science
Added: April 2, 2019

2018 journal article

KeyValueServe†: Design and performance analysis of a multi-tenant data grid as a cloud service

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 30(14).

By: A. Das, A. Iyengar & F. Mueller

author keywords: cloud computing; data-grid; in-memory; key-value store; multi-tenancy; NoSQL; performance; quality of service
TL;DR: This paper presents KeyValueServe, a low overhead cloud service with features aiding resource management that can efficiently provide services to tenants without degrading performance, and indicates that a Hazelcast cluster can get congested with multiple concurrent connections when processing client requests, resulting in poor performance. (via Semantic Scholar)
Source: Web of Science
Added: August 6, 2018

2016 conference paper

Performance analysis of a multi-tenant in-memory data grid

Proceedings of 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), pp. 956–959.

By: A. Das, F. Mueller, X. Gu & A. Iyengar

TL;DR: This study suggests that processing an increasing number of client requests while spawning fewer threads helps improve performance, and uncovers scenarios of performance degradation followed by optimized performance via end-point multiplexing. (via Semantic Scholar)
UN Sustainable Development Goal Categories
9. Industry, Innovation and Infrastructure (OpenAlex)
Source: NC State University Libraries
Added: August 6, 2018


Certain data included herein are derived from the Web of Science© and InCites© (2025) of Clarivate Analytics. All rights reserved. You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.