2023 journal article

Software Fault Tolerance in Real-Time Systems: Identifying the Future Research Questions

ACM Computing Surveys.

TL;DR: A joint scheduling-failure analysis model is proposed that highlights the formal interactions among software fault tolerance mechanisms and timing properties, allowing many open research questions to be presented and discussed with the final aim of spurring future research activities. (via Semantic Scholar)
Source: Crossref
Added: June 26, 2023

Tolerating hardware faults in modern architectures is becoming a prominent problem due to the miniaturization of hardware components, their increasing complexity, and the necessity to reduce costs. Software-Implemented Hardware Fault Tolerance approaches have been developed to improve system dependability with respect to hardware faults without resorting to custom hardware solutions. However, these approaches make the timing constraints of applications and activities harder to satisfy from a scheduling standpoint. This article surveys the current state of the art in fault tolerance approaches used in the context of real-time systems, identifying the main challenges and the cross-links between these two topics. We propose a joint scheduling-failure analysis model that highlights the formal interactions among software fault tolerance mechanisms and timing properties. This model allows us to present and discuss many open research questions, with the final aim of spurring future research activities.