Istituto di Scienza e Tecnologie dell'Informazione     
Bondavalli A., Chiaradonna S., Di Giandomenico F., Strigini L. Rational design of Multiple-Redundant systems : adjudication and fault treatment. In: Predictably Dependable Computing Systems / edited by B. Randell ... [et al.]. (Basic research series), pp. 141 - 237. Springer, 1995.
The design of fault-tolerant systems should ideally be based on rigorous predictions of the effects of design decisions on the achieved dependability. However, the complexity of the task is such that these decisions are typically based on ingrained, time-proven practice, without the benefit of thorough mathematical analysis. We analyse two specific problems in fault-tolerant design based on modular replication (with or without design diversity). First, we consider the derivation of a single correct result from the multiple results produced by the replicas in a redundant component. Many designs intended to improve upon simple majority voting have been proposed in the literature, but without a unified, rigorous analysis to assist design choices. We describe a general method for evaluating and comparing adjudicators in probabilistic terms, and specify an optimal adjudicator, which yields the highest possible reliability for a redundant component, given the (probabilistic) failure characteristics of its subcomponents. Our analysis applies to components with and without a fail-safe mode. Second, we consider fault treatment: how the decision can be made to remove a replica of a component, considering it permanently failed, on the basis of its history of agreement/disagreement with other replicas. The problem is compounded by transient faults, which make it undesirable to disconnect a component at the first signs of errors, and by the use of dynamic error processing, in which the number of replicas executed depends on whether disagreements are observed. For this problem, we choose a scheme integrating dynamic error processing with diagnosis and disconnection of components that may be permanently failed, and show how its behaviour can be compared with alternative designs via simulation.
Subject Fault treatment
C.4 Performance of systems. Fault tolerance



For further information, contact: Librarian http://puma.isti.cnr.it
