Istituto di Scienza e Tecnologie dell'Informazione     
Aglietti F., Centurioni E., Chessa S., D'Auria I., Franzinelli F., Maestrini P., Michelotti A., Pagliai I., Tripiccione R. Self-diagnosis of Apemille. In: EDCC - Proceedings of Companion Workshop on Dependable Computing (Gliwice, Poland, 1997). Proceedings, vol. 1 pp. 73 - 84. IEEE, 1996.
The APEmille, the third evolution of the APE family of SIMD machines, is structured as a three-dimensional array of processors. In its largest configuration, the number of processors is 4096. Its typical application range covers massive computations (e.g., those needed to solve some problems in physics research), which may require as many as 10^17 floating point operations. Given the long time needed to complete such jobs, the machine should be able to tolerate the occurrence of multiple faults during the job execution. To this purpose, self-diagnosis capabilities have been incorporated in its design, using an approach inspired by a family of algorithms recently introduced to perform the system-level diagnosis of regular architectures. The machine is partitioned into three subsystems, each structured as a three-dimensional array, which are diagnosed separately using slightly different variants of the same diagnosis algorithm. The system units are tested by means of comparisons, either concurrently with the job execution or during special diagnosis sessions. The strategy to test the units and the diagnosis algorithms are described, and the diagnosis correctness and completeness are evaluated both theoretically and experimentally.
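
The abstract does not give the details of the diagnosis algorithm, so the following is only a generic sketch of comparison-based diagnosis on a three-dimensional array, not the method of the paper. It rests on two simplifying assumptions that are not taken from the source: a faulty unit always produces a result that differs from that of a fault-free neighbor, and the largest aggregate of mutually agreeing units is fault-free. The names `neighbors` and `diagnose` are hypothetical.

```python
from itertools import product


def neighbors(p, dims):
    """Yield the neighbors of unit p in a 3D mesh without wraparound."""
    for axis in range(3):
        for step in (-1, 1):
            q = list(p)
            q[axis] += step
            if 0 <= q[axis] < dims[axis]:
                yield tuple(q)


def diagnose(dims, results):
    """Classify units by comparing each unit's result with its neighbors.

    results maps each unit coordinate to the value that unit computed.
    Returns three sets: (fault_free, faulty, undiagnosed).
    """
    units = list(product(*(range(d) for d in dims)))

    # Group units into aggregates connected by agreeing comparisons.
    seen, aggregates = set(), []
    for start in units:
        if start in seen:
            continue
        aggregate, stack = [], [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            aggregate.append(u)
            for v in neighbors(u, dims):
                if v not in seen and results[u] == results[v]:
                    seen.add(v)
                    stack.append(v)
        aggregates.append(aggregate)

    # Assumption: the largest agreeing aggregate is fault-free.
    fault_free = set(max(aggregates, key=len))

    # A unit that disagrees with a fault-free neighbor is declared faulty.
    faulty = {
        v
        for u in fault_free
        for v in neighbors(u, dims)
        if results[u] != results[v]
    }

    # Units not reachable from the fault-free aggregate stay undiagnosed.
    undiagnosed = set(units) - fault_free - faulty
    return fault_free, faulty, undiagnosed
```

Under these assumptions a diagnosis is correct when every unit declared faulty is actually faulty, and complete when no unit is left in the undiagnosed set.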
Subject Fault-tolerance
System-level diagnosis
SIMD machines
Grid interconnection



For further information, contact: Librarian http://puma.isti.cnr.it
