Coppola M., Bertolli C., Zoccolo C. The co-replication methodology and its application to structured parallel programs. In: HPC-GECO/CompFrame 2007 - HPC-GECO/CompFrame joint Workshop. Symposium on Component and framework technology in high-performance and scientific computing (Montreal, Canada, 21-22 October 2007). Proceedings, pp. 39-48. David E. Bernholdt, Vladimir S. Getov (eds.). ACM Press, 2007.
We introduce Co-Replication, a technique that exploits abstract properties of a computation to allow parallel replicas of a software module to cooperate, enhancing both the reliability and the availability of the resulting component and providing a flexible trade-off between the two properties. In Co-Replication a complete partial ordering is defined on the computation state. The formal expression of the state combination operation among replicas allows them to compute independently as a co-algorithm, and to exploit low-overhead, opportunistic strategies for spreading results and surviving faults. Co-Replication suits structured parallel and component-based programming, as it needs a high-level description of the computation properties, and thus can ease the exploitation of parallel platforms that are not fault-free, such as large clusters and Grids. We describe the theoretical foundations of Co-Replication and investigate the use of random gossiping strategies for the state combination. To show the applicability of the technique, we discuss the modeling of Master-Slave and task farm computations, and report test results for two applications.
Subject:
Co-algorithm
Parallel programming
High-level parallel programming
Fault tolerance
Fault recovery
D.3.3 Language Constructs and Features
D.1.3 Concurrent Programming
D.4.5 Operating Systems. Reliability
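
The abstract's central idea (replica states that are partially ordered, a combination operation over them, and random gossiping to spread results) can be illustrated with a minimal sketch for a task farm. This is not the authors' implementation. It assumes that a replica's state is a map from task to result, that the combination operation is map union, and that each replica pushes its state to one random peer per round. The names `merge`, `leq`, and `run_replicas`, and the `crash_at` fault model, are hypothetical.

```python
import random


def merge(a, b):
    """Combine two replica states: union of the results each one knows.

    Assumes `work` is deterministic, so shared tasks carry equal results.
    """
    merged = dict(a)
    merged.update(b)
    return merged


def leq(a, b):
    """Partial order on states: a <= b if b holds every result in a."""
    return all(k in b and b[k] == v for k, v in a.items())


def run_replicas(tasks, work, n_replicas=3, crash_at=None, seed=0):
    """Simulate cooperating replicas of a task farm (illustrative only).

    Each round, every live replica computes one task it has not seen,
    then pushes its state to one random live peer, which merges it.
    If `crash_at` is set, one replica fails at the end of that round.
    Returns the states of the surviving replicas and the round count.
    """
    rng = random.Random(seed)
    states = [dict() for _ in range(n_replicas)]
    alive = [True] * n_replicas
    rounds = 0
    while True:
        live = [i for i in range(n_replicas) if alive[i]]
        if all(len(states[i]) == len(tasks) for i in live):
            break
        rounds += 1
        # Independent computation: pick any task still missing locally.
        for i in live:
            pending = [t for t in tasks if t not in states[i]]
            if pending:
                t = rng.choice(pending)
                states[i][t] = work(t)
        # Random gossip: push the local state to one random live peer.
        for i in live:
            peers = [j for j in live if j != i]
            if peers:
                j = rng.choice(peers)
                states[j] = merge(states[j], states[i])
        # Simulated fault: the crashed replica's state is simply lost.
        if crash_at is not None and rounds == crash_at and len(live) > 1:
            alive[live[0]] = False
    return [states[i] for i in range(n_replicas) if alive[i]], rounds


if __name__ == "__main__":
    survivors, rounds = run_replicas(
        list(range(8)), lambda t: t * t, n_replicas=3, crash_at=2, seed=1
    )
    print(len(survivors), "survivors finished in", rounds, "rounds")
```

Because merging only moves a state upward in the order defined by `leq`, a replica can accept gossip at any time without coordination, and losing one replica removes no result that a peer has already merged. A real system would also need failure detection and a policy for duplicated work, which this sketch leaves out.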

