TY - GEN
T1 - Fault-Tolerant Scheduling with Dynamic Number of Replicas in Heterogeneous Systems
AU - Zhao, Laiping
AU - Ren, Yizhi
AU - Xiang, Yang
AU - Sakurai, Kouichi
PY - 2010
Y1 - 2010
N2 - In existing studies on fault-tolerant scheduling, the active replication schema uses ε + 1 replicas for each task to tolerate ε failures. However, in this paper, we show that more replicas do not always lead to higher reliability. Moreover, more replicas imply more resource consumption and higher economic cost. To address this problem, with the goal of satisfying the user's reliability requirement with minimum resources, this paper proposes a new fault-tolerant scheduling algorithm: MaxRe. In the algorithm, we incorporate reliability analysis into the active replication schema and exploit a dynamic number of replicas for different tasks. Both the theoretical analysis and the experiments confirm that the schedule produced by MaxRe satisfies the user's reliability requirements. Moreover, MaxRe can achieve the corresponding reliability with up to 70% fewer resources than the FTSA algorithm.
AB - In existing studies on fault-tolerant scheduling, the active replication schema uses ε + 1 replicas for each task to tolerate ε failures. However, in this paper, we show that more replicas do not always lead to higher reliability. Moreover, more replicas imply more resource consumption and higher economic cost. To address this problem, with the goal of satisfying the user's reliability requirement with minimum resources, this paper proposes a new fault-tolerant scheduling algorithm: MaxRe. In the algorithm, we incorporate reliability analysis into the active replication schema and exploit a dynamic number of replicas for different tasks. Both the theoretical analysis and the experiments confirm that the schedule produced by MaxRe satisfies the user's reliability requirements. Moreover, MaxRe can achieve the corresponding reliability with up to 70% fewer resources than the FTSA algorithm.
UR - http://www.scopus.com/inward/record.url?scp=84863393973&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863393973&partnerID=8YFLogxK
U2 - 10.1109/HPCC.2010.72
DO - 10.1109/HPCC.2010.72
M3 - Conference contribution
AN - SCOPUS:84863393973
SN - 9780769542140
T3 - Proceedings - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010
SP - 434
EP - 441
BT - Proceedings - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010
T2 - 2010 12th IEEE International Conference on High Performance Computing and Communications, HPCC 2010
Y2 - 1 September 2010 through 3 September 2010
ER -