TY - GEN
T1 - Performance prediction of large-scale parallel system and application using macro-level simulation
AU - Susukita, Ryutaro
AU - Kimura, Yasunori
AU - Ando, Hisashige
AU - Komatsu, Hidemi
AU - Aoyagi, Mutsumi
AU - Kurokawa, Motoyoshi
AU - Honda, Hiroaki
AU - Murakami, Kazuaki J.
AU - Inadomi, Yuichi
AU - Shibamura, Hidetomo
AU - Inoue, Koji
AU - Yamamura, Shuji
AU - Ishizuki, Shigeru
AU - Yu, Yunqing
PY - 2008
Y1 - 2008
N2 - Predicting application performance on an HPC system is an important technology for designing the computing system and developing applications. However, accurate prediction is a challenge, particularly for a future system with higher performance. In this paper, we present a new method for predicting application performance on HPC systems. This method combines modeling of sequential performance on a single processor with macro-level simulation of applications to obtain parallel performance on the entire system. In the simulation, the execution flow is traced but kernel computations are omitted to reduce the execution time. Validation on a real terascale system showed that the predicted and measured performance agreed within 10% to 20%. We employed the method in designing a hypothetical petascale system of 32768 SIMD-extended processor cores. For predicting application performance on the petascale system, the macro-level simulation required several hours.
AB - Predicting application performance on an HPC system is an important technology for designing the computing system and developing applications. However, accurate prediction is a challenge, particularly for a future system with higher performance. In this paper, we present a new method for predicting application performance on HPC systems. This method combines modeling of sequential performance on a single processor with macro-level simulation of applications to obtain parallel performance on the entire system. In the simulation, the execution flow is traced but kernel computations are omitted to reduce the execution time. Validation on a real terascale system showed that the predicted and measured performance agreed within 10% to 20%. We employed the method in designing a hypothetical petascale system of 32768 SIMD-extended processor cores. For predicting application performance on the petascale system, the macro-level simulation required several hours.
UR - http://www.scopus.com/inward/record.url?scp=70350780320&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350780320&partnerID=8YFLogxK
U2 - 10.1109/SC.2008.5220091
DO - 10.1109/SC.2008.5220091
M3 - Conference contribution
AN - SCOPUS:70350780320
SN - 9781424428359
T3 - 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
BT - 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
T2 - 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
Y2 - 15 November 2008 through 21 November 2008
ER -