TY - JOUR
T1 - Data I/O management approach for the post-hoc visualization of big simulation data results
AU - Nonaka, Jorji
AU - Inacio, Eduardo C.
AU - Ono, Kenji
AU - Dantas, Mario A.R.
AU - Kawashima, Yasuhiro
AU - Kawanabe, Tomohiro
AU - Shoji, Fumiyoshi
N1 - Funding Information:
Some of the results were obtained by using the K computer at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe, Japan. We would like to thank the Brazilian Federal Agency CAPES for supporting some of the authors collaborating in this research work. This work is partially supported by the “Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures” in Japan (Project ID: jh170043, jh170051). We are grateful to the anonymous reviewers for their careful reading and many insightful comments and suggestions for improving this paper.
Publisher Copyright:
© 2018 World Scientific Publishing Company.
PY - 2018/6/1
Y1 - 2018/6/1
N2 - Leading-edge supercomputers, such as the K computer, have generated vast amounts of simulation results, and most of these datasets are stored on the file system for post-hoc analysis such as visualization. In this work, we first investigated the data generation trends of the K computer by analyzing operational log files. We verified a tendency to generate large numbers of distributed files as simulation outputs, and in most cases the number of files is proportional to the number of computational nodes used, that is, each computational node produces one or more files. Considering that the computational cost of visualization tasks is usually much smaller than that of large-scale numerical simulations, a flexible data input/output (I/O) management mechanism becomes highly useful for post-hoc visualization and analysis. In this work, we focused on the xDMlib data management library and its flexible data I/O mechanism to enable flexible loading of big computational climate simulation results. In the proposed approach, a pre-processing step is executed on the target distributed files to generate the light-weight metadata needed to build the data assignment mapping used in the subsequent data loading process. We evaluated the proposed approach using a 32-node visualization cluster and the K computer. Despite the inevitable performance penalty of longer data loading times when using a smaller number of processes, the approach avoids any data replication via copy, conversion, or extraction. In addition, users can freely select any number of nodes for post-hoc visualization and analysis, regardless of the number of distributed files.
UR - http://www.scopus.com/inward/record.url?scp=85045113329&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045113329&partnerID=8YFLogxK
U2 - 10.1142/S1793962318400068
DO - 10.1142/S1793962318400068
M3 - Article
AN - SCOPUS:85045113329
SN - 1793-9623
VL - 9
JO - International Journal of Modeling, Simulation, and Scientific Computing
JF - International Journal of Modeling, Simulation, and Scientific Computing
IS - 3
M1 - 1840006
ER -