TY - GEN
T1 - Scalable Direct-Iterative Hybrid Solver for Sparse Matrices on Multi-Core and Vector Architectures
AU - Ono, Kenji
AU - Kato, Toshihiro
AU - Ohshima, Satoshi
AU - Nanri, Takeshi
N1 - Funding Information:
The present study was supported in part by MEXT as a social and scientific priority issue (Development of Innovative Design and Production Processes that Lead the Way for the Manufacturing Industry in the Near Future) to be tackled using the post-K computer (No. hp190197). The computation was carried out using the computer resources offered under the category of General Project by the Research Institute for Information Technology, Kyushu University.
Publisher Copyright:
© 2020 ACM.
PY - 2020/1/15
Y1 - 2020/1/15
N2 - In the present paper, we propose an efficient direct-iterative hybrid solver for sparse matrices that can exploit the scalability of the latest multi-core, many-core, and vector architectures, and we examine the execution performance of the proposed SLOR-PCR method. We also present an efficient implementation of the PCR algorithm for SIMD and vector architectures that makes it easy for the compiler to generate optimized instructions. The proposed hybrid method has high cache reusability, which is favorable for modern low-B/F architectures because efficient use of the cache can mitigate the memory bandwidth limitation. The measured performance revealed that the SLOR-PCR solver showed excellent scalability up to 352 cores in the cc-NUMA environment, and the achieved performance was higher than that of the conventional Jacobi and Red-Black ordering methods by a factor of 3.6 to 8.3 on the SIMD architecture. In addition, the maximum speedup in computation time was observed to be a factor of 6.3 on the cc-NUMA architecture with 352 cores.
AB - In the present paper, we propose an efficient direct-iterative hybrid solver for sparse matrices that can exploit the scalability of the latest multi-core, many-core, and vector architectures, and we examine the execution performance of the proposed SLOR-PCR method. We also present an efficient implementation of the PCR algorithm for SIMD and vector architectures that makes it easy for the compiler to generate optimized instructions. The proposed hybrid method has high cache reusability, which is favorable for modern low-B/F architectures because efficient use of the cache can mitigate the memory bandwidth limitation. The measured performance revealed that the SLOR-PCR solver showed excellent scalability up to 352 cores in the cc-NUMA environment, and the achieved performance was higher than that of the conventional Jacobi and Red-Black ordering methods by a factor of 3.6 to 8.3 on the SIMD architecture. In addition, the maximum speedup in computation time was observed to be a factor of 6.3 on the cc-NUMA architecture with 352 cores.
UR - http://www.scopus.com/inward/record.url?scp=85094840845&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094840845&partnerID=8YFLogxK
U2 - 10.1145/3368474.3368484
DO - 10.1145/3368474.3368484
M3 - Conference contribution
AN - SCOPUS:85094840845
T3 - ACM International Conference Proceeding Series
SP - 11
EP - 21
BT - Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2020
PB - Association for Computing Machinery
T2 - 2020 International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2020
Y2 - 15 January 2020 through 17 January 2020
ER -