TY - GEN
T1 - QR Factorization of Block Low-Rank Matrices on Multi-instance GPU
AU - Ohshima, Satoshi
AU - Ida, Akihiro
AU - Yokota, Rio
AU - Yamazaki, Ichitaro
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - QR factorization, a fundamental operation in linear algebra, is used extensively in scientific simulations. Accelerating it and reducing its memory usage are important research targets. QR factorization using block low-rank matrices (BLR-QR) has previously been proposed to address this issue. In this study, we consider its implementation on a GPU. Current CPUs and GPUs have numerous computational cores, and their overall performance is the combined performance of those cores. Therefore, the degree of parallelism of the target calculation is important for obtaining high performance. However, many applications, including BLR-QR, do not have sufficient parallelism. Batched computation has attracted attention as a way to achieve high performance in such calculations, but adopting it requires major code rewriting and is extremely laborious. We therefore propose using the multi-instance GPU (MIG) feature of current GPUs. Using MIG, we obtained a 53.3% time reduction relative to the CPU and a 77.6% time reduction relative to the GPU without MIG. These results demonstrate a rapid implementation of BLR-QR on MIG and the usefulness of MIG.
KW - Low-Rank Matrices
KW - Multi-Instance GPU
KW - QR Factorization
UR - http://www.scopus.com/inward/record.url?scp=85159455070&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159455070&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-29927-8_28
DO - 10.1007/978-3-031-29927-8_28
M3 - Conference contribution
AN - SCOPUS:85159455070
SN - 9783031299261
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 359
EP - 369
BT - Parallel and Distributed Computing, Applications and Technologies - 23rd International Conference, PDCAT 2022, Proceedings
A2 - Takizawa, Hiroyuki
A2 - Shen, Hong
A2 - Hanawa, Toshihiro
A2 - Hyuk Park, Jong
A2 - Tian, Hui
A2 - Egawa, Ryusuke
PB - Springer Science and Business Media Deutschland GmbH
T2 - 23rd International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2022
Y2 - 7 December 2022 through 9 December 2022
ER -