Optimization of hierarchical matrix computation on GPU

Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

The demand for dense matrix computation in large-scale and complex simulations is increasing; however, the memory capacity of current computer systems is insufficient for such simulations. The hierarchical matrix method (H-matrices) is attracting attention as a computational method that can reduce the memory requirements of dense matrix computations. However, computation with H-matrices is more complex than with dense or sparse matrices; thus, accelerating H-matrix computation is required. We focus on H-matrix-vector multiplication (HMVM) on a single NVIDIA Tesla P100 GPU. We implement five GPU kernels and compare their execution times with those of OpenMP implementations on various processors (Broadwell-EP, Skylake-SP, and Knights Landing). The results show that, although HMVM can be computed as many small GEMV kernels, merging these kernels into a single GPU kernel was the most effective implementation. Moreover, the performance of batched BLAS in the MAGMA library was comparable to that of the manually tuned GPU kernel.
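
As an illustration of why HMVM decomposes into many small GEMV operations, the following is a minimal sketch in plain Python. It is not the authors' GPU implementation: the block layout, the dictionary keys, and the function names `matvec` and `hmvm` are all hypothetical. It assumes an H-matrix stored as a list of blocks, each either dense or a low-rank product U·V, and applies one or two small matrix-vector products per block.

```python
# Illustrative H-matrix-vector product (HMVM); all names here are hypothetical.

def matvec(a, x):
    """Small dense matrix-vector product (GEMV); `a` is a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def hmvm(n, blocks, x):
    """Compute y = A x, where A is given as dense and low-rank blocks."""
    y = [0.0] * n
    for blk in blocks:
        r0, c0 = blk["row"], blk["col"]
        if blk["kind"] == "dense":
            width = len(blk["a"][0])
            part = matvec(blk["a"], x[c0:c0 + width])    # one small GEMV
        else:
            # Low-rank block stored as U (m x k) and V (k x n): two small GEMVs.
            width = len(blk["v"][0])
            t = matvec(blk["v"], x[c0:c0 + width])
            part = matvec(blk["u"], t)
        for i, value in enumerate(part):
            y[r0 + i] += value
    return y

# A 4 x 4 example: dense diagonal blocks, rank-1 off-diagonal blocks.
blocks = [
    {"kind": "dense", "row": 0, "col": 0, "a": [[2, 1], [1, 2]]},
    {"kind": "lowrank", "row": 0, "col": 2, "u": [[1], [2]], "v": [[1, 1]]},
    {"kind": "dense", "row": 2, "col": 2, "a": [[2, 1], [1, 2]]},
    {"kind": "lowrank", "row": 2, "col": 0, "u": [[1], [0]], "v": [[0, 1]]},
]
y = hmvm(4, blocks, [1, 2, 3, 4])
print(y)  # [11.0, 19.0, 12.0, 11.0]
```

Each iteration of the loop over blocks corresponds to one or two small GEMV calls. Launching these as separate GPU kernels would incur per-launch overhead for very little work each, which is consistent with the abstract's finding that merging them into a single kernel was most effective.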

Original language: English
Title of host publication: Supercomputing Frontiers - 4th Asian Conference, SCFA 2018, Proceedings
Editors: Rio Yokota, Weigang Wu
Publisher: Springer Verlag
Pages: 274-292
Number of pages: 19
ISBN (Print): 9783319699523
DOI
Publication status: Published - 2018
Event: 4th Asian Conference on Supercomputing Frontiers, SCFA 2018 - Singapore, Singapore
Duration: 26 Mar 2018 to 29 Mar 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10776 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th Asian Conference on Supercomputing Frontiers, SCFA 2018
Country/Territory: Singapore
City: Singapore
Period: 3/26/18 to 3/29/18

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
