Optimization of hierarchical matrix computation on GPU

Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    5 Citations (Scopus)


    The demand for dense matrix computation in large-scale, complex simulations is increasing; however, the memory capacity of current computer systems is insufficient for such simulations. The hierarchical matrix method (H-matrices) is attracting attention as a computational method that can reduce the memory requirements of dense matrix computations. However, computation with H-matrices is more complex than with dense or sparse matrices; thus, H-matrix computation needs to be accelerated. We focus on H-matrix-vector multiplication (HMVM) on a single NVIDIA Tesla P100 GPU. We implement five GPU kernels and compare their execution times with those on various processors (Broadwell-EP, Skylake-SP, and Knights Landing) using OpenMP. The results show that, although HMVM can be computed with many small GEMV kernels, merging such kernels into a single GPU kernel was the most effective implementation. Moreover, the performance of BATCHED BLAS in the MAGMA library was comparable to that of the manually tuned GPU kernel.
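
    To illustrate why HMVM decomposes into many small GEMV operations, the following is a minimal serial sketch in plain Python. It is not the authors' GPU implementation: the block layout (dictionaries with hypothetical keys `kind`, `row`, `col`, `a`, `u`, `v`) and the helper names `gemv`, `transpose`, and `hmvm` are assumptions made for illustration. Each dense block contributes one small GEMV, and each low-rank block, stored as U and V with the block approximated by U V^T, contributes two.

```python
def gemv(a, x):
    """Dense matrix-vector product; `a` is a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def hmvm(blocks, x, n_rows):
    """Accumulate y = A x, where A is given as a list of sub-blocks.

    Hypothetical block format (an assumption for this sketch):
      dense:    {"kind": "dense",   "row": r0, "col": c0, "a": m-by-n}
      low-rank: {"kind": "lowrank", "row": r0, "col": c0,
                 "u": m-by-k, "v": n-by-k}   # block ~= u * v^T
    """
    y = [0.0] * n_rows
    for blk in blocks:
        r0, c0 = blk["row"], blk["col"]
        if blk["kind"] == "dense":
            a = blk["a"]
            xs = x[c0:c0 + len(a[0])]
            part = gemv(a, xs)              # one small GEMV
        else:
            u, v = blk["u"], blk["v"]
            xs = x[c0:c0 + len(v)]
            t = gemv(transpose(v), xs)      # first small GEMV: t = v^T x
            part = gemv(u, t)               # second small GEMV: u t
        for i, val in enumerate(part):
            y[r0 + i] += val
    return y


# A 4-by-4 example: two dense diagonal blocks, two rank-1 off-diagonal blocks.
blocks = [
    {"kind": "dense", "row": 0, "col": 0, "a": [[2, 1], [1, 2]]},
    {"kind": "dense", "row": 2, "col": 2, "a": [[3, 0], [0, 3]]},
    {"kind": "lowrank", "row": 0, "col": 2, "u": [[1], [2]], "v": [[1], [1]]},
    {"kind": "lowrank", "row": 2, "col": 0, "u": [[1], [0]], "v": [[0], [1]]},
]
x = [1, 2, 3, 4]
print(hmvm(blocks, x, 4))  # [11.0, 19.0, 11.0, 12.0]
```

    On a GPU, each iteration of the loop over blocks would correspond to a separate small kernel launch if executed naively; the abstract reports that merging this work into a single GPU kernel was the most effective of the approaches compared.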

    Original language: English
    Title of host publication: Supercomputing Frontiers - 4th Asian Conference, SCFA 2018, Proceedings
    Editors: Rio Yokota, Weigang Wu
    Publisher: Springer Verlag
    Number of pages: 19
    ISBN (Print): 9783319699523
    Publication status: Published - 2018
    Event: 4th Asian Conference on Supercomputing Frontiers, SCFA 2018 - Singapore, Singapore
    Duration: Mar 26 2018 to Mar 29 2018

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 10776 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349


    Conference: 4th Asian Conference on Supercomputing Frontiers, SCFA 2018

    All Science Journal Classification (ASJC) codes

    • Theoretical Computer Science
    • Computer Science (all)

