TY - GEN
T1 - Low byte/flop implementation of iterative solver for sparse matrices derived from stencil computations
AU - Ono, Kenji
AU - Chiba, Shuichi
AU - Inoue, Shunsuke
AU - Minami, Kazuo
N1 - Funding Information:
We thank the RIKEN Advanced Institute for Computational Science for allowing us to use the K computer to obtain our results. Part of this research was supported by a grant for the “Strategic Program on HPCI Field No. 4: Industrial Innovations” from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) “Development and Use of Advanced, High-Performance, General-Purpose Supercomputers Project,” and was carried out in partnership with the University of Tokyo.
Publisher Copyright:
© Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Practical simulators require high-performance iterative methods and efficient boundary conditions, especially in the field of computational fluid dynamics. In this paper, we propose a novel bit-representation technique to enhance the performance of such simulators. The technique is applied to an iterative kernel implementation that treats various boundary conditions in a stencil computation on a structured grid system. This approach reduces traffic from the main memory to CPU, and effectively utilizes Single Instruction–Multiple Data (SIMD) stream units with cache because of the bit-representation and compression of matrix elements. The proposed implementation also replaces if-branch statements with mask operations using the bit expression. This promotes the optimization of code during compilation and runtime. To evaluate the performance of the proposed implementation, we employ the Red–Black SOR and BiCGstab algorithms. Experimental results show that the proposed approach is up to 3.5 times faster than a naïve implementation on both the Intel and Fujitsu Sparc architectures.
AB - Practical simulators require high-performance iterative methods and efficient boundary conditions, especially in the field of computational fluid dynamics. In this paper, we propose a novel bit-representation technique to enhance the performance of such simulators. The technique is applied to an iterative kernel implementation that treats various boundary conditions in a stencil computation on a structured grid system. This approach reduces traffic from the main memory to CPU, and effectively utilizes Single Instruction–Multiple Data (SIMD) stream units with cache because of the bit-representation and compression of matrix elements. The proposed implementation also replaces if-branch statements with mask operations using the bit expression. This promotes the optimization of code during compilation and runtime. To evaluate the performance of the proposed implementation, we employ the Red–Black SOR and BiCGstab algorithms. Experimental results show that the proposed approach is up to 3.5 times faster than a naïve implementation on both the Intel and Fujitsu Sparc architectures.
UR - http://www.scopus.com/inward/record.url?scp=84942627076&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84942627076&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-17353-5_17
DO - 10.1007/978-3-319-17353-5_17
M3 - Conference contribution
AN - SCOPUS:84942627076
SN - 9783319173528
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 192
EP - 205
BT - High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers
A2 - Marques, Osni
A2 - Dayde, Michel
A2 - Nakajima, Kengo
PB - Springer Verlag
T2 - 11th International Conference on High Performance Computing for Computational Science, VECPAR 2014
Y2 - 30 June 2014 through 3 July 2014
ER -