TY - GEN
T1 - Estimating parallel execution time of loops with loop-carried dependences
AU - Nakanishi, T.
AU - Joe, K.
AU - Polychronopoulos, C. D.
AU - Araki, K.
AU - Fukuda, A.
N1 - Funding Information:
*This work is supported under Research Fellowships of the Japan Society for the Promotion of Science for Young Scientists.
Publisher Copyright:
© 1996 IEEE.
PY - 1996
Y1 - 1996
N2 - We propose a scheme to estimate exact minimum parallel execution time of the single loop with loop-carried dependences in medium and fine grain parallel execution. The minimum parallel execution time of a loop is given by the critical path length of the dependence graph which represents the code obtained from the fully unrolled loop. However, unrolling loops with a large number of iterations requires too much computation time and large storage space to be practical. The scheme proposed provides the minimum parallel execution time without unrolling the loop at all by reducing the problem into an integer linear programming problem and employing the simplex method and a branch-and-bound algorithm to solve it. We also show an experimental implementation of the proposed scheme with Livermore Benchmark Kernels to demonstrate that the computational complexity of our scheme is independent of the number of iterations of the given loop.
AB - We propose a scheme to estimate exact minimum parallel execution time of the single loop with loop-carried dependences in medium and fine grain parallel execution. The minimum parallel execution time of a loop is given by the critical path length of the dependence graph which represents the code obtained from the fully unrolled loop. However, unrolling loops with a large number of iterations requires too much computation time and large storage space to be practical. The scheme proposed provides the minimum parallel execution time without unrolling the loop at all by reducing the problem into an integer linear programming problem and employing the simplex method and a branch-and-bound algorithm to solve it. We also show an experimental implementation of the proposed scheme with Livermore Benchmark Kernels to demonstrate that the computational complexity of our scheme is independent of the number of iterations of the given loop.
UR - http://www.scopus.com/inward/record.url?scp=33646740224&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33646740224&partnerID=8YFLogxK
U2 - 10.1109/ICPP.1996.538560
DO - 10.1109/ICPP.1996.538560
M3 - Conference contribution
AN - SCOPUS:33646740224
T3 - Proceedings of the International Conference on Parallel Processing
SP - 61
EP - 69
BT - Software
A2 - Pingali, K.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th International Conference on Parallel Processing, ICPP 1996
Y2 - 12 August 1996 through 16 August 1996
ER -