TY - GEN
T1 - Performance balancing
T2 - 10th MEDEA Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture, MEDEA '09, held in conjunction with the Int. Conf. on Parallel Architectures and Compilation Techniques, PACT 2009
AU - Fukumoto, Naoto
AU - Imazato, Kenichi
AU - Inoue, Koji
AU - Murakami, Kazuaki
PY - 2009
Y1 - 2009
N2 - This paper proposes the concept of performance balancing and reports its performance impact on a chip multiprocessor (CMP). CMPs, which integrate multiple processor cores on a single chip, can achieve higher peak performance by exploiting thread-level parallelism. However, off-chip memory bandwidth, which does not scale with the number of cores, tends to limit the potential of CMPs. To solve this issue, the technique proposed in this paper attempts to strike a good balance between computation and memorization. Unlike conventional parallel execution, this approach uses some cores to improve memory performance. These cores devote their on-chip memory hardware resources to the remaining cores, which execute the parallelized threads. In our evaluation, our approach achieves a 31% performance improvement over a conventional parallel execution model for the specified program.
AB - This paper proposes the concept of performance balancing and reports its performance impact on a chip multiprocessor (CMP). CMPs, which integrate multiple processor cores on a single chip, can achieve higher peak performance by exploiting thread-level parallelism. However, off-chip memory bandwidth, which does not scale with the number of cores, tends to limit the potential of CMPs. To solve this issue, the technique proposed in this paper attempts to strike a good balance between computation and memorization. Unlike conventional parallel execution, this approach uses some cores to improve memory performance. These cores devote their on-chip memory hardware resources to the remaining cores, which execute the parallelized threads. In our evaluation, our approach achieves a 31% performance improvement over a conventional parallel execution model for the specified program.
UR - http://www.scopus.com/inward/record.url?scp=74549130566&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=74549130566&partnerID=8YFLogxK
U2 - 10.1145/1621960.1621966
DO - 10.1145/1621960.1621966
M3 - Conference contribution
AN - SCOPUS:74549130566
SN - 9781605588308
T3 - ACM International Conference Proceeding Series
SP - 28
EP - 34
BT - Proceedings of the 10th MEDEA Workshop on MEmory Performance
Y2 - 13 September 2009 through 13 September 2009
ER -