TY - GEN
T1 - Power-capped DVFS and thread allocation with ANN models on modern NUMA systems
AU - Imamura, Satoshi
AU - Sasaki, Hiroshi
AU - Inoue, Koji
AU - Nikolopoulos, Dimitrios S.
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/12/3
Y1 - 2014/12/3
AB - Power capping is an essential function for efficient power budgeting and cost management on modern server systems. Contemporary server processors operate under power caps by using dynamic voltage and frequency scaling (DVFS). However, these processors are often deployed in non-uniform memory access (NUMA) architectures, where thread allocation between cores may significantly affect performance and power consumption. This paper proposes a method that maximizes performance under power caps on NUMA systems by dynamically optimizing two knobs: DVFS and thread allocation. The method selects the optimal combination of the two knobs with models based on artificial neural networks (ANNs) that capture the nonlinear effect of thread allocation on performance. We implement the proposed method as a runtime system and evaluate it with twelve multithreaded benchmarks on a real AMD Opteron-based NUMA system. The evaluation results show that our method outperforms a naive technique optimizing only DVFS by up to 67.1% under a power cap.
UR - http://www.scopus.com/inward/record.url?scp=84919626863&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84919626863&partnerID=8YFLogxK
U2 - 10.1109/ICCD.2014.6974701
DO - 10.1109/ICCD.2014.6974701
M3 - Conference contribution
AN - SCOPUS:84919626863
T3 - 2014 32nd IEEE International Conference on Computer Design, ICCD 2014
SP - 324
EP - 331
BT - 2014 32nd IEEE International Conference on Computer Design, ICCD 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE International Conference on Computer Design, ICCD 2014
Y2 - 19 October 2014 through 22 October 2014
ER -