TY - GEN
T1 - Fast, Light-weight, and Accurate Performance Evaluation using Representative Datacenter Behaviors
AU - Lee, Jaewon
AU - Min, Dongmoon
AU - Byun, Ilkwon
AU - Jang, Hanhwi
AU - Kim, Jangwoo
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/11/27
Y1 - 2023/11/27
N2 - Datacenters rapidly evolve by adopting new features such as new hardware deployment and software patches. Adopting a new feature requires an accurate evaluation of its impact to minimize the risk to the multimillion-dollar computing infrastructure. However, a comprehensive performance analysis of a datacenter is extremely challenging due to its cost and multitenancy. Evaluating the performance in a live datacenter is accurate but prohibitive because it risks damaging production services. Using conventional load-testing benchmarks on small-scale testbeds is imprecise as they do not consider the effect of other co-located jobs. In this paper, we propose FLARE, a fast, lightweight, and accurate performance evaluation method using representative datacenter behaviors. The key idea is to extract a small set of representative job colocation scenarios from all possible job colocations in a target datacenter. FLARE systematically characterizes and groups job colocations according to performance and resource metrics, providing high-level insights into the datacenter's behaviors. Then, it reconstructs the colocations on a testbed and allows accurate feature evaluation with load-testing benchmarks. We evaluate FLARE using an in-house datacenter and three features: cache sizing, DVFS, and SMT configurations. FLARE estimates the impact of features with less than 1% error while incurring 50× and 10× lower evaluation costs compared to full datacenter and sampling-based evaluation, respectively.
KW - datacenters
KW - performance modeling
KW - sampling-based evaluation
UR - http://www.scopus.com/inward/record.url?scp=85179893634&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85179893634&partnerID=8YFLogxK
U2 - 10.1145/3590140.3629117
DO - 10.1145/3590140.3629117
M3 - Conference contribution
AN - SCOPUS:85179893634
T3 - Middleware 2023 - Proceedings of the 24th ACM/IFIP International Middleware Conference
SP - 220
EP - 233
BT - Middleware 2023 - Proceedings of the 24th ACM/IFIP International Middleware Conference
PB - Association for Computing Machinery, Inc
T2 - 24th ACM/IFIP International Middleware Conference, Middleware 2023
Y2 - 11 December 2023 through 15 December 2023
ER -