Using Microbenchmark Suites to Detect Application Performance Changes
Journal article, 2023

Software performance changes are costly and often hard to detect pre-release. Similar to software testing frameworks, either application benchmarks or microbenchmarks can be integrated into quality assurance pipelines to detect performance changes before a new application version is released. Unfortunately, extensive benchmarking studies usually take several hours, which is problematic when dozens of daily code changes need to be examined in detail; hence, trade-offs have to be made. Optimized microbenchmark suites, which include only a small subset of the full suite, are a potential solution to this problem, provided that they still reliably detect the majority of application performance changes, such as an increased request latency. It is, however, unclear whether microbenchmarks and application benchmarks detect the same performance problems and whether one can serve as a proxy for the other. In this paper, we explore whether microbenchmark suites can detect the same application performance changes as an application benchmark. To this end, we run extensive benchmark experiments with both the complete and the optimized microbenchmark suites of two time-series database systems, InfluxDB and VictoriaMetrics, and compare their results to the results of corresponding application benchmarks. We do this for 70 and 110 commits, respectively. Our results show that it is not trivial to detect application performance changes using an optimized microbenchmark suite. The detection (i) is only possible if the optimized microbenchmark suite covers all application-relevant code sections, (ii) is prone to false alarms, and (iii) cannot precisely quantify the impact on application performance. For certain software projects, an optimized microbenchmark suite can thus provide fast performance feedback to developers (e.g., as part of a local build process), help estimate the impact of code changes on application performance, and support a detailed analysis, while a daily application benchmark detects major performance problems. Thus, although a regular application benchmark cannot be replaced for either of the studied systems, our results motivate further studies to validate and optimize microbenchmark suites.
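To make the notion of a microbenchmark concrete, the following is a minimal sketch of a Go microbenchmark in the "go test -bench" style used by Go projects such as InfluxDB and VictoriaMetrics. The parsePoint function and its input format are illustrative placeholders and are not taken from either project's benchmark suite.

package tsdb

import (
	"strings"
	"testing"
)

// parsePoint splits a simplistic "measurement value timestamp" line.
// It stands in for an application-relevant code section of a time-series database.
func parsePoint(line string) []string {
	return strings.SplitN(line, " ", 3)
}

// sink keeps the benchmarked result alive so the call is not optimized away.
var sink []string

func BenchmarkParsePoint(b *testing.B) {
	line := "cpu_usage 42.5 1700000000"
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = parsePoint(line)
	}
}

Running "go test -bench=ParsePoint" on two commits and comparing the reported ns/op illustrates the per-function performance signal that such suites provide; an optimized suite would retain only a subset of such benchmarks, which is why coverage of all application-relevant code sections matters.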

Performance Change Detection

Cloud computing

Pipelines

MIMICs

Benchmark testing

Benchmarking

Performance Testing

Codes

Software

Reliability

Regression Detection

Microbenchmarks

Authors

Martin Grambow

Technische Universität Berlin

Denis Kovalev

Technische Universität Berlin

Christoph Laaber

Simula Research Laboratory

Philipp Leitner

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

David Bermbach

Technische Universität Berlin

IEEE Transactions on Cloud Computing

2168-7161 (eISSN)

Vol. 11, Issue 3, pp. 2575-2590

Subject categories

Computer engineering

Computer science

Computer systems

DOI

10.1109/TCC.2022.3217947

More information

Last updated

2023-11-14