Using application benchmark call graphs to quantify and improve the practical relevance of microbenchmark suites
Journal article, 2021

Performance problems in applications should ideally be detected as soon as they occur, i.e., directly when the causing code modification is added to the code repository. To this end, complex and cost-intensive application benchmarks or lightweight but less relevant microbenchmarks can be added to existing build pipelines to ensure performance goals. In this paper, we show how the practical relevance of microbenchmark suites can be improved and verified based on the application flow during an application benchmark run. We propose an approach to determine the overlap of common function calls between application benchmarks and microbenchmarks, describe a method which identifies redundant microbenchmarks, and present a recommendation algorithm which reveals relevant functions that are not yet covered by microbenchmarks. A microbenchmark suite optimized in this way can easily test all functions determined to be relevant by application benchmarks after every code change, thus significantly reducing the risk of undetected performance problems. Our evaluation using two time series databases shows that, depending on the specific application scenario, application benchmarks cover different functions of the system under test. Their respective microbenchmark suites cover between 35.62% and 66.29% of the functions called during the application benchmark, offering substantial room for improvement. Through two use cases (removing redundancies in the microbenchmark suite and recommending functions that are not yet covered), we decrease the total number of microbenchmarks and increase the practical relevance of both suites. Removing redundancies can significantly reduce the number of microbenchmarks (and thus the execution time as well) to approximately 10% and 23% of the original microbenchmark suites, whereas recommendation identifies up to 26 and 14 new, uncovered functions to benchmark to improve the relevance. By utilizing the differences and synergies of application benchmarks and microbenchmarks, our approach potentially enables effective software performance assurance with performance tests of multiple granularities.
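The three steps named in the abstract (measuring overlap, removing redundant microbenchmarks, and recommending uncovered functions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes each call graph has already been reduced to a set of function names, uses a plain greedy set-cover heuristic for both redundancy removal and recommendation, and all function names (`coverage`, `remove_redundancies`, `recommend`, `reachable`) and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def coverage(app_calls: Set[str], suite: Dict[str, Set[str]]) -> float:
    """Fraction of application-benchmark functions hit by at least one microbenchmark."""
    if not app_calls:
        return 0.0
    covered = set().union(*suite.values()) & app_calls
    return len(covered) / len(app_calls)


def remove_redundancies(app_calls: Set[str], suite: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover (assumed heuristic): keep only microbenchmarks that
    still add application-relevant functions not covered by earlier picks."""
    uncovered = set().union(*suite.values()) & app_calls
    kept: List[str] = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(suite), key=lambda name: len(suite[name] & uncovered))
        kept.append(best)
        uncovered -= suite[best]
    return kept


def reachable(call_graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All functions reachable from `start` (including itself) in the call graph."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(call_graph.get(fn, ()))
    return seen


def recommend(app_calls: Set[str], suite: Dict[str, Set[str]],
              call_graph: Dict[str, Set[str]], limit: int) -> List[str]:
    """Greedily pick uncovered functions whose call subtree covers the most
    still-uncovered application functions (assumed ranking criterion)."""
    uncovered = app_calls - set().union(*suite.values())
    picks: List[str] = []
    while uncovered and len(picks) < limit:
        best = max(sorted(uncovered),
                   key=lambda fn: len(reachable(call_graph, fn) & uncovered))
        picks.append(best)
        uncovered -= reachable(call_graph, best)
    return picks


# Hypothetical sample data: functions seen during an application benchmark,
# the caller -> callee edges among them, and the functions each microbenchmark calls.
app_calls = {"write", "parse", "index", "compress", "flush"}
call_graph = {"write": {"parse", "index"}, "index": {"compress"}, "flush": {"compress"}}
suite = {"BenchParse": {"parse"}, "BenchWrite": {"write", "parse"}, "BenchOther": {"helper"}}

print(coverage(app_calls, suite))                  # 0.4
print(remove_redundancies(app_calls, suite))       # ['BenchWrite']
print(recommend(app_calls, suite, call_graph, 2))  # ['flush', 'index']
```

In this toy example, `BenchParse` is redundant because `BenchWrite` already covers `parse`, and `BenchOther` is dropped because it exercises no function that the application benchmark calls. A real pipeline would derive the sets from profiler output and might weight functions by call frequency or execution time, which this sketch ignores.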

Microbenchmarking

Time series databases

Performance testing

Benchmarking

Author

Martin Grambow

Einstein Center Digital Future

Technische Universität Berlin

Christoph Laaber

University of Zürich

Philipp Leitner

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)

David Bermbach

Einstein Center Digital Future

Technische Universität Berlin

PeerJ Computer Science

23765992 (eISSN)

Vol. 7 e548

ImmeRSEd - Developer-Targeted Performance Engineering for Immersed Release and Software Engineers

Swedish Research Council (VR) (2018-04127), 2019-01-01 -- 2023-12-31.

Subject Categories (SSIF 2011)

Computer Engineering

Embedded Systems

Computer Systems

DOI

10.7717/peerj-cs.548

PubMed

34141882

More information

Latest update

3/21/2023