Data-Centric AI for Software Performance Engineering - Predicting Workload Dependent and Independent Performance of Software Systems Using Machine Learning Based Approaches
Licentiate thesis, 2023
Problem: Developers usually analyze by hand how performance behaves with respect to the workload. This process costs time, effort, and computational resources, since the developer must run the same system under test (e.g. a benchmark) many times, each time with different workload values. In a workload-independent setting, predicting the scalar execution time from the structure of the source code is challenging because execution time depends on many factors, including the underlying architecture, the input parameters, and the application’s interactions with the operating system. Consequently, works that have attempted to predict absolute execution time for arbitrary applications from source code generally report poor accuracy.
Goal: The thesis presents a modern machine learning-based approach for predicting execution time from two angles: (a) workload-independent performance and (b) workload-dependent performance.
Solution Approaches and Research Methodologies: To achieve this goal and tackle the problems described above, we conducted a systematic empirical study to address the gap in knowledge about workload-dependent performance. The study spans five well-known projects that use JMH benchmarks (including RxJava, Log4J2, and the Eclipse Collections framework) and 126 concrete benchmarks, and it produced a dataset of approximately 1.4 million measurements.
To address the poor accuracy, we aim to increase the quality of the data, which in this context is the source code. To that end, we invest in Data-Centric AI. We first conduct a systematic literature review and a systematic mapping study of the different approaches to source code representation and the level of information each representation can hold. Based on the findings, we conduct an experimental study to improve the quality of source code representation by establishing a rich hybrid code representation. We then combine this representation with a Graph Neural Network (GNN), an ML approach, to predict the scalar execution time of a functional test.
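As a rough illustration of what a hybrid graph representation of source code can look like, the sketch below turns a code snippet into node labels and typed edges: syntax-tree edges plus extra edges that link consecutive statements. This is only a minimal sketch under stated assumptions. The thesis targets Java code and its exact representation is not specified here; Python's standard `ast` module is used as a stand-in, and the function name `code_to_graph` and the edge types `ast_child` and `next_stmt` are hypothetical.

```python
import ast


def code_to_graph(source):
    """Build a small hybrid graph from source code.

    Returns (labels, edges), where labels[i] is the syntax node type of
    node i and each edge is (source_index, target_index, edge_type).
    Edge types used here are illustrative assumptions:
      - "ast_child": parent-to-child edge from the syntax tree
      - "next_stmt": edge between consecutive statements in a body
    """
    tree = ast.parse(source)
    labels = []
    index = {}

    # Assign one integer index per distinct node object.
    for node in ast.walk(tree):
        if id(node) in index:
            continue  # guard against node objects shared across the tree
        index[id(node)] = len(labels)
        labels.append(type(node).__name__)

    edges = []
    for node in ast.walk(tree):
        # Structural edges taken directly from the syntax tree.
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)], "ast_child"))
        # Extra edges that approximate sequential flow between statements.
        body = getattr(node, "body", None)
        if isinstance(body, list):
            for current, following in zip(body, body[1:]):
                edges.append((index[id(current)], index[id(following)], "next_stmt"))
    return labels, edges


example = (
    "def f(n):\n"
    "    total = 0\n"
    "    for i in range(n):\n"
    "        total += i\n"
    "    return total\n"
)
labels, edges = code_to_graph(example)
```

A graph neural network would typically consume `labels` as node features and `edges` as a typed adjacency structure, and be trained to regress a single scalar such as execution time.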
Results: Our results show that classical ML approaches can predict the performance value of a benchmark from its workload configuration. Moreover, with our proposed method, developers can easily determine the impact of each workload on the performance measurement. In the workload-independent setting, increasing the data quality through Data-Centric AI yields high accuracy in predicting the absolute execution time based solely on the structure of the source code.
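To make the workload-dependent idea concrete, the sketch below fits a least-squares line that maps a single workload parameter to a benchmark's execution time. It is a minimal sketch, not the thesis's method: the abstract does not name the classical ML models used, so ordinary linear regression is an assumption, and the helper names `fit_line` and `predict` as well as the data are synthetic and purely illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict execution time for workload value x."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic, made-up data (NOT measurements from the thesis):
# execution time generated as 2.0 + 0.5 * workload size.
workload_sizes = [10, 100, 1000, 10000]
exec_times = [2.0 + 0.5 * size for size in workload_sizes]

model = fit_line(workload_sizes, exec_times)
estimate = predict(model, 500)  # estimate for an unmeasured workload value
```

In a model of this shape, the fitted slope is one simple way to read off how strongly a workload parameter affects the measured performance, and `predict` replaces a benchmark run for workload values that were never executed.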
Machine Learning
Deep Learning
Software Performance Prediction
Graph Neural Network
Representation Learning
Data-Centric AI
Source Code Representation
Authors
Peter Samoaa
Chalmers, Computer Science and Engineering, Data Science and AI
Hazem Samoaa
Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering
TEP-GNN: Accurate Execution Time Prediction of Functional Tests Using Graph Neural Networks
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 13709 LNCS (2022), p. 464-479
Paper in proceedings
A systematic mapping study of source code representation for deep learning in software engineering
IET Software, Vol. 16 (2022), p. 351-385
Review article
A Pipeline for Measuring Brand Loyalty Through Social Media Mining
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 12607 LNCS (2021), p. 489-504
Paper in proceedings
An Exploratory Study of the Impact of Parameterization on JMH Measurement Results in Open-Source Projects
ICPE 2021 - Proceedings of the ACM/SPEC International Conference on Performance Engineering (2021), p. 213-224
Paper in proceedings
Subject categories
Computer and Information Science
Software Engineering
Computer Systems
Publisher
Chalmers