Size or diversity? Synthetic dataset recommendations for machine learning heating energy prediction models in early design stages for residential buildings
Journal article, 2025

One promising way to reduce building energy use for a more sustainable environment is to conduct early-stage building energy optimization using simulation, yet today's simulation engines are computationally intensive. Recently, machine learning (ML) energy prediction models have shown promise in replacing these simulation engines. However, developing such ML models is often difficult because proper datasets are lacking. Synthetic datasets can provide a solution, but determining the optimal quantity and diversity of synthetic data remains challenging. Furthermore, the compatibility between different ML algorithms and the characteristics of synthetic datasets is poorly understood. To fill these gaps, this study conducted multiple ML experiments on residential buildings in Sweden to determine the best-performing ML algorithm and the characteristics of the corresponding synthetic dataset. A parametric model was developed to generate a wide range of synthetic datasets varying in size and in building shape, the latter referred to as diversity. Five ML algorithms selected through a literature review were trained on the different datasets. Results show that the Support Vector Machine performed best overall. Multiple Linear Regression performed well with small, low-diversity datasets, while the Artificial Neural Network performed well with large, high-diversity datasets. We conclude that, when generating synthetic training datasets, developers should focus on increasing diversity rather than size once the dataset size reaches around 1440. This study offers insights for researchers and practitioners, such as software tool developers, who develop ML building energy prediction models for early-stage optimization.
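To make the size-versus-diversity question concrete, here is a toy sketch under loudly stated assumptions. It is not the authors' parametric model, dataset, or training pipeline. The hypothetical `generate_dataset` invents a heating formula with a U-value × shape-ratio interaction and treats "diversity" as the number of hypothetical shape types sampled; the shape names, parameter ranges, and formula are all made up for illustration. It then fits Multiple Linear Regression (the hypothetical `fit_mlr`, solving the normal equations) for each (size, diversity) combination and reports in-sample and held-out RMSE.

```python
import math
import random

# Hypothetical shape types, each with an invented range of envelope-to-floor-area
# ratio. In this toy, "diversity" = how many of these shape types are sampled.
SHAPE_RATIO_RANGES = [
    ("rectangle", 0.8, 1.0),
    ("L-shape", 1.0, 1.2),
    ("U-shape", 1.2, 1.4),
    ("courtyard", 1.4, 1.6),
]


def generate_dataset(size, diversity, seed=0):
    """Toy stand-in for a parametric model; NOT the study's model.

    Returns feature rows [1, u_value, wwr, ratio] and heating targets. The
    heating formula is invented and contains a u_value * ratio interaction,
    so a linear model can only approximate it.
    """
    rng = random.Random(seed)
    shapes = SHAPE_RATIO_RANGES[:diversity]
    X, y = [], []
    for _ in range(size):
        _, lo, hi = rng.choice(shapes)
        ratio = rng.uniform(lo, hi)
        u_value = rng.uniform(0.1, 0.4)  # hypothetical envelope U-value range
        wwr = rng.uniform(0.1, 0.5)      # hypothetical window-to-wall ratio range
        X.append([1.0, u_value, wwr, ratio])
        y.append(20 + 150 * u_value * ratio + 40 * wwr)
    return X, y


def fit_mlr(X, y):
    """Multiple linear regression by solving the normal equations X^T X w = X^T y."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


def rmse(w, X, y):
    """Root-mean-square error of the linear model w on (X, y)."""
    errs = [(sum(wi * xi for wi, xi in zip(w, row)) - t) ** 2 for row, t in zip(X, y)]
    return math.sqrt(sum(errs) / len(errs))


# Grid over dataset size and diversity; held-out test set uses all shape types.
X_test, y_test = generate_dataset(2000, diversity=4, seed=99)
for diversity in (1, 4):
    for size in (100, 1000):
        X, y = generate_dataset(size, diversity, seed=size + diversity)
        w = fit_mlr(X, y)
        print(diversity, size, round(rmse(w, X, y), 2), round(rmse(w, X_test, y_test), 2))
```

Because the invented formula is nonlinear, the in-sample error of the linear model grows as the shape range widens; comparing the held-out column shows how well each training set transfers to the most diverse data. This only illustrates the kind of grid the abstract describes and says nothing about the study's actual algorithms or results.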

Keywords

building energy; synthetic data; early stage optimization; data diversity; training size; machine learning

Authors

Xinyue Wang

Chalmers, Architecture and Civil Engineering, Building Technology

Yinan Yu

Chalmers, Computer Science and Engineering (Chalmers), Functional Programming

Robin Teigland

Chalmers, Technology Management and Economics, Entrepreneurship and Strategy

Alexander Hollberg

Chalmers, Architecture and Civil Engineering, Building Technology

Energy and AI

2666-5468 (eISSN)

Vol. 21, Article 100557

Stakeholder-specific environmental and economic optimization of buildings in early design stages

Formas (2020-00934), 2021-01-01 – 2024-12-31.

Subject Categories (SSIF 2025)

Software Engineering

Driving Forces

Sustainable development

DOI

10.1016/j.egyai.2025.100557

Latest update

8/11/2025