Clustering techniques and keyword extraction with large language models for knowledge discovery in building defects data
Journal article, 2025

Purpose

The construction industry is undergoing a digital transformation and now holds large volumes of digital building defects data collected during inspections. This study aims to propose an artificial intelligence (AI)-based method for analysing such data, providing insights and knowledge faster than traditional manual methods can.

Design/methodology/approach

This research explores a data set of over 34,000 defects recorded in Swedish hospital projects between 2018 and 2021. The data mining uses keyword extraction based both on TF-IDF vectorisation with k-means clustering and on the Mistral 7B model with KeyLLM. The results are compared with a content analysis using the GPT-3.5 Turbo model. The analysis is performed at both the organisational and the project level.

Findings

The paper presents a combination of methods for analysing building defects data. The results show that the most common problems reported during the inspections concern missing fire sealing, jointing and subceilings. k-means clustering gives fast insight into the main defect categories in the data set but requires domain knowledge. Keyword extraction using an LLM requires more computation time but gives a deeper understanding of defect subcategories. Finally, GPT-based content analysis complements these methods by providing project-specific insights and allowing user-specific requests.

Research limitations/implications

The study uses data collected digitally in Swedish hospital projects. However, the results and methodology can be applied to other project data, such as safety inspections and warranty data. The analysis focused solely on text data.

Originality/value

The method proposed in this paper uses clustering techniques and large language models (LLMs) to analyse building defect data. Its value lies in a faster process for extracting knowledge from large amounts of unstructured text data, such as building defect reports, safety and moisture inspections and warranty issues.
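The abstract's methodology section mentions keyword extraction based on TF-IDF vectorisation and k-means clustering. The sketch below illustrates that general technique in dependency-free Python. It is not the authors' implementation, which this record does not describe, and it does not cover the Mistral 7B/KeyLLM or GPT-3.5 Turbo steps. Assumptions: the defect texts are invented English examples (the study's data come from Swedish projects); the stop-word list, the smoothed IDF formula, the deterministic farthest-point initialisation and k = 3 are illustrative choices; and each cluster's "keywords" are simply the highest-weighted terms of its centroid.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real pipeline would use a full (Swedish) list.
STOP_WORDS = {"a", "an", "the", "at", "in", "on", "around", "not", "of", "and"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens (incl. Swedish å, ä, ö), drop stop words."""
    return [t for t in re.findall(r"[a-zåäö]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight), L2-normalised per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        weights = {t: cnt * idf[t] for t, cnt in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def mean(vectors):
    """Component-wise mean of sparse vectors (the new cluster centroid)."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, max_iter=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean(members)
    return labels, centroids


def top_terms(centroid, n=3):
    """Highest-weighted centroid terms serve as cluster keywords (ties: alphabetical)."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]


# Hypothetical defect descriptions, invented for illustration only.
defects = [
    "Missing fire sealing at pipe penetration",
    "Fire sealing missing around cable penetration",
    "Incomplete fire sealing in shaft",
    "Subceiling panel damaged in corridor",
    "Subceiling panel loose in corridor",
    "Subceiling panel not fitted",
    "Jointing cracked at window frame",
    "Jointing gap at window frame",
]

labels, centroids = kmeans(tfidf_vectors(defects), k=3)
# Cluster sizes indicate the most common defect categories.
for c, size in Counter(labels).most_common():
    print(size, top_terms(centroids[c]))
```

Choosing k and turning each cluster's top terms into a named defect category is where the domain knowledge noted in the Findings comes in. In practice, scikit-learn's `TfidfVectorizer` (whose defaults use the same smoothed IDF and L2 normalisation) and `KMeans` would likely replace these hand-written helpers.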

Keywords

Defects

Inspections

Knowledge generation

LLM

Authors

Linda Cusumano

Chalmers, Architecture and Civil Engineering, Construction Management

Nilla Olsson

NCC AB

Mats Granath

University of Gothenburg

Robert Jockwer

Chalmers, Architecture and Civil Engineering, Structural Engineering

Rasmus Rempling

Chalmers, Architecture and Civil Engineering, Construction Management

Construction Innovation

1471-4175 (ISSN) 1477-0857 (eISSN)

Vol. 25, No. 7, pp. 76-97

Subject Categories (SSIF 2025)

Construction Management

Computer Sciences

DOI

10.1108/CI-04-2024-0123


Latest update

4/4/2025 8