A review of machine learning for analysing accident reports in the construction industry
Journal article, 2025

Research interest in applying machine learning (ML) to safety analysis in the construction industry has grown in recent years. This interest is part of a search for better prevention of occupational accidents, with a focus on text analysis and natural language processing (NLP). However, ML-based approaches have been adopted less than their perceived benefits would suggest, owing to barriers to implementation and challenges in analysing safety records in the construction sector. In addition, the current literature has been criticized for a lack of clarity in describing methodologies, interpretation, and the context of application. This work therefore reviews the latest developments in research applying ML to accident report analysis in construction. The published literature on ML-based analysis of construction accident reports was reviewed and organized in terms of data pre-processing, algorithms, testing, and implementation, and further organized by data structure. The review found limitations related to data availability; together with the reliance on manual structuring and the limited use of unsupervised learning, these reflect the complexity of handling textual accident data. Moreover, accident types occur with proportionally varying frequencies and need careful handling beyond the basic assumptions of data pre-processing, in addition to the general need for comparative studies of data pre-processing and for automated pipelines. The review also showed that data mining (DM) and unsupervised learning were used less often, especially with semi-structured and unstructured datasets, which may reflect inefficient application of NLP with these types of learning.
Among the reviewed articles, only four out of six prototypes were externally validated in a construction environment. We therefore propose that future efforts would benefit from incorporating a standardized development method that also makes explicit how ML safety recommendations inform decision making. Future research should experiment with and ascertain different choices in the pre-processing stage, and validate the performance of the ML models and their implementation in construction practice. Finally, more advanced NLP methods, such as relation extraction, could be applied if domain-specific repositories were available, and various other advances could be explored, including large language models (LLMs).

Accident reports

Safety

Machine learning

Natural language processing

Construction industry

Author

May Shayboun

Halmstad University

Dimosthenis Kifokeris

Chalmers, Architecture and Civil Engineering, Building Design

Christian Koch

University of Southern Denmark

Halmstad University

Journal of Information Technology in Construction

18744753 (ISSN)

Vol. 30, pp. 439-460

Accident prevention through machine learning at a construction contractor

Development Fund of the Swedish Construction Industry (SBUF) (14159), 2022-10-01 -- 2025-04-01.

Subject Categories (SSIF 2025)

Construction Management

DOI

10.36680/j.itcon.2025.019

More information

Created

4/2/2025 1