Machine learning for analysis of occupational accidents registration data
Paper in proceedings, 2020
Despite the efforts of employers and public organizations to eliminate occupational accidents, they remain a persistent problem in the construction industry. In the Swedish construction context, there is a desire to identify the causes and factors that play a role in preventing work-related accidents, as there are large, underused databases of collected registrations that represent knowledge on the causes and context of accidents. The aim of the current contribution is to review the application of machine learning (ML) to the improved prevention of accidents and corresponding injuries, to identify current limitations, and, most importantly, to answer the question of whether ML actually reveals more than what is currently known about accidents in construction. A systematic literature review on the use of ML for analysing accident record data was carried out. In the reviewed literature, ML was applied to the prediction of accidents or their outcomes, and to the extraction or identification of the causes affecting the risk of injury. ML combined with data mining (DM) techniques, such as natural language processing and graph mining, appears to be beneficial in discovering associations between different features and at multiple levels of clusters. However, the literature shows that research on ML in accident prevention is at an early stage. The review indicates gaps in the justification of methodological choices, such as the choice of ML method and of data processing. Moreover, the characteristics of injury rates and severity are shown to conflict with the mechanisms of ML classification algorithms. This should probably lead to abandoning severity as a parameter and changing the approach towards the asymmetric data classes (denoted "unbalanced" in ML methodology), leaving space for finding the important causes.
An overreliance on internal validity testing and a lack of external testing of the algorithms' performance and prediction accuracy persist. Future research needs to focus on methods that address the problem of data processing, explaining the choice of methods, explaining the results (especially the variance in ML algorithms' performance), merging different data sources, considering more attributes (such as risk management), applying deep learning algorithms, and improving the testing accuracy of ML models.
occupational accident prevention