Quantifying variable contributions to bus operation delays considering causal relationships
                
                        Journal article, 2025
                
            
                    
Bus services often face operational delays caused by dynamic conditions such as traffic congestion, and these delays can propagate along bus routes and degrade overall system performance. Understanding the causes of bus arrival delays is crucial for effective public transport management and control. Knowing how much each factor contributes to delays also supports targeted mitigation strategies, decision-making, and planning. Traditional research relies primarily on correlation-based analysis, which cannot reveal the underlying causal mechanisms. Moreover, no studies have considered the complex causal relationships between factors when quantifying their contributions to outcomes in public transport. This study analyzes the factors causing bus arrival delays from a causal perspective, focusing on quantifying the causal contribution of each factor while accounting for the causal relationships among the factors. Quantifying a factor's causal contribution is challenging because of computational complexity and statistical bias arising from limited sample sizes. Using a causal discovery method, the study generates a causal graph for bus arrival delays and employs the causality-based Shapley value to quantify the contribution of each variable. It further uses the Double Machine Learning (DML) approach to estimate the causal contributions, which provides a consistent and computationally feasible estimator. A case study was conducted using General Transit Feed Specification (GTFS) data, focusing on high-frequency bus routes in Stockholm, Sweden. To validate the model, the results were cross-validated by comparing variable importance rankings with those from traditional models, including Linear Regression (LR) and Structural Equation Modeling (SEM).
The comparison shows that the causality-based Shapley values differ significantly from the results of traditional methods in both importance rankings and influence magnitudes. The findings underscore the significant impact of origin delays on bus punctuality, a factor often underestimated in previous studies. The study also demonstrates that a causal discovery model can not only infer causal relationships but also reveal direct and indirect effects, which provides more intuitive explanations. Finally, although the causal results are mathematically and intuitively sound, the real causal impact should be investigated further in practice using lab experiments or A/B tests in real-world settings.
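For readers unfamiliar with the Shapley value used above, the following is a minimal sketch of the classical (non-causal) definition: each factor's contribution is its marginal effect averaged over all orderings of the factors. The function `shapley_values`, the value function `toy_delay`, the factor names, and all numbers are hypothetical illustrations and are not taken from the study. The causality-based variant in the study would instead define the value function through the discovered causal graph, which this sketch does not attempt.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to a number. The cost grows
    factorially with the number of players, so this is for toy cases only.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_delay(coalition):
    """Hypothetical value function: delay in seconds explained by a coalition."""
    delay = 0.0
    if "origin_delay" in coalition:
        delay += 60.0
    if "dwell_time" in coalition:
        delay += 20.0
    if "origin_delay" in coalition and "dwell_time" in coalition:
        delay += 10.0  # assumed interaction between the two factors
    return delay


factors = ["origin_delay", "dwell_time", "traffic"]
phi = shapley_values(factors, toy_delay)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.1f}")
```

In this toy game the interaction term is split equally between the two interacting factors, and the contributions sum to the value of the full coalition (the efficiency property). Exact enumeration is only feasible for a handful of factors, which is one reason the computational complexity noted above matters.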
                    
                    
                            
Explainable AI
GTFS data
Causal graph discovery
Urban transit
Shapley value