Machine Learning for Causal Inference from Observational Data with Applications in Healthcare
Research Project, 2020 – 2024

The recent growth of machine learning (ML) has been built primarily on supervised learning systems that discover associations between inputs and outputs. However, this approach has a fundamental limitation: associations are not necessarily robust to changes in tasks over time or to changes caused by an intervention. Such changes are central to important applications: when doctors prescribe a medication, they intervene on the patient’s state; when an autonomous car moves from the USA to Sweden, the inputs to its sensors change; when an advertiser changes policy, they expose users to new impressions. For learning systems to behave as desired in these settings, they must gain a causal understanding of their environment (Pearl, 2000). This project aims to advance the underexplored intersection of machine learning and causality.

Randomized experiments are central to causal inference in general and to medicine in particular. However, randomized clinical trials are costly to run, often generalize poorly to new patients and are difficult to apply to sequential decision-making problems. In contrast, observational data on decisions and outcomes are plentiful in electronic healthcare records, insurance data and registries, and are typically representative of more general patient populations. Making reliable causal inferences from such data in the service of clinical policy is one of the grand challenges for machine learning and medicine going forward (Sherman et al., 2016; Cave et al., 2019).

In this proposal, we will develop ML methods and theory for causal inference from observational data and apply them in the evaluation of clinical policies. Specifically, we will develop a) methods and theory for learning causally sufficient representations of observational data and optimal policies for sequential decision-making, b) methods and theory for uncovering, from observational data, multiple pathways through which exposures affect outcomes, and c) methods that combine observational and experimental data for validation and analysis.

Finally, we will apply our methods in a real-world evaluation of clinical policy using US and European insurance data. This change in methodology from supervised to causal machine learning involves fundamental algorithmic innovations that go far beyond choosing a suitable model class.


Fredrik Johansson (contact)

Chalmers, Computer Science and Engineering (Chalmers), Data Science


Wallenberg AI, Autonomous Systems and Software Program

Funding Chalmers participation during 2020–2024

Related Areas of Advance and Infrastructure

Information and Communication Technology (Areas of Advance)

Health Engineering (Areas of Advance)

