Development and application of omics data analysis tools to examine molecular associations linking complex exposures to cardiovascular disease
Doctoral thesis, 2025
This thesis presents two tools, developed as R packages, that make advanced omics data analysis accessible to a broader research community: 1) MUVR2 provides supervised machine learning with variable selection in high-dimensional data, using nested cross-validation to mitigate overfitting and false discovery. MUVR2 also includes elastic net modeling, which allows covariate adjustment and thereby enhances epidemiological modeling. 2) TriplotGUI is a user-friendly tool that integrates omics data reduction with meet-in-the-middle and mediation analyses to explore exposure-omics-outcome associations through intuitive visualizations.
Applying earlier versions of these tools to the Swedish Mammography Cohort revealed two distinct omics sub-patterns linking persistent organic pollutants (POPs) to cardiovascular disease (CVD). The first involved perturbed lipid metabolism and inflammatory pathways and was associated with higher levels of organochlorine compounds (OCs), lower levels of per- and polyfluoroalkyl substances (PFAS), and higher myocardial infarction (MI) risk. The second involved carnitines and possible mitochondrial dysfunction and was associated with OCs and stroke.
MUVR2 and TriplotGUI were applied to discover and replicate metabolites associated with diet, POPs, gut microbiota, and CVD incidence in four Nordic cohorts. Notable findings supporting metabolites as mediators of exposure-outcome associations included an association between intake of nuts and dried fruit and reduced MI risk, possibly mediated by pipecolate, and associations between fish intake and reduced MI risk, possibly mediated by phosphatidylethanolamine (P-16:0/22:6) and an unknown metabolite. Importantly, only a few exposure-metabolite-outcome associations were reproduced across cohorts, stressing the importance of replication for generalizable conclusions.
This thesis contributes advanced, accessible methods for linking environmental exposures to health outcomes through omics-based mediators, with MUVR2 and TriplotGUI improving the identification and interpretation of molecular signatures. Application of these tools to CVD enabled characterization of molecular signatures linking diet to health and underscored the necessity of rigorous external validation to minimize spurious associations.
omics
meet-in-the-middle analysis
mediation analysis
molecular epidemiology
cross-cohort design
cardiovascular disease
diet
machine learning
gut microbiota
persistent organic pollutants
Author
Yingxiao Yan
Chalmers, Life Sciences, Food and Nutrition Science
Adjusting for covariates and assessing modeling fitness in machine learning using MUVR2
Bioinformatics Advances, Vol. 4 (2024)
Journal article
Yan Y, Schillemans T, Ribbenstedt A, Brunius C. Software Application Profile: TriplotGUI, A Molecular Epidemiology Toolbox for Investigating Associations between Exposures, Omics and Outcomes
OMICs Signatures Linking Persistent Organic Pollutants to Cardiovascular Disease in the Swedish Mammography Cohort
Environmental Science & Technology, Vol. 58 (2024), p. 1036-1047
Journal article
Yan Y, Schillemans T, Toubon G, Ribbenstedt A, Åkesson A, Johansson I, Bergdahl I, Brunius C. Metabolic signatures linking multiple environmental exposures to cardiovascular disease risk: A multi-cohort discovery and validation study
This thesis describes the development of new tools to analyze how different environmental factors jointly influence disease risk through biological perturbations, using data from several thousand individuals across the Nordic countries. Nuts and dried fruits were associated with reduced heart attack risk, with evidence suggesting that the metabolite pipecolate may contribute to this protective effect, possibly by reducing inflammation and regulating cellular processes. Eating salmon was also associated with lower heart attack risk. While salmon offers cardioprotective nutrients such as omega-3 fatty acids, it is also a major dietary source of organochlorine compounds and per- and polyfluoroalkyl substances (PFAS), chemical pollutants that adversely impact health at multiple stages of life. This highlights that choosing healthy food involves weighing both nutritional benefits and potential risks. Another key finding was that most associations found in single study populations did not hold when tested in other cohorts, stressing the importance of replication across cohorts with different characteristics for wider generalizability.
The work in this thesis has advanced methods and used them to link environmental factors to cardiovascular diseases, providing knowledge that may contribute to better prevention.
Impact of Combined Exposures on Metabolic Health (ICE)
Formas (2020-01653), 2021-01-01 -- 2024-12-31.
Dynamic longitudinal exposome trajectories in cardiovascular and metabolic non-communicable diseases (LONGITOOLS)
European Commission (EC) (EC/H2020/874739), 2019-12-31 -- 2023-12-31.
Subject Categories (SSIF 2025)
Public Health, Global Health and Social Medicine
Bioinformatics (Computational Biology)
Bioinformatics and Computational Biology
Driving Forces
Sustainable development
Infrastructure
Chalmers Infrastructure for Mass spectrometry
Areas of Advance
Health Engineering
DOI
10.63959/chalmers.dt/5694
ISBN
978-91-8103-236-9
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5694, ISSN 0346-718X
Publisher
Chalmers
KB, Kemihuset
Opponent: Marc Chadeau-Hyam, Professor of Computational Epidemiology and Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, United Kingdom.