Adjusting for covariates and assessing modeling fitness in machine learning using MUVR2
Journal article, 2024

Motivation: Machine learning (ML) methods are frequently used in omics research to examine associations between molecular data and, for example, exposures and health conditions. ML is also used for feature selection to facilitate biological interpretation. Our previous MUVR algorithm was shown to generate predictions and variable selections at state-of-the-art performance. However, a general framework for assessing modeling fitness is still lacking. In addition, the ability to adjust for covariates is a highly desired but largely lacking feature in ML. We aimed to address these issues in the new MUVR2 framework.

Results: The MUVR2 algorithm was developed to include the regularized regression framework elastic net in addition to partial least squares and random forest modeling. Compared with other cross-validation strategies, MUVR2 consistently showed state-of-the-art performance, including in variable selection, while minimizing overfitting. In tests on simulated and real-world data, we also showed that MUVR2 allows adjustment for covariates when using elastic net modeling, but not when using partial least squares or random forest.
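The abstract compares MUVR2 with other cross-validation strategies and emphasizes minimizing overfitting. As a rough illustration of the general idea behind nested (double) cross-validation, the sketch below generates repeated outer and inner splits so that outer test samples are never used for model tuning or variable selection. It is an assumption-laden sketch in Python, not the MUVR2 implementation (MUVR2 is an R package); the function names `k_fold_indices` and `nested_cv_splits` and all parameter defaults are hypothetical.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the given indices and split them into k near-equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, n_outer=5, n_inner=4, n_repeats=2, seed=0):
    """Yield (repeat, test_idx, inner_splits) for repeated nested CV.

    Hypothetical helper: each outer fold is held out as a test set, and the
    remaining samples are split again into inner (train_idx, val_idx) pairs
    intended for tuning and variable selection.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        outer_folds = k_fold_indices(range(n_samples), n_outer, rng)
        for o, test_idx in enumerate(outer_folds):
            # Everything outside the current outer fold is available for tuning.
            calib = [i for j, fold in enumerate(outer_folds) if j != o for i in fold]
            inner_folds = k_fold_indices(calib, n_inner, rng)
            inner_splits = []
            for v, val_idx in enumerate(inner_folds):
                train_idx = [
                    i for j, fold in enumerate(inner_folds) if j != v for i in fold
                ]
                inner_splits.append((train_idx, val_idx))
            yield rep, test_idx, inner_splits


if __name__ == "__main__":
    for rep, test_idx, inner_splits in nested_cv_splits(20):
        print(rep, sorted(test_idx), len(inner_splits))
```

In such a scheme, a model would be fitted and tuned only on the inner splits, and the outer test samples would be used once, for performance estimation.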

Authors

Yingxiao Yan

Chalmers, Life Sciences, Food and Nutrition Science

Tessa Schillemans

Karolinska Institutet

Viktor Skantze

Fraunhofer-Chalmers Research Centre for Industrial Mathematics

Carl Brunius

Chalmers, Life Sciences, Food and Nutrition Science

Bioinformatics Advances

26350041 (eISSN)

Vol. 4, Issue 1, vbae051

Miljöexponeringars kombinerade inverkan på metabol hälsa (The combined impact of environmental exposures on metabolic health)

Formas (2020-01653), 2021-01-01 -- 2024-12-31.

Subject categories

Bioinformatics (Computational Biology)

Probability Theory and Statistics

DOI

10.1093/bioadv/vbae051

PubMed

38645717

More information

Last updated

2024-05-03