Descriptor and graph-based molecular representations in prediction of copolymer properties using machine learning
Journal article, 2026

Copolymers are highly versatile materials with a vast range of possible chemical compositions. Computational property prediction can accelerate copolymer design by allowing candidates with favorable properties to be prioritized. In this study, we used two distinct representations of molecular ensembles to predict seven physical properties of copolymers using machine learning: a random forest (RF) model that predicts the properties from molecular descriptors, and a graph neural network (GNN) that predicts the same properties from 2D polymer graphs in both single- and multi-task settings. To train and evaluate the models, we constructed a data set from molecular dynamics simulations of 140 binary copolymers with varying monomer compositions and configurations. Our results demonstrate that descriptor-based RFs excel at predicting density and the specific heat capacities at constant pressure (C_p) and constant volume (C_v), because these properties are strongly tied to specific molecular features captured by molecular descriptors. In contrast, graph representations better predict the expansion coefficients (γ, α) and the bulk modulus (K), which depend more on complex structural interactions that graph-based models capture more effectively. This study underscores the importance of choosing an appropriate representation when predicting molecular properties. Our findings demonstrate how machine learning models can expedite copolymer discovery through learnable structure–property relationships, streamlining polymer design and advancing the development of high-performance materials for diverse applications.
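As a rough illustration of what a descriptor-based input for a binary copolymer might look like, the sketch below builds a composition-weighted descriptor vector from per-monomer descriptors. The mole-fraction weighting, the function name `copolymer_descriptors`, the descriptor names, and all numerical values are hypothetical placeholders and are not taken from the study. A simple average like this also ignores monomer configuration, which the data set described above does vary.

```python
from typing import Dict


def copolymer_descriptors(
    desc_a: Dict[str, float],
    desc_b: Dict[str, float],
    frac_a: float,
) -> Dict[str, float]:
    """Return the mole-fraction-weighted average of two monomers' descriptors.

    Assumption (not from the source): a binary copolymer is represented by a
    linear mix of its monomers' descriptor values, weighted by composition.
    """
    if not 0.0 <= frac_a <= 1.0:
        raise ValueError("frac_a must lie in [0, 1]")
    if desc_a.keys() != desc_b.keys():
        raise ValueError("both monomers need the same descriptor names")
    frac_b = 1.0 - frac_a
    return {name: frac_a * desc_a[name] + frac_b * desc_b[name] for name in desc_a}


# Hypothetical descriptor values for two monomers, for illustration only.
monomer_a = {"mol_weight": 100.0, "n_rotatable_bonds": 2.0}
monomer_b = {"mol_weight": 60.0, "n_rotatable_bonds": 4.0}

# A copolymer with 25 % monomer A and 75 % monomer B.
features = copolymer_descriptors(monomer_a, monomer_b, frac_a=0.25)
print(features)
```

Feature dictionaries of this kind, one per copolymer, could then be stacked into a table and passed to a regressor such as a random forest, with one target column per property of interest.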

Graph neural networks (GNN)

Molecular descriptors

Random forest

Copolymers

Machine learning

Authors

Elaheh Kazemi-Khasragh

Universidad Politecnica de Madrid

IMDEA Institute

Rocio Mercado

Chalmers, Computer Science and Engineering, Data Science and AI

Carlos Gonzalez

Universidad Politecnica de Madrid

IMDEA Materials Institute

Maciej Haranczyk

IMDEA Institute

Computational Materials Science

0927-0256 (ISSN)

Vol. 264 114475

Subject categories (SSIF 2025)

Polymer Chemistry

Areas of Advance

Materials Science

DOI

10.1016/j.commatsci.2025.114475

Related datasets

Descriptor and Graph-based Molecular Representations in Prediction of Copolymer Properties using Machine Learning [dataset]

DOI: 10.5281/zenodo.13752404

More information

Created

2026-05-29