Descriptor and graph-based molecular representations in prediction of copolymer properties using machine learning
Journal article, 2026

Copolymers are highly versatile materials with a vast range of possible chemical compositions. Computational property prediction can accelerate copolymer design by allowing candidates with favorable properties to be prioritized. In this study, we used two distinct representations of molecular ensembles to predict seven different physical properties of copolymers using machine learning: a random forest (RF) model to predict the properties from molecular descriptors, and a graph neural network (GNN) to predict the same properties from 2D polymer graphs in both single-task and multi-task settings. To train and evaluate the models, we constructed a data set from molecular dynamics simulations of 140 binary copolymers with varying monomer compositions and configurations. Our results demonstrate that descriptor-based RFs excel at predicting density and the specific heat capacities at constant pressure (C_p) and constant volume (C_v), because these properties are strongly tied to specific molecular features captured by molecular descriptors. In contrast, graph representations give better predictions of the expansion coefficients (γ, α) and the bulk modulus (K), which depend more on complex structural interactions that graph-based models capture more effectively. This study underscores the importance of choosing an appropriate representation for predicting molecular properties. Our findings demonstrate how machine learning models can expedite copolymer discovery through learnable structure–property relationships, streamlining polymer design and advancing the development of high-performance materials for diverse applications.
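The two representations contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the two repeat units (ethylene and vinyl alcohol), the three toy descriptors, and the mole-fraction-weighted averaging in `copolymer_descriptors` are all assumptions chosen for illustration, and the names `MonomerGraph`, `adjacency`, and `descriptors` are hypothetical. The descriptor route yields a fixed-length vector of the kind a random forest could consume, while the graph route keeps the explicit bond connectivity that a GNN would pass messages over.

```python
from dataclasses import dataclass

# Approximate standard atomic masses in g/mol.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}


@dataclass(frozen=True)
class MonomerGraph:
    """Hypothetical 2D graph of a repeat unit (heavy atoms as nodes)."""
    atoms: tuple      # element symbol per node
    bonds: tuple      # (i, j) index pairs, one per bond
    hydrogens: tuple  # implicit hydrogen count per node


def adjacency(graph):
    """Graph view: neighbor list per node, the input a GNN would operate on."""
    adj = {i: [] for i in range(len(graph.atoms))}
    for i, j in graph.bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def descriptors(graph):
    """Descriptor view: fixed-length vector [mass, heavy atoms, bonds]."""
    mass = sum(ATOMIC_MASS[a] for a in graph.atoms)
    mass += ATOMIC_MASS["H"] * sum(graph.hydrogens)
    return [mass, float(len(graph.atoms)), float(len(graph.bonds))]


def copolymer_descriptors(graph_a, graph_b, frac_a):
    """Assumed scheme: mole-fraction-weighted average of monomer descriptors."""
    if not 0.0 <= frac_a <= 1.0:
        raise ValueError("frac_a must lie in [0, 1]")
    d_a, d_b = descriptors(graph_a), descriptors(graph_b)
    return [frac_a * x + (1.0 - frac_a) * y for x, y in zip(d_a, d_b)]


# Illustrative repeat units (not taken from the data set):
# ethylene -CH2-CH2- and vinyl alcohol -CH2-CH(OH)-.
ETHYLENE = MonomerGraph(("C", "C"), ((0, 1),), (2, 2))
VINYL_ALCOHOL = MonomerGraph(("C", "C", "O"), ((0, 1), (1, 2)), (2, 1, 1))

if __name__ == "__main__":
    print(adjacency(VINYL_ALCOHOL))
    print(copolymer_descriptors(ETHYLENE, VINYL_ALCOHOL, 0.5))
```

Note that the averaged descriptor vector discards how the monomers are arranged along the chain, whereas a graph can in principle encode configuration. This is one plausible reading of why the two representations suit different properties, not a result established by the sketch.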

Graph neural networks (GNN)

Molecular descriptors

Random forest

Copolymers

Machine learning

Authors

Elaheh Kazemi-Khasragh

Technical University of Madrid

IMDEA Institute

Rocio Mercado

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Carlos Gonzalez

Technical University of Madrid

IMDEA Materials Institute

Maciej Haranczyk

IMDEA Institute

Computational Materials Science

0927-0256 (ISSN)

Vol. 264 114475

Subject Categories (SSIF 2025)

Polymer Chemistry

Areas of Advance

Materials Science

DOI

10.1016/j.commatsci.2025.114475

Related datasets

Descriptor and Graph-based Molecular Representations in Prediction of Copolymer Properties using Machine Learning [dataset]

DOI: 10.5281/zenodo.13752404

More information

Created

5/29/2026