Computational scoring and experimental evaluation of enzymes generated by neural networks
Journal article, 2024

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70–90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50–150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.
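The abstract's selection criterion (generated sequences with 70 to 90% identity to the most similar natural sequence) can be illustrated with a minimal sketch. It assumes the sequences are already aligned to a common length with `-` as the gap character. The function names and the toy sequences are hypothetical and are not taken from the paper, whose actual identity calculation may differ.

```python
from typing import Iterable


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: `a` and `b` are rows from the same alignment (equal length).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def max_identity_to_naturals(candidate: str, naturals: Iterable[str]) -> float:
    """Identity of `candidate` to its most similar natural sequence."""
    return max(percent_identity(candidate, n) for n in naturals)


def in_identity_band(candidate: str, naturals: Iterable[str],
                     low: float = 70.0, high: float = 90.0) -> bool:
    """True if the closest natural sequence lies within [low, high] percent identity."""
    return low <= max_identity_to_naturals(candidate, list(naturals)) <= high


# Toy, made-up aligned sequences purely for illustration.
naturals = ["ACDEFGHIKL", "ACDEFGHIKM"]
generated = "ACDEFGHIAA"
print(max_identity_to_naturals(generated, naturals))
print(in_identity_band(generated, naturals))
```

In practice the identity would be computed from a proper pairwise or multiple sequence alignment; the gap-skipping convention used here is only one of several common definitions.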


Sean R. Johnson

New England Biolabs

Xiaozhi Fu

Chalmers, Life Sciences, Systems Biology

Sandra Viknander

Chalmers, Life Sciences, Systems Biology

Clara Goldin

Chalmers, Life Sciences, Systems Biology

Sarah Monaco

Invitae Corporation

Aleksej Zelezniak

Faculty of Life Sciences & Medicine

Chalmers, Life Sciences, Systems Biology

Vilniaus universitetas

Kevin K. Yang

Microsoft Research

Nature Biotechnology

1087-0156 (ISSN) 1546-1696 (eISSN)

Vol. In Press

Using AI to discover "DNA grammar" for synthetic biology applications

Swedish Research Council (VR) (2019-05356), 2020-01-01 -- 2024-12-31.

MetaPlast: Harnessing "microbial dark matter" to design microbial systems that degrade plastic

Formas (2019-01403), 2020-01-01 -- 2023-12-31.


Bioinformatics (computational biology)

Bioinformatics and systems biology




