Computational scoring and experimental evaluation of enzymes generated by neural networks
Journal article, 2024

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70–90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50–150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.

Authors

Sean R. Johnson

New England Biolabs

Xiaozhi Fu

Chalmers, Life Sciences, Systems and Synthetic Biology

Sandra Viknander

Chalmers, Life Sciences, Systems and Synthetic Biology

Clara Goldin

Chalmers, Life Sciences, Systems and Synthetic Biology

Sarah Monaco

Invitae

Aleksej Zelezniak

Vilnius University

Chalmers, Life Sciences, Systems and Synthetic Biology

Faculty of Life Sciences & Medicine

Kevin K. Yang

Microsoft Research

Nature Biotechnology

1087-0156 (ISSN) 1546-1696 (eISSN)

Vol. In Press

MetaPlast: Exploiting microbial dark matter for engineering plastic degradation microbial system

Formas (2019-01403), 2020-01-01 -- 2023-12-31.

Using AI to unravel "DNA grammar" for synthetic biology applications

Swedish Research Council (VR) (2019-05356), 2020-01-01 -- 2024-12-31.

Subject Categories (SSIF 2011)

Bioinformatics (Computational Biology)

Bioinformatics and Systems Biology

DOI

10.1038/s41587-024-02214-2

PubMed

38653796

More information

Latest update

8/19/2024