Deep generative models for analysis and engineering of functional proteins
Doctoral thesis, 2025

Proteins are essential biological molecules that sustain life through diverse functions, from structural support to catalyzing biochemical reactions. Their catalytic efficiency makes them invaluable for industrial applications, where they often require optimization to function under specific conditions. While experimental and computational approaches have made progress in protein engineering, no universal method exists due to the complexity of protein structure and function. Recent advances in machine learning offer new possibilities by leveraging vast protein sequence data. However, key challenges remain, including the limited availability and uneven distribution of high-quality labels describing essential properties like enzymatic activity and thermal stability. Addressing these issues is critical for developing models capable of accurate trait selection. My work focuses on two key steps in protein engineering: diversification and selection. To improve selection, deep learning models were developed using transfer learning, data augmentation, and protein language models (pLMs) to predict physical and functional properties such as melting temperature, enzymatic temperature, protein abundance, and in vitro activity. These models not only enable precise trait selection but also provide insights into the relationships between sequence, thermal adaptation, and conformational stability. For diversification, a deep generative model was created to capture natural sequence diversity and extend it to generate novel variant libraries across protein families. This approach prioritizes functional sequences and allows for targeted engineering of proteins with enhanced properties. Moving beyond general sequence generation, a framework was developed to create variant pools optimized for specific traits, such as increased thermal stability.
By integrating these advancements, we engineered functional protein variants from diverse wild-type sequences, achieving up to a 36 °C increase in melting temperature. This work highlights the potential of generative machine learning to refine and accelerate the protein engineering cycle, paving the way for more efficient and scalable biotechnological applications.

generative AI

protein engineering

thermal stability

machine learning

deep learning

HC4-salen, Hörsalsvägen 14
Opponent: Stanislav Mazurenko

Author

Sandra Viknander

Chalmers, Life Sciences, Systems and Synthetic Biology

Expanding functional protein sequence spaces using generative adversarial networks

Nature Machine Intelligence, Vol. 3 (2021), p. 324-333

Journal article

Learning deep representations of enzyme thermal adaptation

Protein Science, Vol. 31 (2022)

Journal article

Computational scoring and experimental evaluation of enzymes generated by neural networks

Nature Biotechnology, Vol. 43 (2025), p. 396-405

Journal article

Sandra Viknander, Nikolaos Tatarakis, Xiaozhi Fu, Clara Goldin, Alexander Diaciuc, Aleksej Zelezniak. Learning Thermal Adaptation through Adversarial and Evolutionary aware training

Proteins are the building blocks of life. They form structures, drive chemical reactions, and enable a wide range of biological functions. But proteins are not only important in living organisms; they are also used in industry, from drug development to environmentally friendly chemical production. Adapting proteins to specific needs requires advanced engineering and an extensive amount of experimental work to improve existing proteins and design new ones.

With advances in machine learning, new possibilities are now opening up. By analyzing the enormous amounts of data available for proteins, AI can help us predict important properties, such as how stable a protein is at high temperatures or how well it functions as an enzyme.

In my research, I have therefore developed AI models that can assist in the design and optimization of new proteins. Using these models, we have succeeded in creating proteins that are considerably more heat-stable, in some cases raising their heat tolerance from 56°C to 92°C. This means that we can design enzymes that function in harsh industrial environments, which can lead to more sustainable and efficient biotechnological processes.

This research shows how generative AI can revolutionize protein development by making the process faster, more precise, and more predictable. With these tools, we take a step closer to a future where proteins can be tailored to specific tasks, from better medicines to environmentally friendly industrial solutions.

Subject Categories (SSIF 2025)

Molecular Biology

Bioinformatics and Computational Biology

Structural Biology

Artificial Intelligence

Driving Forces

Sustainable development

Innovation and entrepreneurship

Roots

Basic sciences

Infrastructure

C3SE (-2020, Chalmers Centre for Computational Science and Engineering)

Areas of Advance

Life Science Engineering (2010-2018)

ISBN

978-91-8103-187-4

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5645

Publisher

Chalmers



Latest update

3/7/2025