Using AI to unravel "DNA grammar" for synthetic biology applications
Research Project, 2020 – 2024

Understanding how cells transfer information from genotype into phenotype has been the ultimate question in biology since the introduction of the central dogma of molecular biology over 70 years ago. Nevertheless, the question of how complex “DNA grammar” determines the levels of transcript and protein expression, i.e. the critical processes underlying development, adaptation, growth, and reproduction in all living organisms, is still unanswered. Coding and regulatory non-coding regions are crucial in the regulation of gene expression, but current biological models are still far from the quantitative understanding that would enable accurate predictions of gene expression and its ultimate application in biomedicine and biotechnology. Here, we aim to develop and experimentally verify an AI approach for learning the biological signals encoded in DNA that control gene expression, in order to enable AI-assisted protein sequence design. Preliminary data, obtained using deep neural networks on gene expression data, showed that gene expression levels in Saccharomyces cerevisiae are predictable based only on DNA code. The proposed project is expected to challenge the synthetic biology field by enabling the design of synthetic proteins and possibly entire pathways for engineering cell factories. This AI approach for decoding “DNA grammar” will greatly advance our understanding of what drives gene expression in normal and pathological conditions, with applications for the design of more effective gene therapies.


Aleksej Zelezniak (contact)

Chalmers, Life Sciences, Systems and Synthetic Biology


Swedish Research Council (VR)

Project ID: 2019-05356
Funding Chalmers participation during 2020–2024

