DOGMA: de novo assembly of densely labelled optical DNA maps using a matrix profile approach
Journal article, 2025

In optical genome mapping (OGM), large numbers of individual DNA maps, i.e. sequence-specific data series along single DNA molecules, are produced. These individual maps have to be stitched together, in a process called de novo OGM assembly, to create consensus OGM maps for the corresponding regions along the chromosomes. While there are several types of experimental OGM assays, not all of them have de novo OGM assembly tools available. In particular, no such tools exist for densely labelled OGM. Here, we present and evaluate DOGMA, a de novo OGM assembly algorithm for densely labelled OGM data which uses matrix profiles. The matrix profile has transformed how data mining problems are approached in time series analysis. Yet this algorithm has not been widely explored outside of the time series community; here we use it for de novo OGM assembly for the first time. Further novelties in our algorithm are the introduction of two scores for each individual alignment, the use of p-values, a visual representation as barcode islands, and the introduction of a method for generating consensus barcodes using amplitude adjustment. Utilizing p-values helps mitigate the risk of assembly errors caused by false positives. We demonstrate our algorithm by applying it to de novo OGM assembly of synthetic datasets and of an experimental dataset from an Escherichia coli genome. We validate the assemblies using the corresponding reference genomes and investigate the strengths and limitations of the algorithm. De novo OGM assembly of dense optical DNA maps shows promise as a complement or an alternative to current OGM techniques for other types of genome mapping assays. The code is available at: https://github.com/dnadevcode/dogma.
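
To make the matrix profile idea concrete, the following is a minimal sketch and not the DOGMA implementation. It assumes the textbook definition of a matrix profile for two series (an "AB-join"): for every window of length m in one barcode, the z-normalized Euclidean distance to its best-matching window in the other barcode. The function names `znorm` and `ab_join_matrix_profile`, the window length, and the synthetic data are all hypothetical choices for illustration; the scores, p-values and amplitude adjustment used by DOGMA are not reproduced here.

```python
import numpy as np


def znorm(x):
    """Z-normalize a window; a constant window maps to all zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    if sd == 0:
        return np.zeros_like(x)
    return (x - x.mean()) / sd


def ab_join_matrix_profile(a, b, m):
    """Naive O(len(a) * len(b) * m) AB-join matrix profile (illustrative only).

    For each length-m window of `a`, return the smallest z-normalized
    Euclidean distance to any length-m window of `b`, together with the
    start index in `b` of that best match.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a = len(a) - m + 1
    n_b = len(b) - m + 1
    if n_a < 1 or n_b < 1:
        raise ValueError("window length m exceeds one of the series")

    # Pre-normalize every window of b once, one window per row.
    b_windows = np.array([znorm(b[j:j + m]) for j in range(n_b)])

    profile = np.empty(n_a)
    index = np.empty(n_a, dtype=int)
    for i in range(n_a):
        w = znorm(a[i:i + m])
        d = np.sqrt(((b_windows - w) ** 2).sum(axis=1))
        index[i] = int(np.argmin(d))
        profile[i] = d[index[i]]
    return profile, index


if __name__ == "__main__":
    # Hypothetical data: two "maps" cut from one random signal, overlapping
    # in positions 150..199, with different intensity scale and offset.
    rng = np.random.default_rng(0)
    signal = rng.normal(size=300)
    map_a = signal[:200]
    map_b = 2.0 * signal[150:] + 1.0

    profile, index = ab_join_matrix_profile(map_a, map_b, m=40)
    best = int(np.argmin(profile))
    print("best window in map_a starts at", best,
          "and matches map_b at", index[best])
```

Because each window is z-normalized, the distance ignores differences in intensity scale and offset between molecules, so windows that lie fully inside the overlap obtain distances close to zero. Production code would normally use an optimized matrix profile algorithm rather than this brute-force loop.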

Authors

Albertas Dvirnas

Chalmers, Life Sciences, Chemical Biology

Lund University

Luis Leal Garza

Chalmers, Life Sciences, Chemical Biology

Zahra Abbaspour

Chalmers, Life Sciences, Chemical Biology

Erik Fröbrant

Lund University

Karolin Frykholm

Chalmers, Life Sciences, Chemical Biology

CARe

Marie Wrande

Uppsala University

L. Sandegren

Uppsala University

Fredrik Westerlund

Chalmers, Life Sciences, Chemical Biology

CARe

Tobias Ambjörnsson

Lund University

PLoS ONE

1932-6203 (ISSN) 19326203 (eISSN)

Vol. 20, Issue 12 (December), e0335633

Subject Categories (SSIF 2025)

Other Engineering and Technologies

Bioinformatics (Computational Biology)

Bioinformatics and Computational Biology

Computer Sciences

Other Computer and Information Science

DOI

10.1371/journal.pone.0335633

PubMed

41325363

More information

Latest update

12/8/2025