DOGMA: de novo assembly of densely labelled optical DNA maps using a matrix profile approach
Journal article, 2025

In optical genome mapping (OGM), large numbers of individual DNA maps (sequence-specific data series along single DNA molecules) are produced. These individual maps must be stitched together, in a process called de novo OGM assembly, to create consensus OGM maps for the corresponding regions along the chromosomes. Several types of experimental OGM assays exist, but de novo OGM assembly tools are not available for all of them. In particular, no such tools exist for densely labelled OGM. Here, we present and evaluate DOGMA, a de novo OGM assembly algorithm for densely labelled OGM data that uses matrix profiles. The matrix profile has transformed how data mining problems are approached in time series analysis, yet it has not been widely explored outside the time series community; here we use it for de novo OGM assembly for the first time. Further novelties in our algorithm are the introduction of two scores for each individual alignment, the use of p-values, a visual representation as barcode islands, and a method for generating consensus barcodes using amplitude adjustment. Using p-values helps mitigate the risk of assembly errors caused by false positives. We demonstrate our algorithm by applying it to de novo OGM assembly of synthetic datasets and of an experimental dataset from an Escherichia coli genome. We validate the assemblies against the corresponding reference genomes and investigate the strengths and limitations of the algorithm. De novo OGM assembly of dense optical DNA maps shows promise as a complement or an alternative to current OGM techniques for other types of genome mapping assays. The code is available at: https://github.com/dnadevcode/dogma.
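
To make the matrix profile idea concrete, the following is a minimal sketch, not the DOGMA implementation. It assumes two noise-free toy "maps" cut from the same random signal and computes a naive AB-join matrix profile: for every window of one map, the smallest z-normalized Euclidean distance to any window of the other. The names `znorm`, `ab_join_matrix_profile`, `map_a`, `map_b` and the window length `m=40` are illustrative choices, not taken from the paper or its repository.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D window; constant windows map to zeros."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else np.zeros_like(x)

def ab_join_matrix_profile(a, b, m):
    """Naive AB-join matrix profile (illustrative, not optimized).

    For every length-m window of `a`, return the smallest z-normalized
    Euclidean distance to any length-m window of `b`, together with the
    start index in `b` where that minimum occurs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pre-normalize all windows of b: shape (len(b) - m + 1, m).
    b_windows = np.array([znorm(b[j:j + m]) for j in range(len(b) - m + 1)])
    n_a = len(a) - m + 1
    profile = np.empty(n_a)
    index = np.empty(n_a, dtype=int)
    for i in range(n_a):
        d = np.linalg.norm(b_windows - znorm(a[i:i + m]), axis=1)
        index[i] = int(np.argmin(d))
        profile[i] = d[index[i]]
    return profile, index

# Toy data (assumption): two overlapping fragments of one random signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=400)
map_a = signal[50:250]   # covers positions 50..249
map_b = signal[150:350]  # covers positions 150..349

profile, index = ab_join_matrix_profile(map_a, map_b, m=40)
best = int(np.argmin(profile))
# Relative shift between the two fragments implied by the best match.
offset = best - int(index[best])
```

In this toy setting the lowest point of the profile marks where the two fragments overlap, and `offset` recovers their relative shift. Experimental maps additionally differ through noise and stretching, which is why a real assembler needs alignment scores and significance estimates such as the p-values described above.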

Authors

Albertas Dvirnas

Chalmers, Life Sciences, Chemical Biology

Lund University

Luis Leal Garza

Chalmers, Life Sciences, Chemical Biology

Zahra Abbaspour

Chalmers, Life Sciences, Chemical Biology

Erik Fröbrant

Lund University

Karolin Frykholm

Chalmers, Life Sciences, Chemical Biology

CARe

Marie Wrande

Uppsala University

L. Sandegren

Uppsala University

Fredrik Westerlund

Chalmers, Life Sciences, Chemical Biology

CARe

Tobias Ambjörnsson

Lund University

PLoS ONE

1932-6203 (ISSN) 19326203 (eISSN)

Vol. 20, Issue 12 (December), e0335633

Subject categories (SSIF 2025)

Other Engineering and Technologies

Bioinformatics (Computational Biology)

Bioinformatics and Computational Biology

Computer Sciences

Other Computer and Information Science

DOI

10.1371/journal.pone.0335633

PubMed

41325363

More information

Last updated

2025-12-08