Combining dense and sparse labeling in optical DNA mapping
Journal article, 2021

Optical DNA mapping (ODM) is based on fluorescent labeling, stretching and imaging of single DNA molecules to obtain sequence-specific fluorescence profiles, so-called DNA barcodes. These barcodes can be mapped to theoretical counterparts obtained from DNA reference sequences, which in turn allows for DNA identification in complex samples and for detection of structural changes in individual DNA molecules. There are several types of DNA labeling schemes for ODM, and for each labeling type one or several match scoring methods are used. By combining the information from multiple labeling schemes one can potentially improve mapping confidence; however, combining match scores from different labeling assays has not yet been implemented. In this study, we introduce two theoretical methods for analyzing DNA molecules with multiple label types. In our first method, we convert the alignment scores, given as output from the different assays, into p-values using carefully crafted null models. We then combine the p-values for the different label types using standard methods to obtain a combined match score and an associated combined p-value. In the second method, we use a block bootstrap approach to check the uniqueness of a match to a database for all barcodes that match with a combined p-value below a predefined threshold. To obtain experimental dual-labeled DNA barcodes, we introduce a novel assay in which we cut plasmid DNA molecules from bacteria with restriction enzymes, so that the cut sites serve as sequence-specific markers. Together with barcodes obtained using the established competitive binding labeling method, these markers form a dual-labeled barcode. All experimental data in this study originate from this assay, but we point out that our theoretical framework can be used to combine data from all kinds of available optical DNA mapping assays.
We test our multiple labeling frameworks on barcodes from two different plasmids and on synthetically generated barcodes (combined competitive-binding and nick labeling). We demonstrate that by simultaneously using the information from all label types, we can substantially increase the significance when we match experimental barcodes to a database consisting of theoretical barcodes for all sequenced plasmids.
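
As an illustration of the p-value combination step described above, the following is a minimal sketch using Fisher's method. The abstract only says "standard methods", so the choice of Fisher's method here is an assumption, as are the function name `fisher_combine` and the example p-values; the sketch also assumes the per-label p-values are independent under the null model.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (illustrative only).

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i).
    Under the null hypothesis the statistic is chi-square distributed with
    2k degrees of freedom, k being the number of p-values.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2

    # For an even number of degrees of freedom (2k) the chi-square survival
    # function has the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    combined_p = min(1.0, math.exp(-half_stat) * total)
    return 2.0 * half_stat, combined_p


# Hypothetical p-values for one molecule, one per label type.
p_dense, p_sparse = 0.01, 0.02
statistic, p_combined = fisher_combine([p_dense, p_sparse])
print(f"Fisher statistic = {statistic:.3f}, combined p-value = {p_combined:.3g}")
```

With these made-up inputs the combined p-value is smaller than either individual p-value, which is the kind of gain in significance the abstract refers to. Other standard combination rules (for example Stouffer's method) would slot into the same place.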

Authors

Erik Torstensson

Lund University

Gaurav Goyal

Chalmers, Biology and Biological Engineering, Chemical Biology

Anna Johnning

University of Gothenburg

Stiftelsen Fraunhofer-Chalmers Centrum för Industrimatematik

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

Fredrik Westerlund

Chalmers, Biology and Biological Engineering, Chemical Biology

Tobias Ambjörnsson

Lund University

PLoS ONE

1932-6203 (ISSN) 19326203 (eISSN)

Vol. 16, issue 11 (November), e0260489

Subject categories

Bioinformatics (computational biology)

Bioinformatics and systems biology

Other industrial biotechnology

DOI

10.1371/journal.pone.0260489

PubMed

34843574

More information

Last updated

2021-12-07