Bayesian localization of CNV candidates in WGS data within minutes
Journal article, 2019
Results: In this paper, we propose an improved algorithmic framework for this approach. We provide new space-efficient data structures to query sufficient statistics in logarithmic time, based on a linear-time, in-place transform of the data, which also improves the compression ratio. We also propose a new approach to efficiently store and update marginal state counts obtained from the Gibbs sampler.
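For a Gaussian emission model, the sufficient statistics of a segment are its length, its sum, and its sum of squares. The sketch below shows how such statistics can be built by a linear-time, in-place transform and then queried for any range in logarithmic time. It uses a Fenwick (binary indexed) tree as a plainly swapped-in technique; the paper's own structure, which the keywords suggest is wavelet-based, is not reproduced here, and the class name `SufficientStats` is hypothetical.

```python
from typing import List, Tuple


class SufficientStats:
    """Fenwick-tree sketch for range sufficient statistics (hypothetical).

    Not the paper's data structure: a stand-in that shares its key
    properties (linear-time in-place build, O(log n) range queries).
    """

    def __init__(self, data: List[float]) -> None:
        self.n = len(data)
        # 1-indexed arrays of per-position values; index 0 is unused.
        self.s1 = [0.0] + [float(x) for x in data]
        self.s2 = [0.0] + [float(x) * float(x) for x in data]
        # Linear-time, in-place transform of s1 and s2: each node adds
        # its partial sum to its Fenwick parent exactly once.
        for i in range(1, self.n + 1):
            parent = i + (i & -i)
            if parent <= self.n:
                self.s1[parent] += self.s1[i]
                self.s2[parent] += self.s2[i]

    def _prefix(self, i: int) -> Tuple[float, float]:
        """Sum and sum of squares of data[0:i], visiting O(log n) nodes."""
        a = b = 0.0
        while i > 0:
            a += self.s1[i]
            b += self.s2[i]
            i -= i & -i  # strip the lowest set bit
        return a, b

    def query(self, lo: int, hi: int) -> Tuple[int, float, float]:
        """Count, sum and sum of squares of data[lo:hi] (half-open range)."""
        a_hi, b_hi = self._prefix(hi)
        a_lo, b_lo = self._prefix(lo)
        return hi - lo, a_hi - a_lo, b_hi - b_lo


# Usage: mean and (population) variance of a candidate segment.
stats = SufficientStats([1.0, 2.0, 3.0, 4.0, 5.0])
n, s, ss = stats.query(1, 4)
mean = s / n
variance = ss / n - mean * mean
```

A Fenwick tree was chosen here only because its construction is a short in-place loop and its query cost is easy to verify; a wavelet-based scheme can additionally compress runs of similar values, which this sketch does not attempt.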
Conclusions: Using this approach, we discover several CNV candidates in two rat populations divergently selected for tame and aggressive behavior, consistent with earlier results concerning the domestication syndrome as well as experimental observations. Computationally, we observe a 29.5-fold decrease in memory, an average 5.8-fold speedup, and a 191-fold decrease in minor page faults. We also observe that these metrics varied greatly in the old implementation, but not in the new one. We conjecture that this is due to the better compression scheme. The fully Bayesian segmentation of the entire WGS data set required 3.5 min and 1.24 GB of memory, and can hence be performed on a commodity laptop.
HMM
CNV
Wavelet
Bayesian inference
Authors
John Wiedenhoeft
Chalmers, Computer Science and Engineering, Data Science
Rutgers University
Alex Cagan
Wellcome Trust Sanger Institute
Max-Planck-Gesellschaft
Rimma Kozhemyakina
Russian Academy of Sciences
Rimma Gulevich
Russian Academy of Sciences
Alexander Schliep
Rutgers University
University of Gothenburg
Algorithms for Molecular Biology
1748-7188 (eISSN)
Vol. 14 (1), 20
Subject categories
Bioinformatics (Computational Biology)
Probability Theory and Statistics
Computer Vision and Robotics (Autonomous Systems)
DOI
10.1186/s13015-019-0154-7