Fast parallel construction of variable-length Markov chains
Journal article, 2021
Results: An extensive evaluation was performed on genomes ranging from 12 Mbp to 22 Gbp. Relevant learning parameters were chosen with guidance from the Bayesian Information Criterion (BIC) to avoid over-fitting. Our implementation greatly improves upon the state of the art even in serial execution. It exhibits very good parallel scaling: for long sequences, the speed-ups are close to the optimum indicated by Amdahl's law, which is 3 for 4 threads and about 6 for 16 threads.
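Amdahl's law gives the ideal speed-up on n threads when a fraction p of the work parallelizes perfectly: S(n) = 1 / ((1 − p) + p/n). The following is a minimal sketch, assuming the quoted optima follow that formula exactly (the abstract does not state the serial fraction). It shows that an optimum of 3 on 4 threads corresponds to p = 8/9, and that the same p predicts an optimum of 6 on 16 threads, matching the second figure. The function names are hypothetical helpers, not part of the authors' software.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speed-up predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def parallel_fraction_from_speedup(speedup: float, threads: int) -> float:
    """Invert Amdahl's law: find the parallel fraction p that yields `speedup`.

    From 1/S = (1 - p) + p/n = 1 - p * (1 - 1/n) it follows that
    p = (1 - 1/S) / (1 - 1/n).
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


# Assumption: an optimum of 3 on 4 threads is an exact Amdahl prediction.
p = parallel_fraction_from_speedup(3.0, 4)
print(f"parallel fraction p = {p:.4f}")
print(f"predicted optimum on 16 threads = {amdahl_speedup(p, 16):.2f}")
```

A serial fraction of only 1/9 already caps the speed-up at 9 no matter how many threads are used, which is why the gain from 4 to 16 threads is a factor of 2 rather than 4.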
Conclusions: Our parallel implementation, released as open source under the GPLv3 license, provides a practically useful alternative to the state of the art, allowing VLMCs to be constructed even for very large genomes significantly faster than previously possible. Additionally, our BIC-based parameter selection gives guidance to end users comparing genomes.
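The BIC trades goodness of fit against model size: BIC = k ln n − 2 ln L̂, where L̂ is the maximized likelihood, k the number of free parameters and n the number of observations; the model with the lowest BIC is preferred. The following is a minimal sketch of BIC-based model selection under assumptions not stated in the abstract: each context of a VLMC over an alphabet of size |Σ| (4 for DNA) contributes |Σ| − 1 free parameters, n is taken to be the sequence length, and the candidate labels, log-likelihoods and context counts are made-up placeholders, not results from the paper. The function names are hypothetical.

```python
import math


def vlmc_bic(log_likelihood: float, num_contexts: int, seq_len: int,
             alphabet_size: int = 4) -> float:
    """BIC = k * ln(n) - 2 * ln(L).

    Assumption: every context stores one next-symbol distribution with
    alphabet_size - 1 free parameters.
    """
    k = num_contexts * (alphabet_size - 1)
    return k * math.log(seq_len) - 2.0 * log_likelihood


def select_by_bic(candidates, seq_len: int, alphabet_size: int = 4):
    """Return (label, bic) for the candidate with the lowest BIC.

    candidates: iterable of (label, log_likelihood, num_contexts) tuples.
    """
    best = None
    for label, log_lik, n_contexts in candidates:
        score = vlmc_bic(log_lik, n_contexts, seq_len, alphabet_size)
        if best is None or score < best[1]:
            best = (label, score)
    return best


# Placeholder values for illustration only.
candidates = [("model_a", -1000.0, 50), ("model_b", -980.0, 400)]
print(select_by_bic(candidates, seq_len=10_000))
```

With these placeholder values, `model_a` is selected despite its lower likelihood, because the penalty for the 1,200 free parameters of `model_b` outweighs its better fit.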
Authors
Joel Gustafsson
University of Gothenburg
Peter Norberg
University of Gothenburg
Jan R. Qvick-Wester
University of Gothenburg
Alexander Schliep
University of Gothenburg
Chalmers, Computer Science and Engineering, Data Science
BMC Bioinformatics
1471-2105 (eISSN)
Vol. 22, Issue 1, 487
Subject categories (SSIF 2025)
Bioinformatics (Computational Biology)
Computer Sciences
DOI
10.1186/s12859-021-04387-y
PubMed
34627154