Fast parallel construction of variable-length Markov chains
Journal article, 2021
Results: An extensive evaluation was performed on genomes ranging from 12 Mbp to 22 Gbp. Learning parameters were chosen using the Bayesian Information Criterion (BIC) to avoid over-fitting. Our implementation greatly improves upon the state-of-the-art even in serial execution. It also exhibits very good parallel scaling: for long sequences, speed-ups are close to the optimum indicated by Amdahl's law, which is 3 for 4 threads and about 6 for 16 threads.
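The two Amdahl figures quoted above can be checked against each other with a short calculation. The sketch below is illustrative only: the parallel fraction is not stated in the abstract and is inferred here from the 4-thread figure, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: upper bound on speed-up for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)


def parallel_fraction_from_speedup(speedup: float, threads: int) -> float:
    """Invert Amdahl's law: p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


# Assumption: take the quoted optimum of 3 for 4 threads at face value
# and infer the parallel fraction it implies (not given in the abstract).
p = parallel_fraction_from_speedup(3.0, 4)

# Predicted upper bound for 16 threads under the same parallel fraction.
s16 = amdahl_speedup(p, 16)

print(f"implied parallel fraction: {p:.3f}")
print(f"Amdahl bound for 16 threads: {s16:.2f}")
```

Under this assumption the implied parallel fraction is 8/9, and the resulting bound for 16 threads is 6, which matches the "about 6" quoted in the abstract.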
Conclusions: Our parallel implementation, released as open source under the GPLv3 license, provides a practically useful alternative to the state-of-the-art and allows the construction of VLMCs, even for very large genomes, significantly faster than previously possible. Additionally, our BIC-based parameter selection gives guidance to end users comparing genomes.
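As a rough illustration of BIC-guided model selection, the sketch below scores a few candidate models with the standard formula BIC = k ln(n) - 2 ln(L) and picks the lowest. It is not the authors' procedure: the parameter count per context, the candidate models, and all numbers are hypothetical values chosen only to show the mechanics.

```python
import math


def bic(log_likelihood: float, num_params: int, num_obs: int) -> float:
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower is better."""
    return num_params * math.log(num_obs) - 2.0 * log_likelihood


def vlmc_num_params(num_contexts: int, alphabet_size: int = 4) -> int:
    # Assumption: each context carries one next-symbol distribution,
    # i.e. (alphabet_size - 1) free probabilities per context.
    return num_contexts * (alphabet_size - 1)


# Hypothetical candidates: (label, number of contexts, log-likelihood).
# These values are made up for illustration, not taken from the paper.
candidates = [
    ("small", 10, -1400.0),
    ("medium", 100, -1350.0),
    ("large", 1000, -1340.0),
]
num_obs = 1000  # hypothetical sequence length

scores = {
    label: bic(loglik, vlmc_num_params(contexts), num_obs)
    for label, contexts, loglik in candidates
}
best = min(scores, key=scores.get)

for label, score in scores.items():
    print(f"{label}: BIC = {score:.1f}")
print("selected:", best)
```

With these made-up numbers the larger models fit slightly better but pay a much higher complexity penalty, so the smallest model is selected. This is the over-fitting guard that BIC provides.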
Author
Joel Gustafsson
University of Gothenburg
Peter Norberg
University of Gothenburg
Jan R. Qvick-Wester
University of Gothenburg
Alexander Schliep
University of Gothenburg
Chalmers, Computer Science and Engineering (Chalmers), Data Science
BMC Bioinformatics
1471-2105 (eISSN)
Vol. 22 1 487
Subject Categories (SSIF 2025)
Bioinformatics (Computational Biology)
Computer Sciences
DOI
10.1186/s12859-021-04387-y
PubMed
34627154