Inferring evolution in bacteria using Markov chains and genomic signatures
Doctoral thesis, 2006

This thesis concerns the development of methods and models in evolutionary molecular biology. The techniques are also applicable to other, similar biological problems. The first contribution is a novel classifier, based on fixed and variable length Markov chains, that can discriminate between bacterial DNA of different species. The classifier assumes that the composition of oligomers (DNA words) is species-specific and represents global features of the species, a so-called genomic signature. Direct applications of such a classifier are the identification of horizontal gene transfer and the binning of metagenomic data. The former has been the primary focus, as it is one of the central processes in the evolution of bacteria. We suggest a new method for locking the number of parameters in a variable length Markov model and propose a method for rejecting false candidates of horizontal gene transfer events. The second contribution is a novel estimator for finding the prediction suffix tree of a variable length Markov chain. This new estimator is highly efficient in finding the correct state space, and we show that it compares favorably to a popular estimator in terms of predictive likelihood. The third contribution is to the analysis of gene order rearrangements in bacteria. We recapitulate previous results on expected distances and derive new ones for cases that have recently gained support in the literature, such as symmetrical and short reversals. We also describe new categories of gene order patterns and show how these can be explained with models using short, symmetric and uniformly distributed transpositions and reversals. The fourth contribution is a part of the Greengenes project, a chimera-free database of 16S rDNA sequences.
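As an illustration of the genomic-signature idea described in the abstract, the following is a minimal sketch of a fixed-order Markov chain classifier for DNA fragments. It is not the thesis's implementation: the function names (`train_markov`, `log_likelihood`, `classify`), the order of 2, the add-one pseudocount, the uniform fallback for unseen contexts, and the toy "genomes" are all assumptions chosen for brevity. The variable length models and the parameter-locking method mentioned in the abstract are not shown.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(sequence, order=2, pseudocount=1.0):
    """Estimate log transition probabilities of a fixed-order Markov chain.

    Returns a dict mapping each observed context (a string of `order`
    bases) to a dict of log P(next base | context).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(order, len(sequence)):
        context = sequence[i - order:i]
        counts[context][sequence[i]] += 1.0

    model = {}
    for context, nxt in counts.items():
        # Additive smoothing so that no observed context assigns probability zero.
        total = sum(nxt.values()) + pseudocount * len(ALPHABET)
        model[context] = {
            base: math.log((nxt.get(base, 0.0) + pseudocount) / total)
            for base in ALPHABET
        }
    return model


def log_likelihood(model, fragment, order=2):
    """Score a fragment under a model; unseen contexts fall back to uniform."""
    uniform = math.log(1.0 / len(ALPHABET))
    score = 0.0
    for i in range(order, len(fragment)):
        context = fragment[i - order:i]
        score += model.get(context, {}).get(fragment[i], uniform)
    return score


def classify(models, fragment, order=2):
    """Return the name of the model that gives the fragment the highest score."""
    return max(models, key=lambda name: log_likelihood(models[name], fragment, order))


# Toy training data (hypothetical, not real genomes).
models = {
    "at_rich": train_markov("ATATTAATAT" * 50),
    "gc_rich": train_markov("GCGGCCGCGC" * 50),
}
print(classify(models, "ATTATAAT"))
print(classify(models, "GCGCGGCC"))
```

In a horizontal gene transfer setting, one could score each gene against the model of its host genome and flag genes that score markedly better under another species' model. How such candidates are then accepted or rejected is the subject of the thesis and is not reproduced here.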

Prediction suffix tree

Markov chains

Molecular evolution


Gene order rearrangements

Horizontal gene transfer

Variable length Markov chains

10.15 ED, EDIT-huset, Rännvägen 6B, Chalmers
Opponent: Jotun Hein


Daniel Dalevi

Chalmers, Computer Science and Engineering, Computing Science

Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB

AEM, Vol. 72 (2006), p. 5069-5072

Journal article


Bioinformatics and systems biology



Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 2506

Technical report D - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University: 21
