Incremental haplotype inference, phylogeny, and almost bipartite graphs
Paper i proceeding, 2004
The paper addresses the combinatorial problem of inferring the unknown haplotypes in a population, given a sample of genotypes, under the assumption that the population forms
a perfect phylogeny (PP). It is important because physical haplotyping by DNA sequencing is expensive, whereas
genotypes are easier to obtain. Since PPs appear naturally and quite frequently, PP haplotyping is a favourable approach to reliable haplotype inference. Since Gusfield's paper from 2002, a few different algorithms have been proposed. Here we show that an extremely simple algorithm identifies, under the random mating assumption, all sufficiently frequent haplotypes in a random sample of
genotypes of asymptotically optimal size. Missing data can also be recovered if they are not too prevalent. Moreover, the idea of the algorithm easily extends to more general population structures than PP. We also solve a problem we call almost 2-coloring of graphs, which arises in an enhanced version of our haplotyping algorithm. We show that the solution space of this graph problem can be computed in linear time.