Incremental haplotype inference, phylogeny, and almost bipartite graphs
Paper in proceedings, 2004

The paper addresses the combinatorial problem of inferring the unknown haplotypes in a population, given a sample of genotypes, under the assumption that the population forms a perfect phylogeny (PP). The problem is important because physical haplotyping by DNA sequencing is expensive, whereas genotypes are easier to obtain. Since PPs appear naturally and quite frequently, PP haplotyping is a favourable approach to reliable haplotype inference. Since Gusfield's paper from 2002, a few different algorithms have been proposed. Here we show that an extremely simple algorithm identifies, under the random mating assumption, all sufficiently frequent haplotypes in a random sample of genotypes of asymptotically optimal size. Missing data can also be recovered if they are not too prevalent. Moreover, the idea of the algorithm easily extends to population structures more general than PP. We also solve a problem we call almost 2-coloring of graphs, which arises in an enhanced version of our haplotyping algorithm. We show that the solution space of this graph problem can be computed in linear time.
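As background for the terms used in the abstract, the following minimal Python sketch illustrates what it means for a pair of haplotypes to explain a genotype, and how the standard four-gamete condition tests whether a set of binary haplotypes is compatible with a perfect phylogeny. It is not the algorithm from the paper. The function names and the genotype encoding (0 or 1 for a homozygous site, 2 for a heterozygous site) are assumptions made for this illustration.

```python
from itertools import combinations


def explains(h1, h2, genotype):
    """Return True if haplotypes h1 and h2 (sequences of 0/1) combine to genotype.

    Assumed encoding: 0 or 1 marks a homozygous site, 2 a heterozygous site.
    """
    return all(
        (g == 2 and a != b) or (g != 2 and a == b == g)
        for a, b, g in zip(h1, h2, genotype)
    )


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on binary haplotypes of equal length.

    Returns False as soon as some pair of sites shows all four
    combinations 00, 01, 10, 11 among the haplotypes.
    """
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


# The genotype (0, 2, 2) is ambiguous: two different haplotype pairs explain it.
genotype = (0, 2, 2)
print(explains((0, 1, 0), (0, 0, 1), genotype))
print(explains((0, 1, 1), (0, 0, 0), genotype))
```

The two calls at the end show why inference is needed at all: a genotype with two or more heterozygous sites has several candidate resolutions, and a structural assumption such as PP is one way to narrow them down.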

random mating

bipartite graphs

perfect phylogeny

haplotype phasing

Author

Peter Damaschke

Chalmers, Department of Computing Science, Algorithms

Chalmers, Department of Computing Science, Bioinformatics

2nd RECOMB Satellite Workshop on Computational Methods for SNPs and Haplotypes, Proceedings (Preprint), Carnegie Mellon University, Pittsburgh 2004

1-11

Subject Categories (SSIF 2011)

Computer and Information Science

More information

Created

10/7/2017