Incremental haplotype inference, phylogeny, and almost bipartite graphs
Paper in proceedings, 2004

The paper addresses the combinatorial problem of inferring the unknown haplotypes in a population, given a sample of genotypes, under the assumption that the population forms a perfect phylogeny (PP). The problem is important because physical haplotyping by DNA sequencing is expensive, whereas genotypes are easier to obtain. Because PPs appear naturally and quite frequently, PP haplotyping is a favourable approach to reliable haplotype inference. Since Gusfield's 2002 paper, several algorithms have been proposed. Here we show that, under the random mating assumption, an extremely simple algorithm identifies all sufficiently frequent haplotypes from a random sample of genotypes of asymptotically optimal size. Missing data can also be recovered if they are not too prevalent. Moreover, the idea of the algorithm extends easily to population structures more general than PP. We also solve a problem we call almost 2-coloring of graphs, which arises in an enhanced version of our haplotyping algorithm, and show that the solution space of this graph problem can be computed in linear time.
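To illustrate why haplotypes cannot simply be read off genotypes, the following minimal sketch enumerates every unordered pair of haplotypes that could explain a single genotype. It is not the paper's algorithm. The 0/1/2 encoding (0 and 1 for homozygous sites, 2 for a heterozygous site) and the function name `explaining_pairs` are assumptions made for this example only.

```python
from itertools import product


def explaining_pairs(genotype):
    """Return all unordered haplotype pairs consistent with a genotype.

    Assumed encoding (hypothetical, for illustration): 0 or 1 marks a
    homozygous site, 2 marks a heterozygous site.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    # Try every 0/1 assignment at the heterozygous sites.
    for bits in product((0, 1), repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, bit in zip(het_sites, bits):
            h1[site] = bit        # one haplotype carries this allele,
            h2[site] = 1 - bit    # the other carries the complement
        # frozenset makes the pair unordered, so mirrored assignments merge.
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs


if __name__ == "__main__":
    for pair in explaining_pairs((2, 0, 2)):
        print(sorted(pair))
```

A genotype with k >= 1 heterozygous sites has 2^(k-1) such explanations, so additional structure, such as the perfect phylogeny assumption, is needed to select among them.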

random mating

bipartite graphs

perfect phylogeny

haplotype phasing

Author

Peter Damaschke

Chalmers, Department of Computer Science, Algorithms

Chalmers, Department of Computer Science, Bioinformatics

2nd RECOMB Satellite Workshop on Computational Methods for SNPs and Haplotypes, Proceedings (Preprint), Carnegie Mellon University, Pittsburgh 2004

1-11

Subject categories

Computer and Information Sciences

More information

Created

2017-10-07