Fast perfect phylogeny haplotype inference
Paper in proceedings, 2003
We address the problem of reconstructing haplotypes in a population, given a sample of genotypes and assumptions about the underlying population. The problem is of major interest in genetics because haplotypes are more
informative than genotypes when searching for trait genes, but they are difficult to obtain directly by sequencing. After showing that simple resolution-based inference can be terribly wrong in some natural types of
population, we propose a different combinatorial approach exploiting intersections of sampled genotypes (considered
as sets of candidate haplotypes). For populations with perfect phylogeny we obtain an inference algorithm that is both sound and efficient. With high probability it yields
the complete set of haplotypes showing up in the sample,
for a sample size close to the trivial lower bound. The perfect phylogeny assumption is often justified, but we
also believe that the ideas can be further extended to
populations obeying relaxed structural assumptions. The ideas are quite different from those underlying other existing practical algorithms for the problem.
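
To make the phrase "genotypes considered as sets of candidate haplotypes" concrete, here is a minimal Python sketch. It assumes the common encoding in which each site of a genotype is 0 or 1 when homozygous and 2 when heterozygous; the encoding and the function names `candidate_haplotypes` and `shared_candidates` are illustrative assumptions, not taken from the paper. The sketch only illustrates the set view and the intersection of two genotypes. It is not the inference algorithm itself, and it enumerates candidates by brute force, which is exponential in the number of heterozygous sites.

```python
from itertools import product


def candidate_haplotypes(genotype):
    """Return every haplotype that could be one of the two haplotypes of `genotype`.

    Assumed encoding: 0 or 1 = homozygous site, 2 = heterozygous site.
    A candidate must agree with the genotype at each homozygous site and
    may take either value at each heterozygous site.
    """
    options = [(0, 1) if site == 2 else (site,) for site in genotype]
    return set(product(*options))


def shared_candidates(g1, g2):
    """Intersect the candidate sets of two genotypes of equal length."""
    return candidate_haplotypes(g1) & candidate_haplotypes(g2)


if __name__ == "__main__":
    g1 = (0, 2, 1, 2)
    g2 = (2, 1, 1, 0)
    # Each site is homozygous in at least one of the two genotypes,
    # so at most one haplotype can be compatible with both.
    print(shared_candidates(g1, g2))
```

In this toy case the two genotypes pin down a single common candidate, which hints at why intersections of sampled genotypes can be informative; two genotypes that are homozygous for different values at the same site share no candidate at all.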