Fast algorithms for finding disjoint subsequences with extremal densities
Paper i proceeding, 2005
We derive fast algorithms for the problem of finding, on
the real line, a prescribed number of intervals of maximum total length that contain at most some prescribed number of points from a given point set. Basically this is a typical
dynamic programming problem, however, for input sizes much bigger than the two parameters we can improve the obvious time bound by selecting a restricted set of candidate intervals that are sufficient to build some optimal solution. As a byproduct, the same idea improves an algorithm for a similar subsequence problem recently
brought up by Chen, Lu and Tang at IWBRA 2005. The problems are motivated by the search for significant patterns in certain biological data. While the algorithmic idea for the asymptotic worst-case bound is rather evident, we also consider further heuristics to save even more time in typical instances. One of them, described in this paper, leads to an apparently open problem of computational geometry flavour (where we are seeking a subquadratic algorithm) which might be interesting in itself.
holes in data
data mining
sparse dynamic programming