Fast algorithms for finding disjoint subsequences with extremal densities
Journal article, 2006

We derive fast algorithms for the following problem: Given a set of n points on the real line and two parameters s and p, find s disjoint intervals of maximum total length that contain at most p of the given points. Our main contribution consists of algorithms whose time bounds improve upon a straightforward dynamic programming algorithm in the relevant case where the input size n is much larger than the parameters s and p. These results are achieved by selecting a few candidate intervals that are provably sufficient for building an optimal solution via dynamic programming. As a byproduct of this idea, we improve an algorithm of Chen, Lu and Tang (2005) for a similar subsequence problem. The problems are motivated by the search for significant patterns in biological data. Finally, we propose several heuristics that further reduce the time complexity in typical instances. One of them leads to an apparently open subsequence sum problem of independent interest.
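To make the problem statement concrete, the following is a minimal sketch of what a straightforward dynamic program for it could look like. It is not the algorithm from the paper, and it rests on modelling assumptions that the abstract does not state: intervals are open, their endpoints are input points, only strictly interior points count towards p, two intervals may share an endpoint, and using fewer than s intervals is allowed. The function name `max_total_length` and the table names are hypothetical. Under these assumptions the sketch runs in O(n·s·p) time, which illustrates why a method that depends less on n is attractive when n is much larger than s and p.

```python
from typing import Sequence

NEG = float("-inf")  # marks unreachable states


def max_total_length(points: Sequence[float], s: int, p: int) -> float:
    """Best total length of at most s disjoint open intervals (endpoints at
    input points) that contain at most p input points strictly inside.
    Assumed model for illustration only; not the paper's algorithm."""
    xs = sorted(points)
    # closed[j][q]: best value with j intervals used and q interior points,
    #               where no interval reaches the current point.
    # opened[j][q]: same, but the last interval currently ends at this point.
    closed = [[NEG] * (p + 1) for _ in range(s + 1)]
    opened = [[NEG] * (p + 1) for _ in range(s + 1)]
    closed[0][0] = 0.0

    for i in range(len(xs) - 1):
        gap = xs[i + 1] - xs[i]
        # Option 1: the gap after xs[i] is left uncovered.
        new_closed = [
            [max(closed[j][q], opened[j][q]) for q in range(p + 1)]
            for j in range(s + 1)
        ]
        # Option 2: the gap after xs[i] is covered.
        new_opened = [[NEG] * (p + 1) for _ in range(s + 1)]
        for j in range(1, s + 1):
            for q in range(p + 1):
                # Start a new interval at xs[i] (possibly right after
                # another interval that ended at xs[i]).
                best = max(closed[j - 1][q], opened[j - 1][q])
                if q > 0:
                    # Extend the current interval: xs[i] becomes interior.
                    best = max(best, opened[j][q - 1])
                new_opened[j][q] = best + gap
        closed, opened = new_closed, new_opened

    return max(
        max(max(row) for row in closed),
        max(max(row) for row in opened),
    )
```

For example, with the points 0, 1, 2, 10, 11, 20 and s = 1, p = 0, the best choice under this model is the single gap between 11 and 20. Raising p to 1 allows the interval from 10 to 20, which has the point 11 in its interior.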

holes in data

time complexity

dynamic programming

range prediction

protein structure prediction

selection algorithms

protein torsion angle

Authors

Peter Damaschke

Chalmers, Computer Science and Engineering, Computing Science

Anders Bergkvist

Chalmers University of Technology

Pattern Recognition

0031-3203 (ISSN)

Vol. 39 2281-2292

Subject Categories

Computer Science

DOI

10.1016/j.patcog.2006.01.008