Homogeneous string segmentation using trees and weighted independent sets
Journal article, 2010

We divide a string into k segments, each with only one kind of symbol, so as to minimize the total number of exceptions. The motivation comes from machine learning and data mining. For binary strings we develop a linear-time algorithm for any k. The key to its efficiency is a special-purpose data structure, called the W-tree, which reflects relations between the repetition lengths of symbols. For non-binary strings we give a nontrivial dynamic programming algorithm. Our problem is equivalent to finding weighted independent sets with certain size constraints, either in paths (binary case) or in special interval graphs (general case). We also show that this problem is fixed-parameter tractable (FPT) in bounded-degree graphs.
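To make the problem statement concrete, the following is a minimal baseline sketch and not the algorithm of the article: it uses neither the W-tree nor the independent-set formulation, and it runs in O(n · k · |alphabet|) time rather than linear time. It assumes that an "exception" is a position whose symbol differs from the single symbol assigned to its segment, and that using at most k segments is allowed. The function name `min_exceptions` is hypothetical.

```python
def min_exceptions(s: str, k: int) -> int:
    """Fewest mismatches when s is cut into at most k segments,
    each labeled with one symbol (assumed reading of the problem)."""
    if not s:
        return 0
    if k < 1:
        raise ValueError("k must be at least 1")

    alphabet = sorted(set(s))
    inf = float("inf")

    # cost[j][a]: best cost of the prefix read so far when it uses
    # exactly j + 1 segments and the last segment is labeled alphabet[a].
    cost = [[inf] * len(alphabet) for _ in range(k)]
    for a, sym in enumerate(alphabet):
        cost[0][a] = 0 if s[0] == sym else 1

    for ch in s[1:]:
        new = [[inf] * len(alphabet) for _ in range(k)]
        for j in range(k):
            # Cheapest way to have finished j segments before this position.
            best_prev = min(cost[j - 1]) if j > 0 else inf
            for a, sym in enumerate(alphabet):
                # Either extend the current segment or open a new one here.
                base = min(cost[j][a], best_prev)
                new[j][a] = base + (0 if ch == sym else 1)
        cost = new

    # Row 0 is always finite, so the minimum is a proper integer.
    return min(min(row) for row in cost)
```

For example, `min_exceptions("0010111", 2)` considers the split `0010 | 111`, where the single `1` in the first segment is the only exception.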

parameterized complexity

interval graphs

segmentation

weighted independent set

tree computations

dynamic programming

Author

Peter Damaschke

Chalmers, Computer Science and Engineering (Chalmers), Computing Science (Chalmers)

Algorithmica

0178-4617 (ISSN) 1432-0541 (eISSN)

Vol. 57, Issue 4, pp. 621-640

Subject Categories (SSIF 2011)

Computer Science

DOI

10.1007/s00453-008-9225-8

More information

Created

10/6/2017