Segmenting strings homogeneously via trees
Paper in proceedings, 2007
We divide a string into k segments, each labeled with a single symbol, so as to minimize the total number of exceptions, that is, positions whose symbol differs from the label of their segment. The motivation comes from machine learning and data mining. For binary strings we develop a linear-time algorithm for any k. The key to its efficiency is a special-purpose data structure, called the W-tree, which reflects relations between the repetition lengths of symbols. Whether algorithms faster than the obvious dynamic programming exist for non-binary strings remains open. Our problem is also equivalent to finding weighted independent sets of prescribed size in paths. We show that this problem is fixed-parameter tractable (FPT) in bounded-degree graphs.
weighted independent set
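
To make the problem statement concrete, the following is a minimal sketch of what the "obvious dynamic programming" mentioned in the abstract might look like. It is not the W-tree algorithm from the paper. The function name min_exceptions is hypothetical, and the sketch assumes that "k segments" means at most k contiguous, non-empty segments, each assigned one label, with an exception counted for every position whose symbol differs from its segment's label. The paper's exact formulation may differ.

```python
def min_exceptions(s: str, k: int) -> int:
    """Straightforward O(n * k * |alphabet|) dynamic program (illustrative only)."""
    if not s:
        return 0
    if k < 1:
        raise ValueError("k must be at least 1")

    alphabet = sorted(set(s))
    inf = float("inf")

    # dp[j][c]: fewest exceptions for the prefix read so far when it is split
    # into exactly j segments and the last segment is labeled alphabet[c].
    dp = [[inf] * len(alphabet) for _ in range(k + 1)]

    # best[j] = min(dp[j]); best[0] = 0 stands for the empty prefix.
    best = [inf] * (k + 1)
    best[0] = 0

    for ch in s:
        new = [[inf] * len(alphabet) for _ in range(k + 1)]
        for j in range(1, k + 1):
            for c, sym in enumerate(alphabet):
                cost = 0 if ch == sym else 1
                # Either extend the current segment (same label), or start a
                # new segment after the best split into j - 1 segments.
                new[j][c] = min(dp[j][c], best[j - 1]) + cost
        dp = new
        # A non-empty prefix cannot be covered by zero segments.
        best = [inf] + [min(row) for row in dp[1:]]

    # "At most k segments": take the best over 1..k segments.
    return min(best[1:])
```

For example, under these assumptions min_exceptions("0001000", 1) yields 1 (label everything 0 and accept the single 1 as an exception), while min_exceptions("0001000", 3) yields 0. The running time grows with the product of string length, k, and alphabet size, which is the baseline the linear-time binary algorithm improves on.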