Homogeneous string segmentation using trees and weighted independent sets
Journal article, 2010

We divide a string into k segments, each with only one sort of symbol, so as to minimize the total number of exceptions. Motivations come from machine learning and data mining. For binary strings we develop a linear-time algorithm for any k. Key to its efficiency is a special-purpose data structure, called a W-tree, which reflects relations between the repetition lengths of symbols. For non-binary strings we give a nontrivial dynamic programming algorithm. Our problem is equivalent to finding weighted independent sets with certain size constraints, either in paths (binary case) or in special interval graphs (general case). We also show that this problem is fixed-parameter tractable (FPT) in bounded-degree graphs.
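As a rough illustration of the segmentation objective described in the abstract, the following Python sketch computes the minimum number of exceptions with a plain dynamic program over (segments used, label of the current segment). It is not the linear-time W-tree algorithm, nor necessarily the paper's dynamic program for non-binary strings. The function name `min_exceptions`, the reading of an exception as a position whose symbol differs from the label of its segment, and the choice to allow at most k segments are all assumptions made for this example.

```python
def min_exceptions(s: str, k: int) -> int:
    """Fewest exceptions when s is cut into at most k labeled segments.

    Assumption (not taken from the paper): each segment gets one label
    symbol, and every position whose symbol differs from the label of
    its segment counts as one exception.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not s:
        return 0

    alphabet = sorted(set(s))
    inf = float("inf")

    # dp[j][c]: fewest exceptions for the prefix read so far, using exactly
    # j segments, where the last segment is labeled alphabet[c].
    dp = [[inf] * len(alphabet) for _ in range(k + 1)]
    # best[j]: minimum of dp[j][c] over all labels c.
    # best[0] = 0 encodes "nothing read, no segment opened yet".
    best = [0] + [inf] * k

    for ch in s:
        new_dp = [[inf] * len(alphabet) for _ in range(k + 1)]
        new_best = [inf] * (k + 1)
        for j in range(1, k + 1):
            for c, label in enumerate(alphabet):
                # Either extend the current segment (same label) or open
                # a new segment with this label after j - 1 segments.
                cost = min(dp[j][c], best[j - 1]) + (label != ch)
                new_dp[j][c] = cost
                if cost < new_best[j]:
                    new_best[j] = cost
        dp, best = new_dp, new_best

    return min(best[1:])
```

Under these assumptions, `min_exceptions("0001000", 1)` evaluates to 1 (the single 1 is the only exception) and `min_exceptions("0001000", 3)` evaluates to 0. The sketch runs in O(n · k · σ) time for a string of length n over σ distinct symbols, so it only illustrates the objective and does not reflect the efficiency claimed in the abstract.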

parameterized complexity

interval graphs

segmentation

weighted independent set

tree computations

dynamic programming

Author

Peter Damaschke

Chalmers, Computer Science and Engineering, Computing Science

Algorithmica

0178-4617 (ISSN) 1432-0541 (eISSN)

Vol. 57 621-640

Subject categories

Computer Science

DOI

10.1007/s00453-008-9225-8