Mining transactional tree databases under homeomorphism
Journal article, 2025

A key task in mining tree-structured data is finding frequent embedded tree patterns, which has two settings: the transactional setting and the per-occurrence setting. In the transactional setting, which is the focus of this paper, the crucial step is to decide whether a tree pattern is subtree homeomorphic to a database tree. Our extensive study of the properties of real-world tree-structured datasets reveals that while many vertices in a database tree may have the same label, no two vertices on the same path are identically labeled. In this paper, we exploit this property and propose a novel and efficient method for deciding whether a tree pattern is subtree homeomorphic to a database tree. Our algorithm is based on a compact data structure called EMET, which stores all information required for subtree homeomorphism. We propose an efficient algorithm to generate the EMETs of larger patterns from the EMETs of smaller ones. Based on the proposed subtree homeomorphism method, we introduce TTM, an effective algorithm for finding frequent tree patterns in databases of rooted ordered trees. We evaluate the efficiency of TTM on several real-world and synthetic datasets and show that it outperforms well-known existing algorithms by an order of magnitude.
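To make the terms in the abstract concrete, the following is a minimal, naive sketch of the decision problem and of transactional support counting. It is not the EMET structure or the TTM algorithm, whose details are not given in this record. It assumes the usual notion of an ordered embedded subtree: labels are preserved, each parent-child edge of the pattern maps to an ancestor-descendant pair in the database tree, and siblings map, in left-to-right order, to vertices with disjoint subtrees. The names `Node`, `index_tree`, `is_embedded`, `support`, and `has_distinct_labels_on_paths` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def index_tree(root):
    """Return (nodes, end): nodes in preorder, and end[i] = last preorder
    index inside the subtree rooted at nodes[i]."""
    nodes, end = [], []

    def visit(n):
        i = len(nodes)
        nodes.append(n)
        end.append(i)
        for c in n.children:
            visit(c)
        end[i] = len(nodes) - 1

    visit(root)
    return nodes, end


def is_embedded(pattern, tree):
    """Naive check: is `pattern` an ordered embedded subtree of `tree`?"""
    nodes, end = index_tree(tree)
    memo = {}

    def match(p, t):
        # Can the pattern subtree rooted at p be embedded with p mapped to t?
        key = (id(p), t)
        if key in memo:
            return memo[key]
        ok = nodes[t].label == p.label
        lo = t + 1  # first preorder index still free below t
        if ok:
            for child in p.children:
                # Greedy choice: among proper descendants of t at index >= lo
                # that can host `child`, take the one whose subtree ends first.
                best = None
                for d in range(lo, end[t] + 1):
                    if (best is None or end[d] < best) and match(child, d):
                        best = end[d]
                if best is None:
                    ok = False
                    break
                lo = best + 1  # later siblings must start after this subtree
        memo[key] = ok
        return ok

    return any(match(pattern, t) for t in range(len(nodes)))


def support(pattern, database):
    """Transactional support: each database tree counts at most once,
    no matter how many occurrences of the pattern it contains."""
    return sum(1 for tree in database if is_embedded(pattern, tree))


def has_distinct_labels_on_paths(root, seen=frozenset()):
    """Check the property the abstract reports for real-world data:
    no two vertices on the same root-to-leaf path share a label."""
    if root.label in seen:
        return False
    seen = seen | {root.label}
    return all(has_distinct_labels_on_paths(c, seen) for c in root.children)
```

For example, in the tree `a(x(b), c)` the pattern `a(b, c)` is embedded, because `b` is a descendant of `a` and lies to the left of `c`, whereas `a(c, b)` is not, because the sibling order is reversed. In the per-occurrence setting one would count every such embedding rather than every containing tree.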

Frequent tree patterns

XML documents

Rooted ordered trees

User web log data

Transactional tree mining

Subtree homeomorphism

Authors

Mostafa Haghir Chehreghani

Amirkabir University of Technology

Morteza Haghir Chehreghani

Data Science and AI 2

Journal of Supercomputing

0920-8542 (ISSN) 1573-0484 (eISSN)

Vol. 81, issue 4, article 530

Subject categories (SSIF 2025)

Computer Sciences

Discrete Mathematics

DOI

10.1007/s11227-025-06997-2

More information

Last updated

2025-03-06