Non-uniform Sampling Methods for Large Itemset Mining
Paper in proceedings, 2023

Large itemset mining is a well-studied problem in data mining. To address it over very large datasets, several approximate algorithms have been introduced, an important class of which relies on sampling. However, the literature has investigated only methods based on uniform sampling. In this paper, we first discuss how different sampling methods can be described using a generic sampling algorithm and study a property that is desirable for sampling methods. We then use this property to argue that some non-uniform sampling methods may work better. Accordingly, we propose methods that sample each transaction with probability proportional to its number of items or to its number of frequent items. Finally, through extensive experiments on real-world datasets, we show that non-uniform sampling methods usually outperform the uniform method.
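
The two weighting schemes named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `sample_transactions`, its parameters, the choice of sampling with replacement, and the assumption that the set of frequent items is already known are all hypothetical and serve only to show what "proportional to the number of items" could look like in code.

```python
import random
from typing import FrozenSet, List, Optional, Sequence

Transaction = FrozenSet[str]


def sample_transactions(
    transactions: Sequence[Transaction],
    sample_size: int,
    frequent_items: Optional[FrozenSet[str]] = None,
    seed: Optional[int] = None,
) -> List[Transaction]:
    """Draw transactions with probability proportional to a size-based weight.

    Assumption (not from the paper): sampling is done with replacement.
    If frequent_items is None, the weight of a transaction is its number
    of items; otherwise it is the number of its items that appear in
    frequent_items.
    """
    rng = random.Random(seed)
    if frequent_items is None:
        weights = [len(t) for t in transactions]
    else:
        weights = [len(t & frequent_items) for t in transactions]
    if sum(weights) == 0:
        raise ValueError("all transactions have zero weight")
    # random.choices performs weighted sampling with replacement.
    return rng.choices(transactions, weights=weights, k=sample_size)


if __name__ == "__main__":
    data = [frozenset("a"), frozenset("abc"), frozenset("bd")]
    # Weight by number of items: the 3-item transaction is drawn most often.
    print(sample_transactions(data, sample_size=5, seed=0))
    # Weight by number of frequent items, assuming {"a", "b"} are frequent.
    print(sample_transactions(data, 5, frequent_items=frozenset("ab"), seed=0))
```

Under either scheme a transaction with weight zero is never drawn, and any support estimate computed from such a sample would need to account for the non-uniform selection probabilities; how the paper handles that is not stated in the abstract.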

Large itemset mining

maximal itemsets

closed itemsets

non-uniform sampling

approximate algorithms

uniform sampling

Authors

Zahra Moteshaker Arani

Amirkabir University of Technology

Mostafa Haghir Chehreghani

Amirkabir University of Technology

Morteza Haghir Chehreghani

Chalmers, Computer Science and Engineering, Data Science and AI

Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023

5714-5722
9798350324457 (ISBN)

2023 IEEE International Conference on Big Data, BigData 2023
Sorrento, Italy

Subject categories

Computer and Information Sciences

DOI

10.1109/BigData59044.2023.10386750

More information

Last updated

2024-02-26