Non-uniform Sampling Methods for Large Itemset Mining
Paper in proceeding, 2023

Large itemset mining is a well-studied problem in data mining. To address it over very large datasets, several approximate algorithms have been introduced, an important class of which relies on sampling. However, the literature has investigated only methods based on uniform sampling. In this paper, we first discuss how different sampling methods can be described using a generic sampling algorithm, and we study a property that is desirable for sampling methods. We then use this property to argue that some non-uniform sampling methods may work better. Accordingly, we propose methods that sample each transaction with probability proportional to its number of items or to its number of frequent items. Finally, through extensive experiments on real-world datasets, we show that non-uniform sampling methods usually outperform the uniform method.
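
The abstract does not give the generic sampling algorithm itself, so the following is only an illustrative Python sketch of the idea, not the authors' implementation. It assumes that transactions are sets of items, that sampling is done with replacement, and that "frequent items" means single items whose relative support reaches a threshold. The names `sample_transactions`, `uniform_weight`, `size_weight`, `frequent_item_weight` and `min_support` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter
from typing import Callable, FrozenSet, Hashable, List, Optional, Sequence

Transaction = FrozenSet[Hashable]


def sample_transactions(
    transactions: Sequence[Transaction],
    weight: Callable[[Transaction], float],
    k: int,
    seed: Optional[int] = None,
) -> List[Transaction]:
    """Draw k transactions with replacement, each with probability
    proportional to weight(transaction). Sampling with replacement is an
    assumption of this sketch, not something the abstract specifies."""
    rng = random.Random(seed)
    weights = [float(weight(t)) for t in transactions]
    if sum(weights) <= 0:
        # Degenerate case (e.g. every transaction has weight 0):
        # fall back to uniform sampling.
        weights = [1.0] * len(transactions)
    return rng.choices(transactions, weights=weights, k=k)


def uniform_weight(t: Transaction) -> float:
    """Uniform sampling: every transaction is equally likely."""
    return 1.0


def size_weight(t: Transaction) -> float:
    """Weight a transaction by its number of items."""
    return float(len(t))


def frequent_item_weight(
    transactions: Sequence[Transaction], min_support: float
) -> Callable[[Transaction], float]:
    """Build a weight function that counts a transaction's frequent items.
    An item is treated as frequent if it occurs in at least a min_support
    fraction of all transactions (an assumed definition)."""
    counts = Counter(item for t in transactions for item in t)
    threshold = min_support * len(transactions)
    frequent = {item for item, c in counts.items() if c >= threshold}

    def weight(t: Transaction) -> float:
        return float(len(t & frequent))

    return weight


# Example usage on a toy dataset (hypothetical data).
toy = [frozenset("abc"), frozenset("ab"), frozenset("a"), frozenset("d")]
by_size = sample_transactions(toy, size_weight, k=3, seed=0)
by_frequent = sample_transactions(toy, frequent_item_weight(toy, 0.5), k=3, seed=0)
```

In this sketch the three strategies differ only in the weight function passed to one sampler, which mirrors the abstract's point that different sampling methods can be described by a single generic algorithm.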

Large itemset mining

maximal itemsets

closed itemsets

non-uniform sampling

approximate algorithms

uniform sampling

Authors

Zahra Moteshaker Arani

Amirkabir University of Technology

Mostafa Haghir Chehreghani

Amirkabir University of Technology

Morteza Haghir Chehreghani

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023

5714-5722
9798350324457 (ISBN)

2023 IEEE International Conference on Big Data, BigData 2023
Sorrento, Italy

Subject Categories (SSIF 2011)

Computer and Information Science

DOI

10.1109/BigData59044.2023.10386750

More information

Latest update

2/26/2024