Query-Efficient Correlation Clustering via Active Learning of Pairwise Similarities
Licentiate thesis, 2024

Clustering is an important unsupervised learning problem in which objects are grouped based on their characteristics. Correlation clustering is arguably the most natural form of clustering because it assumes only access to a pairwise similarity measure between objects, where each similarity can be any positive or negative real number. This makes correlation clustering applicable to a wide range of problems, even when high-quality feature vectors are not available. However, obtaining pairwise similarities between all objects may be expensive and impractical in many contexts. Motivated by this, we study the problem of finding high-quality correlation clustering solutions under a constrained budget of queries for pairwise similarities. Acquiring the most informative data within a constrained budget is the central concern of active learning. We therefore develop a generic pool-based batch active learning procedure for query-efficient correlation clustering that is highly robust to noisy oracle feedback.
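As a rough illustration of the setting described in the abstract, the following minimal sketch computes the standard correlation clustering disagreement cost from signed pairwise similarities and picks a batch of pairs to query next. The function names, the dictionary representation, and the "estimate closest to zero" selection rule are assumptions made for illustration only; they are not taken from the thesis and do not represent its acquisition functions.

```python
def clustering_cost(similarity, labels):
    """Disagreement cost of a clustering under signed pairwise similarities.

    similarity: dict mapping a pair (i, j) with i < j to a real number
                (positive = similar, negative = dissimilar).
    labels:     list where labels[i] is the cluster id of object i.
    """
    cost = 0.0
    for (i, j), s in similarity.items():
        same_cluster = labels[i] == labels[j]
        if same_cluster and s < 0:
            # Dissimilar pair placed in the same cluster.
            cost += -s
        elif not same_cluster and s > 0:
            # Similar pair split across clusters.
            cost += s
    return cost


def select_batch(estimates, queried, batch_size):
    """Hypothetical acquisition rule: choose the unqueried pairs whose
    current similarity estimate is closest to zero (most ambiguous)."""
    pool = [pair for pair in estimates if pair not in queried]
    pool.sort(key=lambda pair: abs(estimates[pair]))
    return pool[:batch_size]


if __name__ == "__main__":
    similarity = {(0, 1): 1.0, (0, 2): -0.5, (1, 2): -0.75}
    print(clustering_cost(similarity, [0, 0, 1]))  # objects 0 and 1 together
    print(clustering_cost(similarity, [0, 0, 0]))  # everything in one cluster

    estimates = {(0, 1): 0.9, (0, 2): -0.1, (1, 2): 0.4}
    print(select_batch(estimates, queried={(0, 2)}, batch_size=1))
```

In a pool-based batch loop, one would alternate between querying the oracle for the selected pairs, updating the similarity estimates, and re-clustering, until the query budget is spent.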

batch active learning

active learning

correlation clustering

noisy active learning

acquisition function

active clustering

Room Analysen, EDIT Building, Rännvägen 6.
Opponent: Associate Professor Ali Ramezani-Kebrya, University of Oslo, Norway

Author

Linus Aronsson

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Correlation Clustering with Active Learning of Pairwise Similarities

Transactions on Machine Learning Research (2024)

Journal article

Aronsson, L., Chehreghani, M. H. Information-Theoretic Active Correlation Clustering

Subject Categories (SSIF 2011)

Other Computer and Information Science

Computer Science

Publisher

Chalmers

Online

More information

Latest update

8/8/2024