Query-Efficient Correlation Clustering via Active Learning of Pairwise Similarities
Licentiate thesis, 2024

Clustering is an important unsupervised learning problem in which objects are grouped according to their characteristics. Correlation clustering is arguably the most natural form of clustering, because it assumes only access to a pairwise similarity measure between objects, where each similarity can be any positive or negative real number. This makes correlation clustering applicable to many problems, even when high-quality feature vectors are not available. However, obtaining pairwise similarities between all objects may be expensive or impractical in many contexts. Motivated by this, we study the problem of finding high-quality correlation clustering solutions under a constrained budget of queries for pairwise similarities. Acquiring the most informative data within a constrained budget is the subject of active learning. We therefore develop a generic pool-based batch active learning procedure for query-efficient correlation clustering that is highly robust to noisy oracle feedback.
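To make the setting above concrete, the sketch below shows a common form of the weighted correlation clustering cost (a pair is penalized by the magnitude of its similarity when a negative pair shares a cluster or a positive pair is split) together with a generic budgeted, pool-based batch query loop. This is an illustrative sketch only, not the procedure developed in the thesis: the names `cc_cost`, `local_search` and `active_cc` are hypothetical, pairs are acquired uniformly at random where an acquisition function would normally rank them, unqueried similarities are assumed to be 0, and the noisy oracle is assumed to flip the sign of a similarity with probability `noise`.

```python
import itertools
import random


def cc_cost(S, labels):
    """Weighted correlation clustering disagreement cost.

    S[i][j] is a real-valued similarity (positive = similar, negative =
    dissimilar); labels[i] is the cluster id of object i. A pair adds
    |S[i][j]| if it is negative but placed in the same cluster, or
    positive but split across clusters.
    """
    cost = 0.0
    for i, j in itertools.combinations(range(len(labels)), 2):
        same = labels[i] == labels[j]
        if same and S[i][j] < 0:
            cost += -S[i][j]
        elif not same and S[i][j] > 0:
            cost += S[i][j]
    return cost


def local_search(S, labels, max_rounds=100):
    """Simple heuristic (assumption): move single objects to the existing
    cluster or a new singleton that strictly lowers cc_cost, until no
    single move helps or max_rounds is reached."""
    labels = list(labels)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(labels)):
            current = labels[i]
            best_label, best_cost = current, cc_cost(S, labels)
            # Candidate clusters: all current ids plus one fresh singleton id.
            for c in set(labels) | {max(labels) + 1}:
                labels[i] = c
                cost = cc_cost(S, labels)
                if cost < best_cost:
                    best_label, best_cost = c, cost
            labels[i] = best_label
            improved = improved or best_label != current
        if not improved:
            break
    return labels


def active_cc(true_S, budget, batch_size, noise=0.0, seed=0):
    """Hypothetical pool-based batch loop: query batches of unqueried pairs
    from a (possibly noisy) oracle until the budget is spent, re-clustering
    after each batch. Returns (labels, number of queries spent)."""
    rng = random.Random(seed)
    n = len(true_S)
    S_hat = [[0.0] * n for _ in range(n)]  # unqueried pairs treated as 0
    pool = list(itertools.combinations(range(n), 2))
    rng.shuffle(pool)  # random acquisition stands in for an acquisition function
    labels = list(range(n))  # start from all singletons
    spent = 0
    while spent < budget and pool:
        size = min(batch_size, budget - spent, len(pool))
        batch = [pool.pop() for _ in range(size)]
        for i, j in batch:
            s = true_S[i][j]
            if rng.random() < noise:  # assumed noise model: sign flip
                s = -s
            S_hat[i][j] = S_hat[j][i] = s
        spent += len(batch)
        labels = local_search(S_hat, labels)
    return labels, spent
```

For example, with two planted groups of three objects (similarity +1 within a group, -1 across) and a budget covering all 15 pairs, the loop recovers the two groups with zero cost; with a smaller budget the clustering is built from only the queried pairs.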

batch active learning

active learning

correlation clustering

noisy active learning

acquisition function

active clustering

Room Analysen, EDIT Building, Rännvägen 6, and online.
Opponent: Associate Professor Ali Ramezani-Kebrya, University of Oslo, Norway

Author

Linus Aronsson

Chalmers, Computer Science and Engineering, Data Science and AI

Correlation Clustering with Active Learning of Pairwise Similarities

Transactions on Machine Learning Research (2024)

Journal article

Aronsson, L., Chehreghani, M. H. Information-Theoretic Active Correlation Clustering

Subject categories

Other Computer and Information Science

Computer Science

Publisher

Chalmers

Last updated

2024-08-08