Entity disambiguation in anonymized graphs using graph kernels
Paper in proceedings, 2013

This paper presents a novel method for entity disambiguation in anonymized graphs using local neighborhood structure. Most existing approaches leverage node information, which may be unavailable in many contexts because of privacy concerns, or information about the sources of the data. We consider this problem in the supervised setting, where we are provided only with a base graph and a set of nodes labelled as ambiguous or unambiguous. We characterize the similarity between two nodes based on their local neighborhood structure using graph kernels, and solve the resulting classification task using SVMs. We give empirical evidence on two real-world datasets, comparing our approach to a state-of-the-art method and highlighting its advantages. We show that, despite using less information, our method is significantly better in speed, accuracy, or both. We also present extensions of two existing graph kernels, namely the direct product kernel and the shortest-path kernel, with significant improvements in accuracy. For the direct product kernel, our extension also provides significant computational benefits. Moreover, we design and implement the algorithms of our method to work in a distributed fashion using the GraphLab framework, ensuring high scalability.
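
As a rough illustration of comparing two nodes by their local neighborhood structure, the following minimal sketch computes a plain unlabeled shortest-path kernel between the neighborhoods of two nodes. It is not the paper's extended kernel or its GraphLab implementation; the function names, the adjacency-dict representation, the `radius` parameter, and the toy graph are all assumptions made for this example.

```python
from collections import Counter, deque

def bfs_distances(adj, source, allowed=None):
    """Hop distances from `source`, optionally restricted to nodes in `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def neighborhood(adj, node, radius):
    """Set of nodes within `radius` hops of `node`, including `node` itself."""
    return {v for v, d in bfs_distances(adj, node).items() if d <= radius}

def sp_histogram(adj, node, radius):
    """Count unordered node pairs per shortest-path length inside the
    subgraph induced by the `radius`-hop neighborhood of `node`."""
    nodes = neighborhood(adj, node, radius)
    ordered = Counter()
    for u in nodes:
        for v, d in bfs_distances(adj, u, allowed=nodes).items():
            if d > 0:
                ordered[d] += 1  # each unordered pair is seen twice
    return {d: c // 2 for d, c in ordered.items()}

def sp_kernel(adj, a, b, radius):
    """Shortest-path kernel with a delta kernel on path lengths:
    the number of pairs of equal-length shortest paths across the two neighborhoods."""
    ha = sp_histogram(adj, a, radius)
    hb = sp_histogram(adj, b, radius)
    return sum(count * hb.get(d, 0) for d, count in ha.items())

# Hypothetical toy graph: a triangle (0, 1, 2) with a tail 2 - 3 - 4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
gram = [[sp_kernel(adj, a, b, radius=1) for b in adj] for a in adj]
```

A Gram matrix such as `gram` could then be handed to any SVM implementation that accepts a precomputed kernel, together with the ambiguous/unambiguous labels, which corresponds to the classification step described above.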

Support vector machines

Graph kernels

Entity disambiguation

Entity resolution

Authors

Linus Hermansson

Chalmers, Computer Science and Engineering, Computing Science

Tommi Kerola

Chalmers, Computer Science and Engineering

Fredrik Johansson

Chalmers, Computer Science and Engineering, Computing Science

Vinay Jethava

Chalmers, Computer Science and Engineering, Computing Science

Devdatt Dubhashi

Chalmers, Computer Science and Engineering, Computing Science

22nd ACM International Conference on Information and Knowledge Management, CIKM 2013; San Francisco, CA; United States; 27 October 2013 through 1 November 2013

Pages 1037-1046
978-145032263-8 (ISBN)

Subject categories

Computer and Information Sciences

DOI

10.1145/2505515.2505565

More information

Created

2017-10-08