Entity disambiguation in anonymized graphs using graph kernels
Paper in proceedings, 2013

This paper presents a novel method for entity disambiguation in anonymized graphs using local neighborhood structure. Most existing approaches leverage either node information, which might not be available in several contexts due to privacy concerns, or information about the sources of the data. We consider this problem in the supervised setting where we are provided only with a base graph and a set of nodes labelled as ambiguous or unambiguous. We characterize the similarity between two nodes based on their local neighborhood structure using graph kernels, and solve the resulting classification task using support vector machines (SVMs). We give empirical evidence on two real-world datasets, comparing our approach to a state-of-the-art method and highlighting the advantages of our approach. We show that, using less information, our method is significantly better in terms of speed, accuracy, or both. We also present extensions of two existing graph kernels, namely the direct product kernel and the shortest-path kernel, with significant improvements in accuracy. For the direct product kernel, our extension also provides significant computational benefits. Moreover, we design and implement the algorithms of our method to work in a distributed fashion using the GraphLab framework, ensuring high scalability.
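
The idea of comparing two nodes through the structure of their local neighborhoods can be illustrated with a minimal sketch. It assumes the plain, unlabelled shortest-path kernel (a dot product of shortest-path-length histograms), not the extended kernels the paper proposes, and it runs on a single machine rather than on GraphLab. The adjacency-dict graph format, the function names, and the `radius` parameter are all hypothetical choices made for this illustration.

```python
from collections import Counter, deque


def bfs_distances(adj, source, allowed=None):
    """Hop distances from `source`, optionally restricted to the node set `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def neighborhood(adj, node, radius):
    """Nodes within `radius` hops of `node`, including `node` itself."""
    return {v for v, d in bfs_distances(adj, node).items() if d <= radius}


def sp_histogram(adj, node, radius=2):
    """Histogram of shortest-path lengths inside the induced neighborhood subgraph."""
    nodes = neighborhood(adj, node, radius)
    ordered = Counter()
    for u in nodes:
        for d in bfs_distances(adj, u, allowed=nodes).values():
            if d > 0:
                ordered[d] += 1
    # Every unordered pair was seen from both endpoints, so halve the counts.
    return Counter({d: c // 2 for d, c in ordered.items()})


def sp_kernel(adj, a, b, radius=2):
    """Shortest-path kernel with a delta kernel on path lengths (no node labels)."""
    ha, hb = sp_histogram(adj, a, radius), sp_histogram(adj, b, radius)
    return sum(ha[d] * hb[d] for d in ha.keys() & hb.keys())


def gram_matrix(adj, nodes, radius=2):
    """Kernel values for every pair of the given nodes, as a list of lists."""
    hists = [sp_histogram(adj, n, radius) for n in nodes]
    return [[sum(h1[d] * h2[d] for d in h1.keys() & h2.keys()) for h2 in hists]
            for h1 in hists]


# Toy undirected graph: a triangle a-b-c with a tail c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
K = gram_matrix(adj, ["a", "c", "e"], radius=1)
```

A Gram matrix such as `K`, computed over the labelled nodes, could then be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.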

Support vector machines

Graph kernels

Entity disambiguation

Entity resolution

Authors

Linus Hermansson

Chalmers, Computer Science and Engineering (Chalmers), Computing Science (Chalmers)

Tommi Kerola

Chalmers, Computer Science and Engineering (Chalmers)

Fredrik Johansson

Chalmers, Computer Science and Engineering (Chalmers), Computing Science (Chalmers)

Vinay Jethava

Chalmers, Computer Science and Engineering (Chalmers), Computing Science (Chalmers)

Devdatt Dubhashi

Chalmers, Computer Science and Engineering (Chalmers), Computing Science (Chalmers)

22nd ACM International Conference on Information and Knowledge Management, CIKM 2013; San Francisco, CA; United States; 27 October 2013 through 1 November 2013

1037-1046

Subject Categories

Computer and Information Science

DOI

10.1145/2505515.2505565

ISBN

978-1-4503-2263-8

More information

Created

10/8/2017