Improving Community Detection Methods for Network Data Analysis
Doctoral thesis, 2014

Empirical analysis of network data is widely used to understand and predict the structure and function of real systems and to identify interesting patterns and anomalies. One of the most widely studied structural properties of networks is their community structure. In this thesis, we investigate some of the challenges and applications of community detection for network data analysis and propose different approaches for improving community detection methods. One challenge is that, despite extensive studies of the community structure of real networks, there is no consensus on the definition of a community. Evaluating the quality of the communities identified by different community detection algorithms is therefore problematic. In this thesis, we empirically compare and evaluate the quality of the communities identified by a variety of community detection algorithms, which rely on different definitions of a community, in different applications of network data analysis. Another challenge is the scalability of existing algorithms. Parallelization is one way to make community detection more scalable, and local community detection algorithms are by nature well suited to it. One of the most successful approaches to local community detection is the local expansion of seed nodes into overlapping communities. However, if the seeds are not selected carefully, the communities identified by a local algorithm might cover only a subset of the nodes in a network. It is therefore crucial to select good seeds that are well distributed over the network, using only its local structure.
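Local expansion of a seed node can be sketched as a greedy loop that repeatedly adds the neighboring node that most improves a community quality score. The sketch below assumes the fitness function of the LFM method (Lancichinetti, Fortunato, and Kertész), f = k_in / (k_in + k_out)^alpha, where k_in is the sum of the internal degrees of the community's nodes and k_out is the number of edges leaving it. The abstract does not say which expansion criterion the thesis uses, so this is an illustration only; `fitness` and `expand_seed` are hypothetical names, and unlike full LFM this simplified version only adds nodes and never removes them.

```python
from typing import Dict, Hashable, Set

# Undirected graph as adjacency sets: node -> set of neighbors.
Graph = Dict[Hashable, Set[Hashable]]


def fitness(graph: Graph, community: Set[Hashable], alpha: float = 1.0) -> float:
    """LFM-style fitness k_in / (k_in + k_out) ** alpha."""
    k_in = 0   # sum of internal degrees (each internal edge counted twice)
    k_out = 0  # number of edges with exactly one endpoint in the community
    for u in community:
        for v in graph[u]:
            if v in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand_seed(graph: Graph, seed: Hashable, alpha: float = 1.0) -> Set[Hashable]:
    """Greedily add the neighbor that most increases fitness; stop when none does."""
    community = {seed}
    current = fitness(graph, community, alpha)
    while True:
        # Candidate nodes: neighbors of the community that are not yet members.
        frontier = {v for u in community for v in graph[u]} - community
        best_node, best_score, improved = None, current, False
        for v in frontier:
            score = fitness(graph, community | {v}, alpha)
            if score > best_score:
                best_node, best_score, improved = v, score, True
        if not improved:
            return community
        community.add(best_node)
        current = best_score


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(expand_seed(g, 0))  # {0, 1, 2}
```

Expansion from node 0 stops at its triangle: the triangle has fitness 6/7, and adding the bridge node 3 would lower it to 7/9. Because each seed is expanded independently, different seeds can be processed in parallel, and their communities may overlap.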
In this thesis, we propose a novel local seeding algorithm, based on link prediction and graph coloring, for selecting good seeds for local community detection in large-scale networks. More broadly, mining network data has many applications; this thesis focuses on analyzing network data obtained from backbone Internet traffic, social networks, and search query log files. We show that mining the structural and temporal properties of email networks generated from Internet backbone traffic can be used to identify unsolicited email within the mix of email traffic. We also show that a link-based community detection algorithm can separate legitimate and unsolicited email into distinct communities. Moreover, we show that, in contrast to previous studies, community detection algorithms can be used for network anomaly detection. We also propose a method for enhancing community detection algorithms and present a framework that uses community detection as a basis for network misbehavior detection. Finally, we show that network analysis of query log files obtained from a health care portal can complement existing methods for the semantic analysis of health-related queries.
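One way that link prediction and graph coloring could be combined for seed selection is sketched below. This is a hypothetical illustration, not the algorithm proposed in the thesis. It assumes the Jaccard similarity of neighbor sets as the link-prediction score, ranks each node by its average score to its neighbors, and greedily colors the graph in that order. The nodes that receive color 0 form a maximal independent set: no two seeds are adjacent, and every other node has at least one seed as a neighbor. Every score and coloring decision uses only a node's immediate neighborhood. The functions `jaccard`, `embeddedness`, `greedy_coloring`, and `select_seeds` are hypothetical names.

```python
from typing import Dict, Hashable, List, Set

# Undirected graph as adjacency sets: node -> set of neighbors.
Graph = Dict[Hashable, Set[Hashable]]


def jaccard(graph: Graph, u: Hashable, v: Hashable) -> float:
    """Jaccard similarity of the neighbor sets of u and v (a link-prediction score)."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def embeddedness(graph: Graph, u: Hashable) -> float:
    """Average Jaccard score between u and its neighbors (hypothetical seed score)."""
    neighbors = graph[u]
    if not neighbors:
        return 0.0
    return sum(jaccard(graph, u, v) for v in neighbors) / len(neighbors)


def greedy_coloring(graph: Graph, order: List[Hashable]) -> Dict[Hashable, int]:
    """Give each node, in the given order, the smallest color unused by its neighbors."""
    colors: Dict[Hashable, int] = {}
    for u in order:
        used = {colors[v] for v in graph[u] if v in colors}
        color = 0
        while color in used:
            color += 1
        colors[u] = color
    return colors


def select_seeds(graph: Graph) -> List[Hashable]:
    """Seeds = color class 0 when coloring in order of decreasing embeddedness."""
    order = sorted(graph, key=lambda u: embeddedness(graph, u), reverse=True)
    colors = greedy_coloring(graph, order)
    # Color class 0 is a maximal independent set: seeds are pairwise non-adjacent,
    # and every non-seed node has at least one seed neighbor.
    return [u for u in order if colors[u] == 0]


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(select_seeds(g))  # [0, 4]
```

Coloring in order of decreasing score means that when two adjacent nodes compete to become seeds, the better-embedded one wins. In the example, this yields exactly one seed per triangle. The resulting seeds could then be handed to a local expansion routine.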

Keywords: Spam, Overlapping Communities, Networks, Misbehavior Detection, Seed Selection, Community Detection Algorithms, Medical Query Logs

HA3
Opponent: Professor Minos Garofalakis

Author

Farnaz Moradi

Chalmers, Computer Science and Engineering, Networks and Systems

Overlapping Communities for Identifying Misbehavior in Network Communications

Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'14), Vol. 8443 (2014), pp. 398-409

Paper in proceedings

Towards Modeling Legitimate and Unsolicited Email Traffic Using Social Network Properties

Proceedings of the Fifth Workshop on Social Network Systems (SNS'12), Bern, 10 April 2012

Paper in proceedings

A Local Seed Selection Algorithm for Overlapping Community Detection

The 2014 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM) (2014), pp. 1-8

Paper in proceedings

An Evaluation of Community Detection Algorithms on Large-Scale Email Traffic

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 7276 LNCS (2012), pp. 283-294

Paper in proceedings

A Graph-Based Analysis of Medical Queries of a Swedish Health Care Portal

Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (2014), pp. 2-10

Paper in proceedings

Subject categories

Computer Science

ISBN

978-91-7597-041-7

Doctoral theses at Chalmers University of Technology. New series: 3722

Created

2017-10-08