Towards Understanding the Social Structure of Email and Spam Traffic
Licentiate thesis, 2012

Email is a pervasive means of communication on the Internet. Email exchanges between individuals can be seen as social interactions between email senders and receivers, and can thus be represented as a network. Networks of human interactions such as friendship relations, research collaborations, and phone calls have been widely studied to understand the characteristics, structure, and dynamics of such social interactions. In this thesis, we look into the social network properties of email networks generated from real traffic, and investigate how a vast amount of unsolicited email traffic (spam) affects these properties. Recent advances in Internet data collection and processing have facilitated the study of the characteristics of email traffic observed on the Internet. In our study, we have collected large-scale email datasets from traffic traversing a high-speed Internet backbone link and have generated email networks from the observed communications to analyze the structure and dynamics of these social interactions. Moreover, we aim at unveiling the distinguishing characteristics of legitimate and unsolicited email communications. We show that networks of legitimate email traffic have the same structural and temporal properties that other social networks exhibit, and can therefore be modeled as small-world scale-free networks. However, unsolicited email communications cause deviations and anomalies in the structure of email networks, and this deviation from the expected social structural properties can be used to find the sources of spam email. We also show that email networks, similar to other social networks, have a community structure which can be found using different community detection algorithms. However, not all community detection algorithms can identify structural communities that coincide with the true logical communities of email networks, i.e., distinct communities of legitimate and unsolicited email.
Our study shows that a link-based community detection algorithm is more suitable for this purpose than more widely used node-based algorithms. The possibility of using the social structure of email traffic alone to identify the sources of spam and to separate unsolicited email from legitimate email can potentially improve protection against spam and other types of malicious activity on the Internet.
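The idea of representing email exchanges as a network and inspecting its structural properties can be illustrated with a minimal sketch. It is not the method used in the thesis: the function names, the undirected simplification, and the toy addresses (including the node labelled "spammer") are all assumptions made for illustration. It builds a network from (sender, receiver) pairs and computes two standard social network measures, the degree distribution and the local clustering coefficient.

```python
from collections import defaultdict


def build_email_network(exchanges):
    """Build an undirected adjacency map from (sender, receiver) pairs."""
    neighbors = defaultdict(set)
    for sender, receiver in exchanges:
        if sender == receiver:
            continue  # ignore self-addressed mail
        neighbors[sender].add(receiver)
        neighbors[receiver].add(sender)
    return dict(neighbors)


def degree_distribution(neighbors):
    """Map each degree k to the number of nodes that have degree k."""
    counts = defaultdict(int)
    for adjacent in neighbors.values():
        counts[len(adjacent)] += 1
    return dict(counts)


def local_clustering(neighbors, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    adjacent = list(neighbors[node])
    k = len(adjacent)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if adjacent[j] in neighbors[adjacent[i]]
    )
    return 2.0 * links / (k * (k - 1))


# Invented toy data (assumption): three mutual contacts plus one
# sender that mails many otherwise unrelated addresses.
exchanges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("spammer", "alice"),
    ("spammer", "bob"),
    ("spammer", "dave"),
    ("spammer", "erin"),
]

network = build_email_network(exchanges)
print(degree_distribution(network))
for node in sorted(network):
    print(node, round(local_clustering(network, node), 3))
```

In this toy data, the node that mails unrelated recipients has a much lower clustering coefficient than the mutually connected contacts. This only illustrates the kind of structural deviation the abstract refers to and is not a result taken from the thesis.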

Internet Backbone Traffic

Email Networks


Social Network Analysis

Community Detection

Anomaly Detection

HC3, Hörsalsvägen 14, Chalmers University of Technology
Opponent: Dr. Thomas Karagiannis, Microsoft Research, Cambridge


Farnaz Moradi

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

On Collection of Large-Scale Multi-Purpose Datasets on Internet Backbone Links

The First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, BADGERS 2011, Salzburg, Austria, 10 April 2011 (2011), p. 62-69

Paper in proceeding

An Evaluation of Community Detection Algorithms on Large-Scale Email Traffic

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 7276 LNCS (2012), p. 283-294

Paper in proceeding

Towards Modeling Legitimate and Unsolicited Email Traffic Using Social Network Properties

Proceedings of the Fifth Workshop on Social Network Systems, SNS'12. 5th Workshop on Social Network Systems, Bern, 10 April 2012 (2012)

Paper in proceeding

Subject Categories

Computer Science

Technical report L - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University: 98L

