Mining Network-Level Communication Patterns of Email Traffic for Spotting Unsolicited Email
In this paper we present an anomaly detection method that identifies abnormal network-level communication patterns in email traffic. We show that these anomalies are caused by excessive unsolicited email (spam) traffic. The deviation of the network-level behavior of spam from the normal communication patterns of legitimate email (ham) can be revealed by structural analysis of email networks generated from real email traffic. We derive a time series of distributions from the structural properties of these email networks and spot anomalies by comparing the feature distributions of current email traffic against baseline distributions generated from legitimate email traffic. Our experimental results on SMTP traffic captured on an Internet backbone link show that the anomalous nodes identified by this approach correspond to the spam-sending nodes in the network.
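
As a rough illustration of comparing the feature distributions of current traffic against a baseline, the following minimal sketch builds a per-window email network from (sender, recipient) pairs, computes one structural feature distribution, and flags windows that deviate from a ham-only baseline. The abstract does not name the structural features, the distance measure, or a threshold; the out-degree feature, the Jensen-Shannon divergence, the threshold value of 0.1, and all function names here are assumptions chosen for illustration, not the method evaluated in the paper.

```python
from collections import Counter
from math import log2


def out_degree_distribution(edges):
    """Map each out-degree to the fraction of senders having it.

    `edges` is an iterable of (sender, recipient) pairs observed in one
    time window. Out-degree is an assumed feature, not the paper's choice.
    """
    recipients = {}
    for sender, recipient in edges:
        recipients.setdefault(sender, set()).add(recipient)
    counts = Counter(len(r) for r in recipients.values())
    total = sum(counts.values())
    return {degree: n / total for degree, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete
    distributions given as {value: probability} dicts."""
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl(a):
        # Kullback-Leibler divergence of `a` from the mixture `m`.
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def anomalous_windows(windows, baseline, threshold=0.1):
    """Return indices of windows whose out-degree distribution diverges
    from `baseline` by more than `threshold` (a hypothetical cutoff)."""
    return [
        i
        for i, edges in enumerate(windows)
        if js_divergence(out_degree_distribution(edges), baseline) > threshold
    ]


# Toy data: three ham senders with one recipient each, then a window in
# which a single sender contacts five distinct recipients.
ham = [("a", "b"), ("b", "a"), ("c", "a")]
spam = [("a", "b")] + [("s", f"r{i}") for i in range(5)]
baseline = out_degree_distribution(ham)
flagged = anomalous_windows([ham, spam], baseline)
```

This sketch only flags whole time windows. Attributing an anomaly to individual sending nodes, as the experiments described above do, would require a further step that is not shown here.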