On Collection of Large-Scale Multi-Purpose Datasets on Internet Backbone Links
Paper in proceedings, 2011

We have collected several large-scale datasets in a number of passive measurement projects on an Internet backbone link belonging to a national university network. The datasets have been used in a range of studies: general classification and characterization of Internet traffic properties, network security projects that detect and classify malicious traffic and hosts, and studies of network-level properties of unsolicited e-mail (spam) traffic. The Antispam dataset alone contains traffic between more than 10 million e-mail addresses. In this paper we describe our datasets and the data collection methodology, including our experiences in collecting and processing data on a large scale. We have selected one dataset in particular, belonging to an anti-spam project and containing complete e-mail traffic, to show how a practical analysis of highly privacy-sensitive data can be done. Not only do we show that it is possible to collect large datasets, we also show how to address issues regarding user privacy and share our experiences of working with large datasets.

Internet Measurement

Spam

E-mail traffic

Large-Scale Datasets

Author

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

The First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, BADGERS 2011, Salzburg, Austria, 10 April 2011

62-69

Subject Categories

Computer Engineering

Computer Science

Areas of Advance

Information and Communication Technology

DOI

10.1145/1978672.1978680

ISBN

978-1-4503-0768-0
