Logging for Intrusion and Fraud Detection
Doctoral thesis, 2004
Computer security is an area of ever-increasing importance. Our society relies on computerised services, which gives computer criminals, attackers, terrorists, hackers, crackers, fraudsters, or whatever name is appropriate, many reasons to break into these systems. Many types of mechanisms have been developed to deal with these security problems.
One such mechanism is the intrusion detection system (IDS), designed to detect ongoing attacks, attacks that have already taken place, or even preparations for an attack. The IDS complements preventive security mechanisms, such as firewalls and authentication systems, which can never be made completely secure.
A similar type of system is the fraud detection system (FDS), specialised in detecting fraud (or "attacks") against commercial services in business areas such as telecom, insurance and banking. Fraud detection can be considered a special case of intrusion detection.
A crucial part of intrusion or fraud detection is having good-quality input data, both for the analysis itself and for training and testing the systems. However, training and test data are difficult to acquire, and it is not known which kinds of log data are most suitable for detection.
This thesis offers guidance on acquiring more suitable log data for intrusion and fraud detection. The first part is general: it surveys research in intrusion detection and shows that intrusion and fraud detection reflect different aspects of the same problem.
The second part is devoted to improving the availability and quality of log data used in intrusion and fraud detection.
The availability of log data for training and testing detection systems can be improved by resolving the privacy concerns that prevent computer system owners from releasing their log data. To this end, a method is proposed for anonymising log data in a way that does not significantly reduce their usefulness for detection.
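This abstract does not describe how the proposed anonymisation method works. Purely as a generic illustration, and not as the thesis's method, the sketch below shows one common way to hide identities while keeping log entries linkable: keyed pseudonymisation with HMAC-SHA256. Because the same identifier always maps to the same pseudonym, per-user patterns such as repeated failed logins remain visible to a detector. The field names `user` and `src_ip`, the key, and the `anon_` prefix are hypothetical choices made for this example.

```python
import hashlib
import hmac


def pseudonymise(value: str, key: bytes, length: int = 12) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same value and key always give the same pseudonym, so records
    belonging to one identity can still be correlated after anonymisation.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon_" + digest[:length]


def anonymise_log(records, key, fields=("user", "src_ip")):
    """Return copies of log records with the (hypothetical) identifying fields pseudonymised.

    Other fields, such as timestamps and event types, are left intact,
    since a detector typically depends on them.
    """
    out = []
    for rec in records:
        new = dict(rec)  # copy so the original records are not modified
        for field in fields:
            if field in new:
                new[field] = pseudonymise(str(new[field]), key)
        out.append(new)
    return out


if __name__ == "__main__":
    key = b"hypothetical-secret-key"
    log = [
        {"time": "10:00:01", "user": "alice", "src_ip": "192.0.2.7", "event": "login_failed"},
        {"time": "10:00:03", "user": "alice", "src_ip": "192.0.2.7", "event": "login_failed"},
        {"time": "10:00:09", "user": "bob", "src_ip": "198.51.100.4", "event": "login_ok"},
    ]
    for rec in anonymise_log(log, key):
        print(rec)
```

Keeping the key secret prevents anyone from simply hashing candidate names to reverse the mapping. Note that pseudonymisation alone does not rule out re-identification from behavioural patterns or other fields left in the clear.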
Although authentic data are convenient for training and testing, they do not always have desirable properties such as flexibility and control over content. A further contribution, improving both the availability and the quality of log data, is therefore a method for creating synthetic training and test data with suitable properties. This part also presents a methodology for determining exactly which log data can be used to detect specific attacks. Ideally, we would collect only the data needed for detection, and this methodology can help us develop more efficient and better-adapted log sources. Such log sources will improve the quality of the log data used for intrusion and fraud detection.