
Title: Data Mining for Security Information: A Survey

Conference
OSTI ID:15005288

This paper presents a survey of the published work and products currently available for off-line data mining of computer network security information. Hundreds of megabytes of data of interest to computer security professionals are collected every second. This data can answer questions ranging from the proactive, "Which machines are the attackers going to try to compromise?" to the reactive, "When did the intruder break into my system and how?" Unfortunately, there is so much data that computer security professionals do not have time to sort through it all. What is needed are systems that perform data mining at various levels on this corpus of data to ease the burden on the human analyst. Such systems typically operate on log data produced by hosts, firewalls, and intrusion detection systems, because such data is typically in a standard, machine-readable format and usually provides the information most relevant to the security of the system. Systems that do this type of data mining for security information are classified as intrusion detection systems (IDSs). It is important to point out that we are not surveying real-time intrusion detection systems. Instead, we examine what is possible when the analysis is done off-line. Off-line analysis allows more data to be correlated between distant sites that periodically transfer relevant log files, and it may be able to take greater advantage of an archive of past logs. Such a system is not a replacement for a real-time intrusion detection system but should be used in conjunction with one. In fact, as noted previously, the logs of the real-time IDS may be one of the inputs to the data mining system.

We concentrate on the application of data mining to network connection data, as opposed to system logs or the output of real-time intrusion detection systems. We do this primarily because this data is readily obtained from firewalls or real-time intrusion detectors and looks the same regardless of the network architecture or the systems that run on the network. This similarity greatly simplifies the data cleansing step and provides a dataset with high orthogonality between multiple sites, increasing the accuracy of the data mining operations. The decision to use connection logs instead of packet logs is discussed below. This paper surveys both the research done in this area to date and publicly available products that perform such tasks.
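
As a rough illustration of why uniform connection records make cross-site correlation simple, the following minimal sketch finds source addresses that appear in the connection logs of more than one site. It is not taken from the paper: the comma-separated record layout, the site names, the sample addresses, and the function names `parse_connections` and `sources_seen_at_multiple_sites` are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical connection-log layout (an assumption, not from the paper):
# timestamp,src_ip,dst_ip,dst_port,protocol
FIELDS = ["timestamp", "src_ip", "dst_ip", "dst_port", "protocol"]

# Made-up sample logs standing in for files transferred periodically by two sites.
SITE_LOGS = {
    "site_a": "2001-04-19T10:00:00,203.0.113.5,192.0.2.10,22,tcp\n"
              "2001-04-19T10:00:02,198.51.100.7,192.0.2.11,80,tcp\n",
    "site_b": "2001-04-19T11:30:00,203.0.113.5,192.0.2.200,22,tcp\n"
              "2001-04-19T11:31:00,198.51.100.9,192.0.2.201,25,tcp\n",
}


def parse_connections(text):
    """Parse one site's log text into a list of connection records (dicts)."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=FIELDS))


def sources_seen_at_multiple_sites(site_logs, min_sites=2):
    """Map each source address to the sites that logged it, keeping only
    addresses logged by at least `min_sites` distinct sites."""
    sites_by_source = defaultdict(set)
    for site, text in site_logs.items():
        for record in parse_connections(text):
            sites_by_source[record["src_ip"]].add(site)
    return {
        src: sorted(sites)
        for src, sites in sites_by_source.items()
        if len(sites) >= min_sites
    }


shared = sources_seen_at_multiple_sites(SITE_LOGS)
print(shared)
```

Because every site is assumed to emit the same five fields, no per-site cleansing logic is needed before the records are combined; a real deployment would likely have to normalize timestamps and field order first.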

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
US Department of Energy (US)
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
15005288
Report Number(s):
UCRL-JC-143464; TRN: US200322%%355
Resource Relation:
Conference: 8th Association for Computing Machinery Conference on Computer & Communications Security, Philadelphia, PA (US), 11/06/2001--11/08/2001; Other Information: PBD: 19 Apr 2001
Country of Publication:
United States
Language:
English

Similar Records

Development and Demonstration of a Security Core Component
Technical Report · Fri Feb 28 00:00:00 EST 2014 · OSTI ID:15005288

High-end Home Firewalls CIAC-2326
Technical Report · Wed Oct 08 00:00:00 EDT 2003 · OSTI ID:15005288

Detecting and Blocking Network Attacks at Ultra High Speeds
Technical Report · Mon Nov 29 00:00:00 EST 2010 · OSTI ID:15005288