OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Anomaly detection enhanced classification in computer intrusion detection

Conference

This report describes work with the goal of enhancing capabilities in computer intrusion detection. The work builds upon a study of classification performance that compared various methods of classifying information derived from computer network packets into attack versus normal categories, based on a labeled training dataset. This previous work validates our classification methods and clears the ground for studying whether and how anomaly detection can be used to enhance this performance. The DARPA project that initiated the dataset used here concluded that anomaly detection should be examined to boost the performance of machine learning in the computer intrusion detection task. This report investigates the dataset for aspects that will be valuable for anomaly detection application, and supports these results with models constructed from the data. In this report, the term anomaly detection means learning a model from unlabeled data and using it to make some inference about future data. Each data point is a feature vector derived from network packets, referred to as an 'example' or 'sample'. Classification, by contrast, means building a model from labeled data and using that model to classify unlabeled (future) examples. There is some precedent in the literature for combining these methods. One approach is to stage the two techniques, using anomaly detection to segment data into two sets for classification. This can be interpreted as a method for combating nonstationarity in the data. In our previous work, we demonstrated that the data has substantial temporal nonstationarity. With classification methods that can be thought of as learning a decision surface between two statistical distributions, performance is expected to degrade significantly when classifying examples that are from regions not well represented in the training set.
Anomaly detection can be seen as a problem of learning the density (landscape) or the support (boundary) of a statistical distribution so that this characterization can be compared to data points. Nonstationarity can then be thought of as data that departs from the support of the distribution. Since we can judge that these 'anomalous' examples will be classified poorly, we can treat them differently (or not at all). A second approach uses anomaly detection with an assumption that any examples that are different are suspicious, which is an assumption that may or may not be true in an application. We will call this the Outlier Assumption. Under this assumption, the gains are simply those obtained from combining models that have uncorrelated errors into an ensemble that performs better than any of the individual models. This family of techniques has many names, including model averaging, multiple regression, and the very popular boosting approaches. In this approach the two methods are 'peer' results, which are then combined to generate a final result. Staged anomaly detection with the Outlier Assumption can also be used to create data sub-categories for which the classification method is specifically tuned, or vice versa. This is an avenue for further work in this application area, and will not be demonstrated in this study. As in our previous work, this report does not attempt to address issues in dataset generation or feature selection. The details of the network and data collection process, as well as the way in which this 'raw data' is transformed into well-defined feature vectors, pose a very important problem. However, that exploration is beyond the scope of this effort.
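
The two ways of combining anomaly detection with classification described above can be illustrated with a minimal sketch. It is not the method used in the report: the per-feature z-score support model, the nearest-centroid classifier, the threshold of 3.0, the equal weighting of the two scores, the toy two-feature data, and all function names are assumptions chosen only to keep the example small and self-contained.

```python
import math
from statistics import mean, pstdev

def fit_support(unlabeled):
    """Anomaly model (assumed form): per-feature mean and standard deviation."""
    columns = list(zip(*unlabeled))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard against zero spread
    return mus, sigmas

def anomaly_score(model, x):
    """Largest absolute z-score over features; larger means farther from the support."""
    mus, sigmas = model
    return max(abs(v - m) / s for v, m, s in zip(x, mus, sigmas))

def fit_centroids(examples, labels):
    """Classifier (assumed form): one centroid per class, learned from labeled data."""
    centroids = {}
    for label in set(labels):
        rows = [x for x, y in zip(examples, labels) if y == label]
        centroids[label] = [mean(c) for c in zip(*rows)]
    return centroids

def attack_score(centroids, x):
    """Score in [0, 1]; values above 0.5 mean x is closer to the attack centroid."""
    d_attack = math.dist(x, centroids["attack"])
    d_normal = math.dist(x, centroids["normal"])
    total = d_attack + d_normal
    return 0.5 if total == 0 else d_normal / total

def staged_predict(support, centroids, x, threshold=3.0):
    """Staged approach: examples outside the learned support are set aside
    instead of being passed to the classifier."""
    if anomaly_score(support, x) > threshold:
        return "anomalous"
    return "attack" if attack_score(centroids, x) > 0.5 else "normal"

def peer_predict(support, centroids, x, threshold=3.0):
    """Peer approach under the Outlier Assumption: distance from the support
    counts as evidence of attack and is averaged with the classifier score."""
    outlier_vote = min(anomaly_score(support, x) / (2 * threshold), 1.0)
    combined = 0.5 * attack_score(centroids, x) + 0.5 * outlier_vote
    return "attack" if combined > 0.5 else "normal"

# Hypothetical two-feature training data (not from the report's dataset).
train_x = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0],
           [4.0, 4.0], [5.0, 5.0], [4.0, 5.0], [5.0, 4.0]]
train_y = ["normal"] * 4 + ["attack"] * 4

support = fit_support(train_x)               # labels are not used here
centroids = fit_centroids(train_x, train_y)  # labels are used here

far_away = [30.0, -20.0]  # far from anything seen during training
print(staged_predict(support, centroids, far_away))
print(peer_predict(support, centroids, far_away))
```

In this sketch the staged variant refuses to classify the far-away example, while the peer variant lets the Outlier Assumption tip the combined score toward the attack class. Which behavior is appropriate depends on whether that assumption holds in the application.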

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
OSTI ID:
976098
Report Number(s):
LA-UR-02-1148; TRN: US201009%%538
Resource Relation:
Journal Volume: 2388; Conference: "Submitted to: International Conference on Pattern Recognition, August 2002, Montreal Canada."
Country of Publication:
United States
Language:
English