Anomaly detection enhanced classification in computer intrusion detection

Fugate, M L; Gattiker, J R

doi:10.1007/3-540-45665-1_15

Conference · Tue Jan 01 00:00:00 EST 2002

DOI: https://doi.org/10.1007/3-540-45665-1_15 · OSTI ID: 976098

Fugate, Michael L.; Gattiker, James R.

This report describes work with the goal of enhancing capabilities in computer intrusion detection. The work builds upon a study of classification performance that compared various methods of classifying information derived from computer network packets into attack versus normal categories, based on a labeled training dataset. This previous work validates our classification methods and clears the ground for studying whether and how anomaly detection can be used to enhance this performance. The DARPA project that initiated the dataset used here concluded that anomaly detection should be examined to boost the performance of machine learning in the computer intrusion detection task. This report investigates the data set for aspects that will be valuable for anomaly detection application, and supports these results with models constructed from the data.

In this report, the term anomaly detection means learning a model from unlabeled data and using it to make some inference about future data. Each data item is a feature vector derived from network packets, referred to as an 'example' or 'sample'. Classification, by contrast, means building a model from labeled data and using that model to classify unlabeled (future) examples.

There is some precedent in the literature for combining these methods. One approach is to stage the two techniques, using anomaly detection to segment data into two sets for classification. This can be interpreted as a way to combat nonstationarity in the data. In our previous work, we demonstrated that the data has substantial temporal nonstationarity. With classification methods that can be thought of as learning a decision surface between two statistical distributions, performance is expected to degrade significantly when classifying examples that are from regions not well represented in the training set.
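The staging idea can be illustrated with a minimal Python sketch. It is not the authors' method: the z-score support check, the nearest-centroid classifier, the threshold of 3.0, the toy data, and every function name are assumptions chosen only to show how an anomaly detector might gate a classifier.

```python
import math
from statistics import mean, pstdev

def fit_support(train):
    """Per-feature mean and standard deviation of the (unlabeled) training examples."""
    cols = list(zip(*train))
    # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]

def anomaly_score(x, mu, sigma):
    """Largest absolute z-score over the features: a crude distance from the training data."""
    return max(abs(xi - m) / s for xi, m, s in zip(x, mu, sigma))

def fit_centroids(train, labels):
    """Nearest-centroid classifier: one mean vector per label."""
    groups = {}
    for x, y in zip(train, labels):
        groups.setdefault(y, []).append(x)
    return {y: [mean(c) for c in zip(*xs)] for y, xs in groups.items()}

def classify(x, centroids):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

def staged_predict(x, mu, sigma, centroids, threshold=3.0):
    """Stage 1: set aside examples far from the training data. Stage 2: classify the rest."""
    if anomaly_score(x, mu, sigma) > threshold:
        return "anomalous"
    return classify(x, centroids)

# Hypothetical two-feature toy data, not taken from the report.
train = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
labels = ["normal"] * 3 + ["attack"] * 3
mu, sigma = fit_support(train)
centroids = fit_centroids(train, labels)

print(staged_predict([0.1, 0.1], mu, sigma, centroids))  # normal
print(staged_predict([1.1, 1.0], mu, sigma, centroids))  # attack
print(staged_predict([5.0, 5.0], mu, sigma, centroids))  # anomalous
```

Examples returned as "anomalous" are the ones a staged system would route to separate handling rather than trust the classifier's label.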
Anomaly detection can be seen as a problem of learning the density (landscape) or the support (boundary) of a statistical distribution so that this characterization can be compared to data points. Nonstationarity can then be thought of as data that departs from the support of the distribution. Since we can judge that these 'anomalous' examples will be classified poorly, we can treat them differently (or not at all).

A second approach uses anomaly detection with an assumption that any examples that are different are suspicious, which is an assumption that may or may not be true in an application. We will call this the Outlier Assumption. Under this assumption, the benefit is simply the performance gain available from combining models that have uncorrelated errors into an ensemble with better performance than any of the individual models. This family of techniques has many names, including model averaging, multiple regression, and the very popular boosting approaches. In this approach the two methods are 'peer' results, which are then combined to generate a final result. Staged anomaly detection with the outlier assumption can also be used to create data sub-categories into which the classification method is specifically tuned, or vice-versa. This is an avenue for further work in this application area, and will not be demonstrated in this study.

As in our previous work, this report does not attempt to address issues in dataset generation or feature selection. The details of the network and data collection process, as well as the way in which this 'raw data' is transformed into well-defined feature vectors, are very important problems. However, that exploration is beyond the scope of this effort.
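The 'peer' combination under the Outlier Assumption can be sketched as a simple weighted average of two scores. This is an illustrative assumption, not the combination rule used in the paper: the squashing function, the equal weighting, the decision threshold of 0.4, and the input scores are all hypothetical.

```python
def squash(score, scale=3.0):
    """Map a non-negative anomaly score into [0, 1); larger means more unusual."""
    return score / (score + scale)

def combine(p_attack, anomaly, weight=0.5):
    """Weighted average of the classifier's attack score and the squashed anomaly score."""
    return weight * p_attack + (1.0 - weight) * squash(anomaly)

def decide(p_attack, anomaly, threshold=0.4):
    """Flag an example as an attack when the combined peer score reaches the threshold."""
    return "attack" if combine(p_attack, anomaly) >= threshold else "normal"

# Hypothetical (classifier score, anomaly score) pairs.
print(decide(0.9, 0.0))   # attack: the classifier alone is confident
print(decide(0.1, 0.5))   # normal: both peers see nothing unusual
print(decide(0.3, 27.0))  # attack: the classifier is unsure, but the example is a strong outlier
```

The third case shows where the Outlier Assumption does the work: the example is flagged only because it is unusual, which helps when unusual traffic really is suspicious and produces false alarms when it is not.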

Research Organization: Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)

Sponsoring Organization: USDOE

OSTI ID: 976098

Report Number(s): LA-UR-02-1148; TRN: US201009%%538

Resource Relation: Journal Volume: 2388; Conference: "Submitted to: International Conference on Pattern Recognition, August 2002, Montreal Canada."

Country of Publication: United States

Language: English

Similar Records

Semi-Supervised Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection

Conference · Tue Nov 01 00:00:00 EDT 2022 · OSTI ID:976098

Ćiprijanović, Aleksandra; Lewis, Ashia; Pedro, Kevin; +4 more

DeepAstroUDA: semi-supervised universal domain adaptation for cross-survey galaxy morphology classification and anomaly detection

Journal Article · Tue Apr 25 00:00:00 EDT 2023 · Machine Learning: Science and Technology · OSTI ID:976098

Ćiprijanović, A.; Lewis, A.; Pedro, K.; +4 more

Real-time Intrusion Detection for High-bandwidth Research Networks using Unsupervised Deep Learning

Conference · Wed Jan 01 00:00:00 EST 2020 · OSTI ID:976098

Gong, Qian

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
CLASSIFICATION
COMPUTER NETWORKS
COMPUTERS
DETECTION
DISTRIBUTION
EXPLORATION
LEARNING
PATTERN RECOGNITION
PERFORMANCE
TRAINING
VECTORS
