OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The Importance of Interpretability in Cyber Security Analytics

Authors:
Moore, Juston Shane [1]
  1. Los Alamos National Laboratory
Publication Date:
2017
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1402573
Report Number(s):
LA-UR-17-29024
DOE Contract Number:
AC52-06NA25396
Resource Type:
Conference
Resource Relation:
Conference: UCSC CROSS Workshop, 2017-10-04 to 2017-10-05, Santa Cruz, California, United States
Country of Publication:
United States
Language:
English

Citation Formats

Moore, Juston Shane. The Importance of Interpretability in Cyber Security Analytics. United States: N. p., 2017. Web.
Moore, Juston Shane. The Importance of Interpretability in Cyber Security Analytics. United States.
Moore, Juston Shane. 2017. "The Importance of Interpretability in Cyber Security Analytics". United States. https://www.osti.gov/servlets/purl/1402573.
@misc{osti_1402573,
title = {The Importance of Interpretability in Cyber Security Analytics},
author = {Moore, Juston Shane},
place = {United States},
year = {2017},
url = {https://www.osti.gov/servlets/purl/1402573}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar Records:
  • Cyber analysts are tasked with the identification and mitigation of network exploits and threats. These compromises are difficult to identify due to the characteristics of cyber communication, the volume of traffic, and the duration of a possible attack. Analysts need analytical tools that help them identify anomalies spanning seconds, days, and weeks. Unfortunately, giving analytical tools effective access to the volumes of underlying data requires novel architectures, a requirement that is often overlooked in operational deployments. Our work focuses on a summary record of communication, called a flow. Flow records summarize a communication session between a source and a destination, providing a level of aggregation over the base data. Despite this aggregation, many enterprise network perimeter sensors store millions of network flow records per day. This volume makes analytics difficult, requiring new techniques to efficiently identify temporal patterns and potential threats, and other characteristics of the data compound the problem. Within the billions of communication records that transact, there are millions of distinct IP addresses, and characterizing patterns of entity behavior is very difficult with so many entities in the data. Researchers have struggled to validate a model of typical network behavior in the hope that it would enable the identification of atypical behavior. Complicating matters further, analysts are typically able to visualize and interact with only a fraction of the data, and so may miss long-term trends and behaviors. Our analysis approach focuses on aggregate views and visualization techniques that enable flexible, efficient data exploration and the ability to view trends over long periods of time. Realizing that interactively exploring summary data allowed analysts to identify events effectively, we utilized multidimensional OLAP data cubes. The data cube structure supports interactive analysis of summary data across multiple dimensions, such as location, time, and protocol (see the first sketch following these abstracts). Cube technology also allows the analyst to drill down into the underlying data set when events of interest are identified and detailed analysis is required. Unfortunately, when creating these cubes we ran into significant performance issues with our initial architecture, caused by a combination of the data volume and attribute characteristics. Overcoming these issues required us to develop a novel, data-intensive computing infrastructure. In particular, we combined a Netezza TwinFin data warehouse appliance, a solid-state Fusion IO ioDrive, and the Tableau Desktop business intelligence analytics software. Using this architecture, we were able to analyze a month's worth of flow records, comprising 4.9B records and totaling approximately 600GB of data. This paper describes our architecture, the challenges that we encountered, and the work that remains to deploy a fully generalized cyber analytical infrastructure.
  • Cyber analysts are tasked with the identification and mitigation of network exploits and threats. These compromises are difficult to identify due to the characteristics of cyber communication, the volume of traffic, and the duration of a possible attack. In this paper, we describe a prototype implementation designed to provide cyber analysts an environment in which they can interactively explore a month’s worth of cyber security data. This prototype utilized On-Line Analytical Processing (OLAP) techniques to present a data cube to the analysts. The cube provides a summary of the data, allowing trends to be easily identified, as well as the ability to easily pull up the original records comprising an event of interest (see the second sketch below). The cube was built using SQL Server Analysis Services (SSAS), with the interface to the cube provided by Tableau. This software infrastructure was supported by a novel hardware architecture comprising a Netezza TwinFin® for the underlying data warehouse and a cube server with a FusionIO drive hosting the data cube. We evaluated this environment on a month’s worth of artificial, but realistic, data using multiple queries provided by our cyber analysts. As our results indicate, OLAP technology has progressed to the point where it is in a unique position to provide novel insights to cyber analysts, as long as it is supported by an appropriate data-intensive architecture.
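
The abstracts above describe building OLAP-style data cubes over network flow records and drilling back down from summary cells to raw flows. As a minimal sketch of the roll-up step, assuming hypothetical flow-record fields (ts, src_ip, dst_ip, protocol, bytes) rather than either paper's actual SSAS/Netezza implementation, the following Python snippet aggregates toy flow records into one summary cell per (hour, protocol):

    import pandas as pd

    # Toy flow records; the deployments described above hold billions of
    # rows in a data warehouse appliance, not in memory.
    flows = pd.DataFrame({
        "ts":       pd.to_datetime(["2017-10-01 00:01", "2017-10-01 00:02",
                                    "2017-10-01 01:00", "2017-10-02 03:30"]),
        "src_ip":   ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3"],
        "dst_ip":   ["8.8.8.8", "1.1.1.1", "8.8.8.8", "8.8.8.8"],
        "protocol": ["tcp", "udp", "tcp", "tcp"],
        "bytes":    [1500, 300, 4200, 900],
    })

    # Roll up to the coarse "cube" view an analyst would browse first:
    # one cell per (hour, protocol), with session counts and byte totals.
    flows["hour"] = flows["ts"].dt.floor("h")
    cube = (flows.groupby(["hour", "protocol"])
                 .agg(sessions=("bytes", "size"),
                      total_bytes=("bytes", "sum"))
                 .reset_index())
    print(cube)

A real cube would precompute such aggregates across more dimensions (for example source location and destination subnet), which is where the performance issues described in the first abstract arise.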
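
The drill-down operation both abstracts mention, pulling up the original records behind a summary cell of interest, can be sketched in the same hypothetical terms; a production system would push this filter down to the data warehouse rather than evaluate it in memory:

    # Continuing the toy `flows` frame from the sketch above: recover the
    # raw flow records behind one summary cell (hour 00:00, protocol tcp).
    cell_hour = pd.Timestamp("2017-10-01 00:00")
    detail = flows[(flows["hour"] == cell_hour) & (flows["protocol"] == "tcp")]
    print(detail[["ts", "src_ip", "dst_ip", "bytes"]])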