Title: Comprehensive, Multi-Source Cyber-Security Events Data Set

This data set represents 58 consecutive days of de-identified event data collected from five sources within Los Alamos National Laboratory’s corporate, internal computer network. The data sources include Windows-based authentication events from both individual computers and centralized Active Directory domain controller servers; process start and stop events from individual Windows computers; Domain Name Service (DNS) lookups as collected on internal DNS servers; network flow data as collected at several key router locations; and a set of well-defined red teaming events that present bad behavior within the 58 days. In total, the data set is approximately 12 gigabytes compressed across the five data elements and presents 1,648,275,307 events for 12,425 users, 17,684 computers, and 62,974 processes. Well-known system-related users (SYSTEM, Local Service) were not de-identified, though any well-known administrator accounts were still de-identified. In the network flow data, well-known ports (e.g. 80, 443, etc.) were not de-identified. All other users, computers, processes, ports, times, and other details were de-identified as a unified set across all the data elements (e.g. U1 is the same U1 in all of the data). The specific timeframe used is not disclosed for security purposes. In addition, no data that allows association outside of LANL’s network is included. All data starts with a time epoch of 1 using a time resolution of 1 second. In the authentication data, failed authentication events are only included for users that had a successful authentication event somewhere within the data set.
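Because timestamps start at 1 and advance in 1-second steps, a relative day index can be derived with simple arithmetic. The following is a minimal sketch, not part of the data set's own tooling: the function name `split_timestamp` is hypothetical, and it assumes each of the 58 days spans exactly 86,400 seconds. Because the real collection timeframe is not disclosed, the resulting day boundaries are relative to the start of collection and need not coincide with wall-clock midnight.

```python
SECONDS_PER_DAY = 86_400  # assumption: every day in the data set is a full 24 hours


def split_timestamp(t: int) -> tuple[int, int]:
    """Map a de-identified timestamp to (day, second_of_day).

    t is in seconds and starts at 1, as stated in the data set description.
    day is 1-based; second_of_day is 0-based and measured from the start of
    that relative day, not from wall-clock midnight.
    """
    if t < 1:
        raise ValueError("timestamps in this data set start at 1")
    day_index, second_of_day = divmod(t - 1, SECONDS_PER_DAY)
    return day_index + 1, second_of_day
```

Under these assumptions, timestamp 1 falls at the start of day 1, and timestamp 86,401 is the first second of day 2.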
Authors:
 [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Publication Date:
Report Number(s):
LA-UR-15-23810
DOE Contract Number:
AC52-06NA25396
Product Type:
Dataset
Research Org(s):
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org:
USDOE Office of Science (SC)
Subject:
97 MATHEMATICS AND COMPUTING; Authentication
OSTI Identifier:
1179829
  1. To help guide its future data collection efforts, the DOE GTO funded a data gap analysis in FY2012 to identify high potential hydrothermal areas where critical data are needed. This analysis was updated in FY2013 and the resulting datasets are represented by this metadata. The original process was published in FY 2012 and is available here: https://pangea.stanford.edu/ERE/db/GeoConf/papers/SGW/2013/Esposito.pdf Though there are many types of data that can be used for hydrothermal exploration, five types of exploration data were targeted for this analysis. These data types were selected for their regional reconnaissance potential, and include many of the primary exploration techniques currently used by the geothermal industry. The data types include: (1) well data, (2) geologic maps, (3) fault maps, (4) geochemistry data, and (5) geophysical data. To determine data coverage, metadata for exploration data (including data type, data status, and coverage information) were collected and catalogued from nodes on the National Geothermal Data System (NGDS). It is the intention of this analysis that the data be updated from this source in a semi-automated fashion as new datasets are added to the NGDS nodes. In addition to this upload, an online tool was developed to allow all geothermal data providers to access this assessment, to directly add metadata themselves, and to view the results of the analysis via maps of data coverage in Geothermal Prospector (http://maps.nrel.gov/gt_prospector). A grid of the contiguous U.S. was created with 88,000 10-km by 10-km grid cells, and each cell was populated with the status of data availability corresponding to the five data types. Using these five data coverage maps and the USGS Resource Potential Map, sites were identified for future data collection efforts. These sites signify both that the USGS has indicated high favorability of occurrence of geothermal resources and that data gaps exist.
The uploaded data are contained in two data files for each data category. The first file contains the grid and is in the SHP (shapefile) format. Each populated grid cell represents a 10-km by 10-km area within which data is known to exist. The second file is a CSV (comma-separated value) file that contains all of the individual layers that intersected with the grid. This CSV can be joined with the map to retrieve a list of datasets that are available at any given site. The attributes in the CSV include: (1) grid_id: the ID of the grid cell that the data intersects with; (2) title: the name of the WFS service that intersected with this grid cell; (3) abstract: the description of the WFS service that intersected with this grid cell; (4) gap_type: the category of data availability that these data fall within (as the current processing is pulling data from NGDS, this category universally represents data that are available in the NGDS and are ready for acquisition for analytic purposes); (5) proprietary_type: whether the data are considered proprietary; (6) service_type: the type of service; (7) base_url: the service URL.
  2. AASG Wells Data for the EGS Test Site Planning and Analysis Task. Temperature measurement data obtained from boreholes for the Association of American State Geologists (AASG) geothermal data project. Typically bottomhole temperatures are recorded from log headers, and this information is provided through a borehole temperature observation service for each state. Service includes header records, well logs, temperature measurements, and other information for each borehole. Information presented in Geothermal Prospector was derived from data aggregated from the borehole temperature observations for all states. For each observation, the given well location was recorded and the best available well identifier (name), temperature and depth were chosen. The “Well Name Source,” “Temp. Type” and “Depth Type” attributes indicate the field used from the original service. This data was then cleaned and converted to consistent units. The accuracy of the observation’s location, name, temperature or depth was not assessed beyond that originally provided by the service. - AASG bottom hole temperature datasets were downloaded from repository.usgin.org between the dates of May 16th and May 24th, 2013. - Datasets were cleaned to remove “null” and non-real entries, and data converted into consistent units across all datasets - Methodology for selecting “best” temperature and depth attributes from column headers in AASG BHT Data sets: • Temperature: • CorrectedTemperature – best • MeasuredTemperature – next best • Depth: • DepthOfMeasurement – best • TrueVerticalDepth – next best • DrillerTotalDepth – last option • Well Name/Identifier: • APINo – best • WellName – next best • ObservationURI – last option.
The column headers are as follows: • gid = internal unique ID • src_state = the state from which the well was downloaded (note: the low temperature wells in Idaho are coded as “ID_LowTemp”, while all other wells are simply the two character state abbreviation) • source_url = the url for the source WFS service or Excel file • temp_c = “best” temperature in Celsius • temp_type = indicates whether temp_c comes from the corrected or measured temperature header column in the source document • depth_m = “best” depth in meters • depth_type = indicates whether depth_m comes from the measured, true vertical, or driller total depth header column in the source document • well_name = “best” well name or ID • name_src = indicates whether well_name came from apino, wellname, or observationuri header column in the source document • lat_wgs84 = latitude in wgs84 • lon_wgs84 = longitude in wgs84 • state = state in which the point is located • county = county in which the point is located
  3. The Engineered Geothermal System (EGS) Exploration Methodology Project is developing an exploration approach for EGS through the integration of geoscientific data. The Project chose the Dixie Valley Geothermal System in Nevada as a field laboratory site for methodology calibration purposes because, in the public domain, it is a highly characterized geothermal system in the Basin and Range with a considerable amount of geoscience and, most importantly, well data. The overall project area is 2500 km² with the Calibration Area (Dixie Valley Geothermal Wellfield) being about 170 km². The project was subdivided into five tasks: (1) collect and assess the existing public domain geoscience data; (2) design and populate a GIS database; (3) develop a baseline (existing data) geothermal conceptual model, evaluate geostatistical relationships, and generate baseline, coupled EGS favorability/trust maps from +1 km above sea level (asl) to -4 km asl for the Calibration Area at 0.5 km intervals to identify EGS drilling targets at a scale of 5 km x 5 km; (4) collect new geophysical and geochemical data; and (5) repeat Task 3 for the enhanced (baseline + new) data. Favorability maps were based on the integrated assessment of the three critical EGS exploration parameters of interest: rock type, temperature and stress. A complementary trust map was generated to complement the favorability maps and graphically illustrate the cumulative confidence in the data used in the favorability mapping. The Final Scientific Report (FSR) is submitted in two parts, with Part I describing the results of project Tasks 1 through 3 and Part II covering the results of project Tasks 4 through 5 plus answering nine questions posed in the proposal for the overall project.
FSR Part I presents (1) an assessment of the readily available public domain data and some proprietary data provided by Terra-Gen Power, LLC, (2) a re-interpretation of these data as required, (3) an exploratory geostatistical data analysis, (4) the baseline geothermal conceptual model, and (5) the EGS favorability/trust mapping. The conceptual model presented applies to both the hydrothermal system and EGS in the Dixie Valley region. FSR Part II presents (1) 278 new gravity stations; (2) enhanced gravity-magnetic modeling; (3) 42 new ambient seismic noise survey stations; (4) an integration of the new seismic noise data with a regional seismic network; (5) a new methodology and approach to interpret this data; (5) a novel method to predict rock type and temperature based on the newly interpreted data; (6) 70 new magnetotelluric (MT) stations; (7) an integrated interpretation of the enhanced MT data set; (8) the results of a 308 station soil CO2 gas survey; (9) new conductive thermal modeling in the project area; (10) new convective modeling in the Calibration Area; (11) pseudo-convective modeling in the Calibration Area; (12) enhanced data implications and qualitative geoscience correlations at three scales (a) Regional, (b) Project, and (c) Calibration Area; (13) quantitative geostatistical exploratory data analysis; and (14) responses to nine questions posed in the proposal for this investigation. Enhanced favorability/trust maps were not generated because there was not a sufficient amount of new, fully-vetted (see below) rock type, temperature, and stress data. The enhanced seismic data did generate a new method to infer rock type and temperature. However, in the opinion of the Principal Investigator for this project, this new methodology needs to be tested and evaluated at other sites in the Basin and Range before it is used to generate the referenced maps. 
As in the baseline conceptual model, the enhanced findings can be applied to both the hydrothermal system and EGS in the Dixie Valley region.
  4. ATP3 Unified Field Study Data. The Algae Testbed Public-Private Partnership (ATP3) was established with the goal of investigating open pond algae cultivation across different geographic, climatic, seasonal, and operational conditions while setting the benchmark for quality data collection, analysis, and dissemination. Identical algae cultivation systems and data analysis methodologies were established at testbed sites across the continental United States and Hawaii. Within this framework, the Unified Field Studies (UFS) were designed to characterize the cultivation of different algal strains during all 4 seasons across this testbed network. The dataset presented here is the complete, curated climatic, cultivation, harvest, and biomass composition data for each season at each site. These data enable others to do in-depth cultivation, harvest, techno-economic, life cycle, resource, and predictive growth modeling analysis, as well as develop crop protection strategies for the nascent algae industry. NREL Subaward Number DE-AC36-08-GO28308
  5. This data set consists of bulk soil characteristics as well as carbon and nutrient mineralization rates of active layer soils manually collected from the field in August 2012, frozen, and then thawed and incubated across a range of temperatures in the laboratory for 28-day periods in 2013-2015. The soils were collected from four replicate polygons in each of the four Areas (A, B, C, and D) of Intensive Site 1 at the Next-Generation Ecosystem Experiments (NGEE) Arctic site near Barrow, Alaska. Soil samples were coincident with the established Vegetation Plots that are located in center, edge, and trough microtopography in each polygon. Data included are 1) bulk soil characteristics including carbon, nitrogen, gravimetric water content, bulk density, and pH in 5-cm depth increments and also by soil horizon, and 2) carbon, nitrogen, and phosphorus mineralization rates for soil horizons incubated aerobically (and in one case both aerobically and anaerobically) for 28 days at temperatures that included 2, 4, 8, and 12 degrees C. Additional soil and incubation data are forthcoming. They will be available when published as part of another paper that includes additional replicate analyses.