U.S. Department of Energy
Office of Scientific and Technical Information

ProvSec: Open Cybersecurity System Provenance Analysis Benchmark Dataset with Labels

Journal Article · International Journal of Networked and Distributed Computing

System provenance forensic analysis has been studied extensively. It requires fine-grained data, such as system calls together with their event fields, to track dependencies between events. Although security datasets have been proposed in prior work, we found that a useful dataset combining realistic attacks with the details needed for high-quality provenance tracking is lacking. We created a new dataset of eleven vulnerable cases for system forensic analysis. It records the full details of system calls, including syscall parameters, and uses realistic attack scenarios built on real software vulnerabilities and exploits. For each case, we created two sets of scenarios, benign and adversary, which are manually labeled for supervised machine-learning analysis. In addition, we present an algorithm that improves data quality in system provenance forensic analysis. We describe the dataset events in detail and demonstrate dependency analysis on the dataset cases.
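To illustrate what tracking the dependencies of events means in practice, the following is a minimal sketch of backward dependency tracking over labeled system call events. It is not the algorithm or the data schema from the article: the `Event` fields, the syscall-to-direction mapping, the `backtrack` function, and the sample events are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the real dataset schema may differ."""
    ts: int        # logical timestamp of the system call
    pid: int       # process that issued the system call
    syscall: str   # system call name
    target: str    # file path, socket, or "pid:<n>" for a child process
    label: str     # manual label: "benign" or "adversary"


# Assumed information-flow direction for a small subset of system calls.
READ_LIKE = {"read", "recvfrom", "execve"}   # data flows target -> process
WRITE_LIKE = {"write", "sendto", "clone"}    # data flows process -> target


def edge(e):
    """Map an event to a (source, destination) dependency edge, or None."""
    proc = f"pid:{e.pid}"
    if e.syscall in READ_LIKE:
        return (e.target, proc)
    if e.syscall in WRITE_LIKE:
        return (proc, e.target)
    return None


def backtrack(events, start, start_ts):
    """Return the events that may have influenced `start` by time `start_ts`."""
    # For each relevant entity, the latest time at which incoming flows matter.
    frontier = {start: start_ts}
    result = []
    # Walk backward in time so every entity's threshold is known when needed.
    for e in sorted(events, key=lambda ev: ev.ts, reverse=True):
        flow = edge(e)
        if flow is None:
            continue
        src, dst = flow
        if dst in frontier and e.ts <= frontier[dst]:
            result.append(e)
            frontier[src] = max(frontier.get(src, -1), e.ts)
    return sorted(result, key=lambda ev: ev.ts)


# Made-up sample trace: an attack chain plus unrelated activity.
EVENTS = [
    Event(1, 100, "recvfrom", "socket:10.0.0.5:4444", "adversary"),
    Event(2, 100, "write", "/tmp/payload", "adversary"),
    Event(3, 100, "clone", "pid:101", "adversary"),
    Event(4, 101, "execve", "/tmp/payload", "adversary"),
    Event(5, 101, "write", "/etc/passwd", "adversary"),
    Event(6, 200, "read", "/var/log/syslog", "benign"),
    Event(7, 100, "read", "/etc/hosts", "benign"),
]

CHAIN = backtrack(EVENTS, "/etc/passwd", 5)
for ev in CHAIN:
    print(ev.ts, ev.pid, ev.syscall, ev.target, ev.label)
```

Starting from the modified file `/etc/passwd`, the sketch walks the events in reverse time order and keeps only those whose destination is already known to matter. Unrelated activity and events that occur after the relevant flow are left out. Syscall parameters such as the target path are what make these edges recoverable, which is why coarse-grained logs are not sufficient for this kind of analysis.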

Research Organization:
Sandia National Laboratories (SNL-CA), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); US Department of Homeland Security (DHS)
Grant/Contract Number:
NA0003525
OSTI ID:
2311396
Report Number(s):
SAND--2023-13731J
Journal Information:
International Journal of Networked and Distributed Computing, Vol. 11, Issue 2; ISSN 2211-7938
Publisher:
International Association of Computer and Information Science (ACIS)
Country of Publication:
United States
Language:
English
