ProvSec: Open Cybersecurity System Provenance Analysis Benchmark Dataset with Labels
- Univ. of Central Oklahoma, Edmond, OK (United States)
- Sandia National Laboratories (SNL-CA), Livermore, CA (United States)
System provenance forensic analysis has been studied extensively. This area needs fine-grained data, such as system calls together with their event fields, to track dependencies between events. Although security datasets have been proposed in prior work, we found that a dataset combining realistic attacks with the details needed for high-quality provenance tracking is lacking. We created a new dataset of eleven vulnerable cases for system forensic analysis. It includes the full details of system calls, including syscall parameters, and uses realistic attack scenarios built on real software vulnerabilities and exploits. For each case, we created two sets of scenarios, one benign and one adversarial, which are manually labeled for supervised machine-learning analysis. In addition, we present an algorithm to improve data quality in system provenance forensic analysis. We demonstrate the details of the dataset events and the dependency analysis of our dataset cases.
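To illustrate what "tracking the dependencies of events" means, the following is a minimal sketch of generic backward dependency tracking (backtracking) over syscall events. It is not the authors' data-quality algorithm, and the `Event` fields, entity names, and labels are hypothetical, not the ProvSec schema. Starting from a point of interest, it follows only information flows that happened earlier than the flow they explain, which is why per-event timestamps and syscall parameters (file paths, socket addresses) are needed.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One syscall record. Field names are hypothetical, not the ProvSec format."""
    ts: float               # event timestamp
    syscall: str            # e.g. "read", "write", "execve"
    src: str                # entity information flows FROM
    dst: str                # entity information flows TO
    label: str = "benign"   # manual label: "benign" or "adversary"


def backtrack(events, poi, poi_ts):
    """Return events that may have causally affected entity `poi` before `poi_ts`."""
    # Index events by destination entity so we can find flows INTO a node.
    incoming = {}
    for e in events:
        incoming.setdefault(e.dst, []).append(e)

    # For each entity, the latest time at which flows into it still matter.
    horizon = {poi: poi_ts}
    queue = deque([poi])
    found = set()
    while queue:
        node = queue.popleft()
        for e in incoming.get(node, []):
            # Only flows strictly earlier than the dependency time are causal.
            if e.ts < horizon[node] and e not in found:
                found.add(e)
                # The source matters up to the moment it flowed into `node`;
                # re-queue it if this widens its time horizon.
                if e.ts > horizon.get(e.src, float("-inf")):
                    horizon[e.src] = e.ts
                    queue.append(e.src)
    return sorted(found, key=lambda e: e.ts)


# Hypothetical scenario: a web server is exploited over the network,
# drops a script, and the spawned shell writes stolen data to a file.
EVENTS = [
    Event(0.5, "read", "file:/var/www/index.html", "proc:httpd"),
    Event(1.0, "recvfrom", "socket:10.0.0.5", "proc:httpd", "adversary"),
    Event(2.0, "write", "proc:httpd", "file:/tmp/x.sh", "adversary"),
    Event(2.5, "write", "proc:cron", "file:/var/log/cron"),
    Event(3.0, "execve", "file:/tmp/x.sh", "proc:sh", "adversary"),
    Event(4.0, "read", "file:/etc/passwd", "proc:sh", "adversary"),
    Event(5.0, "write", "proc:sh", "file:/tmp/loot", "adversary"),
    Event(6.0, "read", "file:/etc/hosts", "proc:sh"),
]

if __name__ == "__main__":
    chain = backtrack(EVENTS, "file:/tmp/loot", 10.0)
    print([e.ts for e in chain])  # → [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
```

The unrelated `cron` write and the later `/etc/hosts` read are excluded, but the benign `index.html` read is still pulled in because it flows into the same process; this kind of over-approximation is one reason manually labeled benign and adversary events are useful for evaluating provenance analysis.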
- Research Organization:
- Sandia National Laboratories (SNL-CA), Livermore, CA (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA); US Department of Homeland Security (DHS)
- Grant/Contract Number:
- NA0003525
- OSTI ID:
- 2311396
- Report Number(s):
- SAND--2023-13731J
- Journal Information:
- International Journal of Networked and Distributed Computing, Vol. 11, Issue 2; ISSN 2211-7938
- Publisher:
- International Association of Computer and Information Science (ACIS)
- Country of Publication:
- United States
- Language:
- English
Similar Records
Cyber Attack Sequences Generation for Electric Power Grid
Trends in Cybersecurity Threats to Clean Energy