Summit Darshan Archival Dataset
Abstract
The Summit Darshan Archival Dataset contains Darshan log data collected on Summit in 2021 for 25 applications, grouped by science domain. The data has been processed and all proprietary fields are anonymized. The result is converted into a tabular structure and saved in Parquet format. The accompanying notebook demonstrates how to access the data.
Data Organization: The data is organized into two directories:
- Darshan total (`darshan_total`): lists the aggregate (high-level) counter totals generated by running `darshan-parser --total` on the `.darshan` files. There is one Parquet file per application. Note: the `uid` and `exe` fields are masked.
- Darshan detail (`darshan_detail`): contains detailed job-level log information extracted by running `darshan-parser` on the raw `.darshan` files. The data is arranged in a year/month/day directory hierarchy (for example, `2021/12/07`). For instance, the data for `job_id` 3819766 of application `App11`, which was executed on 2021-12-07, is located under `2021/12/07`. Note: the `uid` and `filename` fields are masked.
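The directory layout above can be navigated with a short script. The following is a minimal Python sketch, assuming the dataset has been downloaded to a local directory. The helper names, the root location, the `<app>.parquet` naming inside `darshan_total`, and the convention that `darshan_detail` file names contain both the application name and the job ID are all hypothetical and should be checked against the actual file listing. Only the `darshan_total`/`darshan_detail` split and the year/month/day hierarchy come from the description above. Reading Parquet relies on pandas with a Parquet engine such as pyarrow.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def total_path(root: Path, app: str) -> Path:
    """Path to the per-application file in darshan_total.

    HYPOTHETICAL: assumes the file is named "<app>.parquet".
    """
    return root / "darshan_total" / f"{app}.parquet"


def detail_day_dir(root: Path, run_date: date) -> Path:
    """Directory holding darshan_detail data for one day (year/month/day)."""
    return root / "darshan_detail" / f"{run_date:%Y}" / f"{run_date:%m}" / f"{run_date:%d}"


def find_detail_files(root: Path, app: str, job_id: int, run_date: date) -> list[Path]:
    """Parquet files in the day directory whose names mention both app and job ID.

    HYPOTHETICAL: assumes detail file names contain the application name and
    the job ID. This is a loose substring match (e.g. "App1" also matches
    "App11"), so tighten it once the real naming scheme is known.
    A missing day directory simply yields an empty list.
    """
    day_dir = detail_day_dir(root, run_date)
    return sorted(
        p for p in day_dir.glob("*.parquet")
        if app in p.name and str(job_id) in p.name
    )


def load_parquet(path: Path):
    """Load one Parquet file into a pandas DataFrame (needs pandas + pyarrow)."""
    import pandas as pd  # imported lazily so the path helpers work without pandas

    return pd.read_parquet(path)


if __name__ == "__main__":
    root = Path("summit_darshan")  # HYPOTHETICAL local download location
    print(total_path(root, "App11").as_posix())
    print(detail_day_dir(root, date(2021, 12, 7)).as_posix())
    for f in find_detail_files(root, "App11", 3819766, date(2021, 12, 7)):
        print(f, load_parquet(f).shape)
```

For the `App11` example, `detail_day_dir(root, date(2021, 12, 7))` points at `darshan_detail/2021/12/07` under the chosen root, and `load_parquet` can then be applied to any file that `find_detail_files` returns. Keeping pandas as a lazy import lets the path logic be tested without any Parquet dependency.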
- Authors:
- Karimi, Ahmad Maroof; Khan, Awais; Oral, Sarp; Zimmer, Christopher
- ORNL-OLCF
- Publication Date:
- 2024-02-15
- DOE Contract Number:
- AC05-00OR22725
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
- Subject:
- 97 MATHEMATICS AND COMPUTING; Darshan, Summit storage system
- OSTI Identifier:
- 2305496
- DOI:
- https://doi.org/10.13139/OLCF/2305496
Citation Formats
MLA: Karimi, Ahmad Maroof, Khan, Awais, Oral, Sarp, and Zimmer, Christopher. Summit Darshan Archival Dataset. United States: N. p., 2024. Web. doi:10.13139/OLCF/2305496.
APA: Karimi, Ahmad Maroof, Khan, Awais, Oral, Sarp, & Zimmer, Christopher. Summit Darshan Archival Dataset. United States. doi:https://doi.org/10.13139/OLCF/2305496
Chicago: Karimi, Ahmad Maroof, Khan, Awais, Oral, Sarp, and Zimmer, Christopher. 2024. "Summit Darshan Archival Dataset". United States. doi:https://doi.org/10.13139/OLCF/2305496. https://www.osti.gov/servlets/purl/2305496. Pub date: Thu Feb 15 04:00:00 UTC 2024
BibTeX:
@article{osti_2305496,
title = {Summit Darshan Archival Dataset},
author = {Karimi, Ahmad Maroof and Khan, Awais and Oral, Sarp and Zimmer, Christopher},
abstractNote = {The Summit Darshan Archival Dataset contains Darshan log data collected on Summit in 2021 for 25 applications, grouped by science domain. The data has been processed and all proprietary fields are anonymized. The result is converted into a tabular structure and saved in Parquet format. The accompanying notebook demonstrates how to access the data. Data Organization: The data is organized into two directories. Darshan total (`darshan_total`): lists the aggregate (high-level) counter totals generated by running `darshan-parser --total` on the `.darshan` files. There is one Parquet file per application. Note: the `uid` and `exe` fields are masked. Darshan detail (`darshan_detail`): contains detailed job-level log information extracted by running `darshan-parser` on the raw `.darshan` files. The data is arranged in a year/month/day directory hierarchy (for example, `2021/12/07`). For instance, the data for `job_id` 3819766 of application `App11`, which was executed on 2021-12-07, is located under `2021/12/07`. Note: the `uid` and `filename` fields are masked.},
doi = {10.13139/OLCF/2305496},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2024},
month = feb
}
