Summit Darshan Archival Dataset

Karimi, Ahmad Maroof; Khan, Awais; Oral, Sarp; Zimmer, Christopher

doi:10.13139/OLCF/2305496

Abstract

The Summit Darshan Archival Dataset contains 2021 Darshan log data from Summit for 25 applications, grouped into science domains. The dataset has been processed, and all proprietary fields are anonymized. The resulting data is converted into a tabular structure and saved in Parquet file format. In this notebook, we demonstrate how to access the data.

Data Organization: The data is organized into two directories:

Darshan total (`darshan_total`): Lists the high-level totals generated by the `darshan-parser --total` command on `.darshan` files. There is one Parquet file for each application. Note: the `uid` and `exe` fields are masked.

Darshan detail (`darshan_detail`): Contains detailed job-level log information extracted by the `darshan-parser` command from the raw `.darshan` files. The data is sorted by directory hierarchy in the order `year/month/day` (for example, `2021/12/07`). For instance, the data for `job_id` 3819766 of application `App11`, which was executed on `2021-12-07`, is reached through the `2021/12/07` directory. Note: the `uid` and `filename` fields are masked.
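The date-based layout described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the notebook's own code: the helper names `detail_day_dir` and `load_job`, the local root directory, the presence of a `job_id` column, and the idea that Parquet files sit somewhere beneath the day directory are all assumptions. Only the `darshan_detail` directory name and the `year/month/day` ordering come from the abstract.

```python
from datetime import date
from pathlib import Path


def detail_day_dir(root, run_date):
    """Build the year/month/day directory under darshan_detail for a run date."""
    # The abstract gives the ordering as year/month/day, e.g. 2021/12/07.
    return Path(root) / "darshan_detail" / f"{run_date:%Y/%m/%d}"


def load_job(root, run_date, job_id):
    """Load every Parquet file below the day directory and keep one job's rows.

    Assumes (not confirmed by the source) that the files carry a `job_id`
    column. pandas and a Parquet engine such as pyarrow must be installed.
    """
    import pandas as pd  # imported lazily so the path helper works without it

    day_dir = detail_day_dir(root, run_date)
    files = sorted(day_dir.rglob("*.parquet"))
    if not files:
        raise FileNotFoundError(f"no Parquet files found under {day_dir}")
    frames = [pd.read_parquet(path) for path in files]
    table = pd.concat(frames, ignore_index=True)
    # Compare as strings because the stored dtype of job_id is not known here.
    return table[table["job_id"].astype(str) == str(job_id)]


# Example call (hypothetical local root "data"):
# rows = load_job("data", date(2021, 12, 7), 3819766)
```

Filtering after loading keeps the sketch independent of how files are named inside each day directory; if the real layout encodes the application or job in the file name, selecting the file directly would avoid reading unrelated data.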

Authors: Karimi, Ahmad Maroof; Khan, Awais; Oral, Sarp; Zimmer, Christopher

ORNL-OLCF

Publication Date: Thu Feb 15 04:00:00 UTC 2024

DOE Contract Number: AC05-00OR22725

Research Org.: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Org.: Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory

Subject: 97 MATHEMATICS AND COMPUTING; Darshan, Summit storage system

OSTI Identifier: 2305496

DOI: https://doi.org/10.13139/OLCF/2305496

Citation Formats


Karimi, Ahmad Maroof, Khan, Awais, Oral, Sarp, and Zimmer, Christopher. Summit Darshan Archival Dataset. United States: N. p., 2024. Web. doi:10.13139/OLCF/2305496.

Karimi, Ahmad Maroof, Khan, Awais, Oral, Sarp, & Zimmer, Christopher. Summit Darshan Archival Dataset. United States. doi:https://doi.org/10.13139/OLCF/2305496

Karimi, Ahmad Maroof, Khan, Awais, Oral, Sarp, and Zimmer, Christopher. 2024. "Summit Darshan Archival Dataset". United States. doi:https://doi.org/10.13139/OLCF/2305496. https://www.osti.gov/servlets/purl/2305496. Pub date: Thu Feb 15 04:00:00 UTC 2024

@article{osti_2305496,
  title        = {Summit Darshan Archival Dataset},
  author       = {Karimi, Ahmad Maroof and Khan, Awais and Oral, Sarp and Zimmer, Christopher},
  abstractNote = {The Summit Darshan Archival Dataset contains 2021 Darshan log data from Summit for 25 applications, grouped into science domains. The dataset has been processed, and all proprietary fields are anonymized. The resulting data is converted into a tabular structure and saved in Parquet file format. In this notebook, we demonstrate how to access the data. Data Organization: The data is organized into two directories. Darshan total (`darshan_total`): Lists the high-level totals generated by the `darshan-parser --total` command on `.darshan` files. There is one Parquet file for each application. Note: the `uid` and `exe` fields are masked. Darshan detail (`darshan_detail`): Contains detailed job-level log information extracted by the `darshan-parser` command from the raw `.darshan` files. The data is sorted by directory hierarchy in the order `year/month/day` (for example, `2021/12/07`). For instance, the data for `job_id` 3819766 of application `App11`, which was executed on `2021-12-07`, is reached through the `2021/12/07` directory. Note: the `uid` and `filename` fields are masked.},
  doi          = {10.13139/OLCF/2305496},
  place        = {United States},
  year         = {2024},
  month        = {Feb}
}

Similar records in DOE Data Explorer and OSTI.GOV collections:

April 2020 Darshan counters from the Summit supercomputer

Dataset Karimi, Ahmad Maroof; Xie, Bing; Paul, Arnab K.; ...

This dataset contains the Darshan counters collected from the Summit supercomputer during April 2020. 1. Description of methods used for collection/generation of data: When a job submitted on the Summit HPC system completes successfully and has made I/O calls (captured by the Darshan tool), it writes a Darshan log file on the Alpine filesystem. One job can have multiple `jsrun` commands, and Darshan generates a separate log for each `jsrun` command, so a job can have one or more Darshan logs associated with it. 2. Methods for processing the data: To process the data, we first use `darshan-util` tool ...

Rittenhouse et al. 1960s Historical Archived Produced Water Dataset

Dataset Siefert, Nicholas

Around 1960, a private company entered into a research agreement to analyze dissolved solids in water produced from more than 800 fields in the U.S. and Canada. The elemental compositions provided were measured spectrochemically by Rittenhouse et al. The information has been made available to the public as a public service, but the names of the companies and the exact well locations have been removed. The samples are now located at the University of Texas in Austin. More details can be found in the following reference: Gordon Rittenhouse, Robert B. Fulton, Robert J. Grabowski, Joseph L. Bernard, Minor elements in oil-field ...

Dataset: NewBio_SwgMxgWillowOnly

Dataset Davis, Maggie

OpenMDlr Dataset

Dataset Davidson, Russell B; Sedova, Ada A

Protein structure dataset created with OpenMDlr. Restraints applied to fold proteins into native-like conformations were obtained from a variety of methods; results from these different restraint sets are contained in respective subdirectories.

Bowtie Dataset

Dataset Jones, William M.; Debardeleben, Nathan A.

This dataset contains over 3,600 image files of a manufactured semiconductor part called 'Bowtie'. The data is grouped into 'accept' and 'reject' but does not contain masks for why the inspector rejected a part. There are further groupings, such as different zoom magnifications and types of rejection (e.g. gouge, debris, etc.). Some of the parts have been laminated and are organized as such. For the second round of data, Excel spreadsheets are also included, which can be used to identify the position on the wafer where the images came from. The included PDF has example images and explains this in more detail.