National Computational Infrastructure for Lattice Gauge Theory SciDAC-2 Closeout Report
As part of this project, researchers from Vanderbilt University, Fermi National Accelerator Laboratory (Fermilab), and the Illinois Institute of Technology developed a real-time, fault-tolerant cluster monitoring framework. The framework is open source and is available upon request. It has also been used at Fermilab, Vanderbilt University, and Mississippi State University in projects outside lattice quantum chromodynamics (LQCD).

The goal of the scientific workflow project is to investigate and develop domain-specific workflow tools for LQCD that help orchestrate, in parallel, computational campaigns consisting of many loosely coupled batch-processing jobs. Major requirements for an LQCD workflow system include:
- a system to manage input metadata, e.g. physics parameters such as masses;
- a system to manage and permit the reuse of templates describing workflows;
- a system to capture data provenance information;
- a system to manage produced data;
- a means of monitoring workflow progress and status;
- a means of resuming or extending a stopped workflow;
- fault-tolerance features to enhance the reliability of running workflows.

The requirements for an LQCD workflow system are available in the project documentation.
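To make several of these requirements concrete, here is a minimal sketch of how a campaign of loosely coupled jobs might be instantiated from a reusable template, run with retries, recorded for provenance, and resumed from saved state. It is not the project's framework. The names `Job`, `from_template`, `run_workflow`, `dump_state`, and `restore_state`, the "one propagator job per mass, then one analysis job" template, and the simple retry policy are all hypothetical illustrations.

```python
"""Hypothetical sketch of an LQCD-style workflow runner (not the project's actual tool)."""
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    name: str
    params: Dict[str, float]                        # input metadata, e.g. a quark mass
    deps: List[str] = field(default_factory=list)   # names of jobs that must finish first
    status: str = "pending"                         # "pending" -> "done" or "failed"
    attempts: int = 0


def from_template(masses: List[float]) -> Dict[str, Job]:
    """Hypothetical reusable template: one propagator job per mass, then one analysis job."""
    jobs = {f"prop_m{m}": Job(f"prop_m{m}", {"mass": m}) for m in masses}
    jobs["analysis"] = Job("analysis", {}, deps=list(jobs))  # depends on every propagator job
    return jobs


def provenance_record(job: Job, output: str) -> Dict:
    """Capture what ran, with which inputs and dependencies, plus a digest of the output."""
    return {
        "job": job.name,
        "params": job.params,
        "deps": job.deps,
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "finished_at": time.time(),
    }


def run_workflow(jobs: Dict[str, Job], task: Callable[[Job], str],
                 max_retries: int = 2) -> List[Dict]:
    """Run every ready job, retrying on RuntimeError; skip jobs already marked done."""
    provenance: List[Dict] = []
    progress = True
    while progress:
        progress = False
        for job in jobs.values():
            if job.status != "pending":
                continue
            if not all(jobs[d].status == "done" for d in job.deps):
                continue  # dependencies not finished (or failed); leave this job pending
            while job.attempts <= max_retries:
                job.attempts += 1
                try:
                    output = task(job)
                except RuntimeError:
                    continue  # simulated transient failure: try again
                job.status = "done"
                provenance.append(provenance_record(job, output))
                break
            else:
                job.status = "failed"  # retries exhausted
            progress = True
    return provenance


def dump_state(jobs: Dict[str, Job]) -> str:
    """Serialize job status so a stopped workflow can be resumed later."""
    return json.dumps({n: {"status": j.status, "attempts": j.attempts} for n, j in jobs.items()})


def restore_state(jobs: Dict[str, Job], text: str) -> None:
    """Apply saved status to freshly instantiated jobs; completed jobs will be skipped."""
    for name, saved in json.loads(text).items():
        if name in jobs:
            jobs[name].status = saved["status"]
            jobs[name].attempts = saved["attempts"]


if __name__ == "__main__":
    calls: Dict[str, int] = {}

    def flaky_task(job: Job) -> str:
        # Fail the first attempt of one job to exercise the retry path.
        calls[job.name] = calls.get(job.name, 0) + 1
        if job.name == "prop_m0.01" and calls[job.name] == 1:
            raise RuntimeError("simulated node failure")
        return f"{job.name}:{job.params}"

    records = run_workflow(from_template([0.01, 0.02]), flaky_task)
    print([r["job"] for r in records])  # ['prop_m0.01', 'prop_m0.02', 'analysis']
```

Keeping job status separate from job definitions is what makes resuming cheap in this sketch: the template is re-instantiated, saved status is overlaid, and only unfinished jobs run again. A production system would also need persistent storage, batch-scheduler integration, and monitoring, none of which are modeled here.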
- Research Organization:
- Illinois Institute of Technology, Chicago, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), High Energy Physics (HEP)
- DOE Contract Number:
- FC02-06ER41442
- OSTI ID:
- 1089000
- Report Number(s):
- Final
- Country of Publication:
- United States
- Language:
- English
Similar Records
SDN for End-to-end Networked Science at the Exascale (SENSE) - Final Technical Report
SciDAC-Center for Plasma Edge Simulation Report