National Computational Infrastructure for Lattice Gauge Theory SciDAC-2 Closeout Report
As part of this project, researchers from Vanderbilt University, Fermi National Accelerator Laboratory and Illinois Institute of Technology developed a real-time, fault-tolerant cluster monitoring framework. The framework is open source and is available upon request. It has also been used at Fermi National Accelerator Laboratory, Vanderbilt University and Mississippi State University in projects outside lattice quantum chromodynamics (LQCD).
The goal of the scientific workflow project is to investigate and develop domain-specific workflow tools for LQCD that help orchestrate, in parallel, computational campaigns consisting of many loosely coupled batch processing jobs. Major requirements for an LQCD workflow system include:
- a system to manage input metadata, e.g. physics parameters such as masses;
- a system to manage templates describing workflows and permit their reuse;
- a system to capture data provenance information;
- a system to manage produced data;
- a means of monitoring workflow progress and status;
- a means of resuming or extending a stopped workflow;
- fault tolerance features to enhance the reliability of running workflows.
The full requirements for an LQCD workflow system are available in the project documentation.
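Several of the requirements above (input metadata, provenance capture, resuming a stopped workflow, and fault tolerance) can be illustrated with a minimal Python sketch. It assumes a campaign is a dependency graph of jobs run in-process, with a Python callable standing in for each batch job; the `Job` and `Workflow` classes, the retry policy and the state names (`done`, `failed`, `blocked`) are hypothetical and are not the project's actual implementation.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List


def _noop(params: Dict[str, float]) -> str:
    # Hypothetical stand-in for submitting a real batch job.
    return "ok"


@dataclass
class Job:
    name: str
    params: Dict[str, float]                 # input metadata, e.g. {"mass": 0.01}
    deps: List[str] = field(default_factory=list)
    action: Callable[[Dict[str, float]], str] = _noop


@dataclass
class Workflow:
    jobs: Dict[str, Job]
    max_retries: int = 2                     # extra attempts after the first failure
    state: Dict[str, str] = field(default_factory=dict)
    provenance: List[dict] = field(default_factory=list)

    def run(self) -> Dict[str, str]:
        # Each job maps to the set of jobs it depends on (its predecessors).
        graph = {name: set(job.deps) for name, job in self.jobs.items()}
        for name in TopologicalSorter(graph).static_order():
            if self.state.get(name) == "done":
                continue                     # resume: skip work finished earlier
            job = self.jobs[name]
            if any(self.state.get(d) != "done" for d in job.deps):
                self.state[name] = "blocked" # an upstream job has not succeeded
                continue
            self.state[name] = "failed"      # overwritten on success
            for attempt in range(1, self.max_retries + 2):
                record = {"job": name, "attempt": attempt, "params": dict(job.params)}
                try:
                    output = job.action(job.params)
                except Exception as exc:     # fault tolerance: log and retry
                    self.provenance.append({**record, "status": f"error: {exc}"})
                    continue
                self.provenance.append({**record, "status": "done", "output": output})
                self.state[name] = "done"
                break
        return self.state


# Usage: a propagator job that fails once (e.g. a lost node) and then succeeds.
calls = {"n": 0}

def flaky(params: Dict[str, float]) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("node lost")
    return f"props m={params['mass']}"

wf = Workflow(jobs={
    "gauge": Job("gauge", {"beta": 6.0}),
    "props": Job("props", {"mass": 0.01}, deps=["gauge"], action=flaky),
    "corr": Job("corr", {"mass": 0.01}, deps=["props"]),
})
print(wf.run())  # {'gauge': 'done', 'props': 'done', 'corr': 'done'}
```

Keeping `state` on the workflow object is what makes resumption possible in this sketch: calling `run()` again skips completed jobs and retries only failed or blocked ones. A production system would persist this state and the provenance log to a database rather than hold them in memory.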
- Publication Date:
- OSTI Identifier:
- Report Number(s):
- DOE Contract Number:
- Resource Type:
- Technical Report
- Research Org:
- Illinois Institute of Technology
- Sponsoring Org:
- USDOE; USDOE SC Office of High Energy Physics (SC-25)
- Country of Publication:
- United States
- 72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS; 97 MATHEMATICS AND COMPUTING