OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Production Experiences with the Cray-Enabled TORQUE Resource Manager

Abstract

High performance computing resources utilize batch systems to manage the user workload. Cray systems are uniquely different from typical clusters due to Cray's Application Level Placement Scheduler (ALPS). ALPS manages binary transfer, job launch and monitoring, and error handling. Batch systems require special support to integrate with ALPS using an XML protocol called BASIL. Previous versions of Adaptive Computing's TORQUE and Moab batch suite integrated with ALPS from within Moab, using PERL scripts to interface with BASIL. This would occasionally lead to problems when all the components would become unsynchronized. Version 4.1 of the TORQUE Resource Manager introduced new features that allow it to directly integrate with ALPS using BASIL. This paper describes production experiences at Oak Ridge National Laboratory using the new TORQUE software versions, as well as ongoing and future work to improve TORQUE.

Authors:
 Ezell, Matthew A [1]; Maxwell, Don E [1]; Beer, David [2]
  1. ORNL
  2. Adaptive Computing
Publication Date:
2013
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Center for Computational Sciences
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1086656
DOE Contract Number:  
DE-AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: Cray User Group, Napa Valley, CA, USA, May 6-9, 2013
Country of Publication:
United States
Language:
English
Subject:
Resource Manager; Adaptive Computing; Cray; ALPS; Moab; HPC; Titan; Gaea

Citation Formats

Ezell, Matthew A, Maxwell, Don E, and Beer, David. Production Experiences with the Cray-Enabled TORQUE Resource Manager. United States: N. p., 2013. Web.
Ezell, Matthew A, Maxwell, Don E, & Beer, David. Production Experiences with the Cray-Enabled TORQUE Resource Manager. United States.
Ezell, Matthew A, Maxwell, Don E, and Beer, David. 2013. "Production Experiences with the Cray-Enabled TORQUE Resource Manager". United States.
@article{osti_1086656,
title = {Production Experiences with the Cray-Enabled TORQUE Resource Manager},
author = {Ezell, Matthew A and Maxwell, Don E and Beer, David},
abstractNote = {High performance computing resources utilize batch systems to manage the user workload. Cray systems are uniquely different from typical clusters due to Cray's Application Level Placement Scheduler (ALPS). ALPS manages binary transfer, job launch and monitoring, and error handling. Batch systems require special support to integrate with ALPS using an XML protocol called BASIL. Previous versions of Adaptive Computing's TORQUE and Moab batch suite integrated with ALPS from within Moab, using PERL scripts to interface with BASIL. This would occasionally lead to problems when all the components would become unsynchronized. Version 4.1 of the TORQUE Resource Manager introduced new features that allow it to directly integrate with ALPS using BASIL. This paper describes production experiences at Oak Ridge National Laboratory using the new TORQUE software versions, as well as ongoing and future work to improve TORQUE.},
place = {United States},
year = {2013}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
