DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation

Abstract

Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.

Authors:
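Holzman, Burt [1]; Bauerdick, Lothar A. T. [1]; Bockelman, Brian [2]; Dykstra, Dave [1]; Fisk, Ian [3]; Fuess, Stuart [1]; Garzoglio, Gabriele [1]; Girone, Maria [4]; Gutsche, Oliver [1]; Hufnagel, Dirk [1]; Kim, Hyunwoo [1]; Kennedy, Robert [1]; Magini, Nicolo [1]; Mason, David [1]; Spentzouris, Panagiotis [1]; Tiradani, Anthony [1]; Timm, Steve [1]; Vaandering, Eric W. [1]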
  1. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  2. Univ. of Nebraska, Lincoln, NE (United States)
  3. Simons Foundation, New York, NY (United States)
  4. European Organization for Nuclear Research (CERN), Geneva (Switzerland)
Publication Date:
Research Org.:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
OSTI Identifier:
1418149
Report Number(s):
arXiv:1710.00100; FERMILAB-PUB-17-092-CD
Journal ID: ISSN 2510-2036; 1628463; TRN: US1801245
Grant/Contract Number:  
AC02-07CH11359
Resource Type:
Accepted Manuscript
Journal Name:
Computing and Software for Big Science
Additional Journal Information:
Journal Volume: 1; Journal Issue: 1; Journal ID: ISSN 2510-2036
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS; High energy physics; Computing; Cloud; Amazon web services

Citation Formats

Holzman, Burt, Bauerdick, Lothar A. T., Bockelman, Brian, Dykstra, Dave, Fisk, Ian, Fuess, Stuart, Garzoglio, Gabriele, Girone, Maria, Gutsche, Oliver, Hufnagel, Dirk, Kim, Hyunwoo, Kennedy, Robert, Magini, Nicolo, Mason, David, Spentzouris, Panagiotis, Tiradani, Anthony, Timm, Steve, and Vaandering, Eric W. HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation. United States: N. p., 2017. Web. doi:10.1007/s41781-017-0001-9.
Holzman, Burt, Bauerdick, Lothar A. T., Bockelman, Brian, Dykstra, Dave, Fisk, Ian, Fuess, Stuart, Garzoglio, Gabriele, Girone, Maria, Gutsche, Oliver, Hufnagel, Dirk, Kim, Hyunwoo, Kennedy, Robert, Magini, Nicolo, Mason, David, Spentzouris, Panagiotis, Tiradani, Anthony, Timm, Steve, & Vaandering, Eric W. HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation. United States. doi:10.1007/s41781-017-0001-9.
Holzman, Burt, Bauerdick, Lothar A. T., Bockelman, Brian, Dykstra, Dave, Fisk, Ian, Fuess, Stuart, Garzoglio, Gabriele, Girone, Maria, Gutsche, Oliver, Hufnagel, Dirk, Kim, Hyunwoo, Kennedy, Robert, Magini, Nicolo, Mason, David, Spentzouris, Panagiotis, Tiradani, Anthony, Timm, Steve, and Vaandering, Eric W. Fri . "HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation". United States. doi:10.1007/s41781-017-0001-9. https://www.osti.gov/servlets/purl/1418149.
@article{osti_1418149,
title = {HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation},
author = {Holzman, Burt and Bauerdick, Lothar A. T. and Bockelman, Brian and Dykstra, Dave and Fisk, Ian and Fuess, Stuart and Garzoglio, Gabriele and Girone, Maria and Gutsche, Oliver and Hufnagel, Dirk and Kim, Hyunwoo and Kennedy, Robert and Magini, Nicolo and Mason, David and Spentzouris, Panagiotis and Tiradani, Anthony and Timm, Steve and Vaandering, Eric W.},
abstractNote = {Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.},
doi = {10.1007/s41781-017-0001-9},
journal = {Computing and Software for Big Science},
number = 1,
volume = 1,
place = {United States},
year = {2017},
month = {9}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
