Office of Scientific and Technical Information

Title: Monitoring data transfer latency in CMS computing operations

Journal Article · Journal of Physics. Conference Series
 [1];  [1];  [2];  [3];  [4];  [5]
  1. Univ. of Bologna (Italy)
  2. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  3. Ecole Polytechnique of Paris, Palaiseau (France)
  4. Cukurova Univ. (Turkey)
  5. Princeton Univ., Princeton, NJ (United States)

During the first LHC run, the CMS experiment collected tens of petabytes of collision and simulated data, which must be distributed among dozens of computing centres with low latency in order to make efficient use of the resources. While the desired level of throughput has been successfully achieved, it is still common to observe transfer workflows that cannot reach full completion in a timely manner because a small fraction of stuck files require operator intervention.

For this reason, in 2012 the CMS transfer management system, PhEDEx, was instrumented with a monitoring system to measure file transfer latencies and to predict the completion time for the transfer of a data set. Operators can detect abnormal patterns in transfer latencies while a transfer is still in progress, and can monitor the long-term performance of the transfer infrastructure to plan the data placement strategy.

Based on the data collected over one year with the latency monitoring system, we present a study of the factors that contribute to transfer completion time. As case studies, we analyze several typical CMS transfer workflows, such as the distribution of collision event data from CERN and the upload of simulated event data from the Tier-2 centres to the archival Tier-1 centres. For each workflow, we present the typical patterns of transfer latencies that have been identified with the latency monitor.

We identify the areas in PhEDEx where development effort can reduce latency, and we show how stuck transfers that need operator intervention can be detected. Lastly, we propose a set of metrics to alert operators about stuck subscriptions and prompt manual intervention, with the aim of improving transfer completion times.
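The abstract does not describe how the latency metrics are computed. As a hypothetical sketch only, assuming each block subscription is tracked as a subscription time plus a list of per-file arrival times, the kind of quantities it mentions (time to reach a given completion fraction, a predicted completion time, and a stuck-transfer alert) could be computed as below. The `BlockTransfer` record, all function names, the 24-hour idle threshold, and the linear-extrapolation predictor are illustrative assumptions, not PhEDEx code or the metrics proposed in the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BlockTransfer:
    # Hypothetical per-block record; field names are illustrative, not PhEDEx's schema.
    subscribed_at: float  # subscription time, seconds since epoch
    total_files: int      # number of files in the block
    arrivals: List[float] = field(default_factory=list)  # per-file completion times


def time_to_fraction(block: BlockTransfer, fraction: float) -> Optional[float]:
    """Seconds from subscription until `fraction` of the block's files had
    arrived, or None if that milestone has not been reached yet."""
    needed = max(1, math.ceil(fraction * block.total_files))
    if len(block.arrivals) < needed:
        return None
    ordered = sorted(block.arrivals)
    return ordered[needed - 1] - block.subscribed_at


def estimate_completion(block: BlockTransfer, now: float) -> Optional[float]:
    """Naive linear extrapolation (an assumption, not PhEDEx's predictor):
    the remaining files are assumed to arrive at the average rate seen so far."""
    done = len(block.arrivals)
    if done >= block.total_files:
        # Already complete: report the time the last file arrived.
        return max(block.arrivals, default=block.subscribed_at)
    elapsed = now - block.subscribed_at
    if done == 0 or elapsed <= 0:
        return None  # no rate information yet
    rate = done / elapsed  # files per second
    return now + (block.total_files - done) / rate


def is_stuck(block: BlockTransfer, now: float,
             idle_threshold: float = 24 * 3600) -> bool:
    """Assumed heuristic: an incomplete block is flagged as stuck when no file
    has arrived for longer than `idle_threshold` seconds (measured from the
    subscription time if nothing has arrived yet)."""
    if len(block.arrivals) >= block.total_files:
        return False  # complete blocks are never stuck
    last_activity = max(block.arrivals, default=block.subscribed_at)
    return now - last_activity > idle_threshold
```

For example, a four-file block subscribed at t = 0 with files arriving at t = 100, 200 and 300 s reaches 50% completion after 200 s. If its last file is still missing a day later, `is_stuck` flags it for operator attention. A fixed idle threshold is the simplest possible alert; a real system would more likely tune it per workflow or per link.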

Research Organization:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), High Energy Physics (HEP)
Grant/Contract Number:
AC02-07CH11359
OSTI ID:
1346387
Report Number(s):
FERMILAB-CONF-15-659-CMS; 1413830
Journal Information:
Journal of Physics. Conference Series, Vol. 664, Issue 3; ISSN 1742-6588
Publisher:
IOP Publishing
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 2 works (citation information provided by Web of Science)

Cited By (1)

Machine learning at the energy and intensity frontiers of particle physics journal August 2018

Similar Records

No file left behind - monitoring transfer latencies in PhEDEx
Conference · 2012 · J.Phys.Conf.Ser.

Large scale and low latency analysis facilities for the CMS experiment: Development and operational aspects
Conference · 2011 · J.Phys.Conf.Ser.

Pooling the resources of the CMS Tier-1 sites
Journal Article · Dec 23, 2015 · Journal of Physics. Conference Series

Related Subjects