DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Planning of distributed data production for High Energy and Nuclear Physics

Abstract

Modern experiments in High Energy and Nuclear Physics rely heavily on distributed computations using multiple computational facilities across the world. One essential type of such computation is distributed data production, where petabytes of raw files from a single source have to be processed once (per production campaign) using thousands of CPUs at distant locations, and the output has to be transferred back to that source. The distribution of data over a large system does not necessarily match the distribution of storage, network and CPU capacity. Therefore, bottlenecks may appear and lead to increased latency and degraded performance. In this paper we propose a new scheduling approach for distributed data production which is based on a network flow maximization model. In our approach a central planner defines how much input and output data should be transferred over each network link in order to maximize the computational throughput. Such plans are created periodically for a fixed planning time interval using up-to-date information on network, storage and CPU resources. The centrally created plans are executed in a distributed manner by dedicated services running at participating sites. In conclusion, our simulations based on the log records from the data production framework of the STAR experiment (Solenoid Tracker at RHIC) have shown that the proposed model systematically provides better performance than the simulated traditional techniques.
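To make the planning idea more concrete, here is a heavily simplified sketch in Python, assuming a single-commodity maximum-flow formulation: one source storage node, network links whose per-interval capacity is bandwidth multiplied by the planning interval, and an edge from each site to a super-sink whose capacity is the amount of input data that site's CPUs can process in the interval. It uses the standard Edmonds-Karp algorithm (breadth-first augmenting paths). This is not the authors' model: their planner also schedules the output data transferred back to the source and accounts for storage. The node names, rates, the `processed` super-sink and the `build_planning_network` helper are all hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    capacity: dict node -> {neighbor: capacity}; it is not modified.
    Assumes no pair of antiparallel edges (u->v and v->u) in the input.
    Returns (total_flow, flow) where flow[u][v] is the planned amount on u->v.
    """
    # Residual graph: forward capacities plus zero-capacity reverse edges.
    residual = {}
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual.setdefault(u, {})[v] = c
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for the shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck
    # Flow on each original edge = capacity minus remaining residual capacity.
    flow = {u: {v: c - residual[u][v] for v, c in edges.items()}
            for u, edges in capacity.items()}
    return total, flow


def build_planning_network(links, processing_rate, interval_s, sink="processed"):
    """Hypothetical helper: turn per-second rates (MB/s) into per-interval
    capacities (MB) for links and for each site's CPU processing."""
    capacity = {}
    for (u, v), rate in links.items():
        capacity.setdefault(u, {})[v] = rate * interval_s
    for site, rate in processing_rate.items():
        capacity.setdefault(site, {})[sink] = rate * interval_s
    return capacity


# Toy example: link bandwidths and CPU processing rates in MB/s (made-up values).
links = {("central", "siteA"): 100, ("central", "router"): 200,
         ("router", "siteB"): 50, ("router", "siteC"): 150}
rates = {"siteA": 80, "siteB": 100, "siteC": 60}
net = build_planning_network(links, rates, interval_s=10)
total, plan = max_flow(net, "central", "processed")
print(total)  # 1900
print(plan["central"])  # {'siteA': 800, 'router': 1100}
```

In this toy network the plan is bounded by siteA's CPU (800 MB), the router-to-siteB link (500 MB) and siteC's CPU (600 MB), so the planner would move 1,900 MB of input per 10-second interval. Recomputing the plan each interval with fresh measurements mirrors, in spirit, the periodic re-planning described in the abstract.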

Authors:
Makatun, Dzmitry [1]; Lauret, Jérôme [2]; Rudová, Hana [3]
  1. Czech Technical Univ. in Prague, Prague (Czech Republic); Nuclear Physics Institute of the Czech Academy of Sciences, Prague (Czech Republic)
  2. Brookhaven National Lab. (BNL), Upton, NY (United States)
  3. Masaryk Univ., Brno (Czech Republic)
Publication Date:
August 2018
Research Org.:
Brookhaven National Lab. (BNL), Upton, NY (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Nuclear Physics (NP) (SC-26)
OSTI Identifier:
1480983
Report Number(s):
BNL-209348-2018-JAAM
Journal ID: ISSN 1386-7857
Grant/Contract Number:  
SC0012704
Resource Type:
Accepted Manuscript
Journal Name:
Cluster Computing
Additional Journal Information:
Journal Volume: 21; Journal Issue: 4; Journal ID: ISSN 1386-7857
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
73 NUCLEAR PHYSICS AND RADIATION PHYSICS; Load balancing; Job scheduling; Planning; Network flow; Distributed computing; Large scale computing; Grid; Data intensive applications; Data production; Big data

Citation Formats

Makatun, Dzmitry, Lauret, Jérôme, and Rudová, Hana. Planning of distributed data production for High Energy and Nuclear Physics. United States: N. p., 2018. Web. doi:10.1007/s10586-018-2834-3.
Makatun, Dzmitry, Lauret, Jérôme, & Rudová, Hana. Planning of distributed data production for High Energy and Nuclear Physics. United States. doi:10.1007/s10586-018-2834-3.
Makatun, Dzmitry, Lauret, Jérôme, and Rudová, Hana. 2018. "Planning of distributed data production for High Energy and Nuclear Physics". United States. doi:10.1007/s10586-018-2834-3. https://www.osti.gov/servlets/purl/1480983.
@article{osti_1480983,
title = {Planning of distributed data production for High Energy and Nuclear Physics},
author = {Makatun, Dzmitry and Lauret, Jérôme and Rudová, Hana},
abstractNote = {Modern experiments in High Energy and Nuclear Physics rely heavily on distributed computations using multiple computational facilities across the world. One essential type of such computation is distributed data production, where petabytes of raw files from a single source have to be processed once (per production campaign) using thousands of CPUs at distant locations, and the output has to be transferred back to that source. The distribution of data over a large system does not necessarily match the distribution of storage, network and CPU capacity. Therefore, bottlenecks may appear and lead to increased latency and degraded performance. In this paper we propose a new scheduling approach for distributed data production which is based on a network flow maximization model. In our approach a central planner defines how much input and output data should be transferred over each network link in order to maximize the computational throughput. Such plans are created periodically for a fixed planning time interval using up-to-date information on network, storage and CPU resources. The centrally created plans are executed in a distributed manner by dedicated services running at participating sites. In conclusion, our simulations based on the log records from the data production framework of the STAR experiment (Solenoid Tracker at RHIC) have shown that the proposed model systematically provides better performance than the simulated traditional techniques.},
doi = {10.1007/s10586-018-2834-3},
journal = {Cluster Computing},
number = 4,
volume = 21,
place = {United States},
year = {2018},
month = {8}
}
