OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Engineering the CernVM-Filesystem as a High Bandwidth Distributed Filesystem for Auxiliary Physics Data

Conference · 2015 · J.Phys.Conf.Ser.

A common usage pattern in the computing models of particle physics experiments is running many distributed applications that read from a shared set of data files. We refer to this data as auxiliary data, to distinguish it from (a) event data from the detector, which tends to be different for every job, and (b) conditions data about the detector, which tends to be the same for each job in a batch of jobs. Conditions data also tends to be relatively small per job, whereas both event data and auxiliary data are larger per job. Unlike event data, auxiliary data comes from a limited working set of shared files. Since there is spatial locality in the auxiliary data access, the use case appears to be identical to that of the CernVM-Filesystem (CVMFS). However, we show that distributing auxiliary data through CVMFS causes the existing CVMFS infrastructure to perform poorly. We utilize a CVMFS client feature called 'alien cache' to cache data on existing local high-bandwidth data servers that were engineered for storing event data. This cache is shared between the worker nodes at a site and replaces caching CVMFS files both on the worker node local disks and on the site's local squids. We have tested this alien cache with the dCache NFSv4.1 interface, Lustre, and the Hadoop Distributed File System (HDFS) FUSE interface, and measured performance. In addition, we use high-bandwidth data servers at central sites to perform the CVMFS Stratum 1 function instead of the low-bandwidth web servers deployed for the CVMFS software distribution function. We have tested this using the dCache HTTP interface. As a result, we have a design for an end-to-end high-bandwidth distributed caching read-only filesystem, using existing client software already widely deployed to grid worker nodes and existing file servers already widely installed at grid sites. Files are published in a central place, are soon available on demand throughout the grid, and are cached locally at the site with a convenient POSIX interface. This paper discusses the details of the architecture and reports performance measurements.
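
For illustration only (not from the paper), the alien cache mechanism described above is enabled entirely through CVMFS client configuration. The sketch below assumes hypothetical values for the repository name, Stratum 1 URL, and shared cache path; the configuration keys themselves are standard CVMFS client parameters:

  # /etc/cvmfs/default.local (hypothetical site configuration)
  CVMFS_REPOSITORIES=aux.example.org
  CVMFS_SERVER_URL=http://stratum1.example.org:8000/cvmfs/@fqrn@
  CVMFS_HTTP_PROXY=DIRECT
  # Place cached files in a shared POSIX-mounted area
  # (e.g. Lustre, dCache NFSv4.1, or HDFS-FUSE)
  CVMFS_ALIEN_CACHE=/mnt/shared-cache/cvmfs
  # An alien cache is externally managed, so the client's own
  # quota and shared-cache handling must be switched off
  CVMFS_QUOTA_LIMIT=-1
  CVMFS_SHARED_CACHE=no

With settings of this kind, every worker node at a site reads and populates the same cache directory, so a file fetched once over HTTP can subsequently be served to all jobs from the local high-bandwidth storage.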

Research Organization:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), High Energy Physics (HEP)
DOE Contract Number:
AC02-07CH11359
OSTI ID:
1247506
Report Number(s):
FERMILAB-CONF-15-211-CD; 1413844
Journal Information:
J.Phys.Conf.Ser., Vol. 664, Issue 4; Conference: 21st International Conference on Computing in High Energy and Nuclear Physics, Okinawa, Japan, 04/13-04/17/2015
Country of Publication:
United States
Language:
English

Similar Records

Web Proxy Auto Discovery for the WLCG
Journal Article · November 2017 · Journal of Physics. Conference Series

Using Pilot Jobs and CernVM File System for Simplified Use of Containers and Software Distribution
Conference · 2021

Virtual machine provisioning, code management, and data movement design for the Fermilab HEPCloud Facility
Journal Article · October 2017 · Journal of Physics. Conference Series
