POPM: A Distributed query system for high performance analysis of very large persistent object stores
Conference · OSTI ID: 207477
- Fermilab
Analysis of large physics data sets is a major computing task at Fermilab. One step in such an analysis is culling "interesting" events by applying complex query criteria. What makes this unusual is the scale required: for the typical queries applied, hundreds of gigabytes of event data must be scanned at tens of megabytes per second, and data must then be extracted from tens of terabytes based on the query results. The Physics Object Persistency Manager (POPM) system is a solution tailored to problems of this scale. A running POPM environment can support multiple queries in progress, each scanning at more than 10 megabytes per second, all sharing access to a very large persistent address space distributed across multiple disks on multiple hosts. Specifically, POPM employs the following techniques to achieve this scale of performance and access:
- Persistent objects: Experimental data to be scanned is "populated" as a data structure into the persistent address space supported by POPM. C++ classes with a few key overloaded operators provide nearly transparent semantics for access to the persistent storage.
- Distributed and parallel I/O: The persistent address space is automatically distributed across the disks of multiple "I/O nodes" within the POPM system. POPM implements a striping unit concept, permitting fast parallel I/O across the storage nodes, even for small single queries.
- Efficient shared access: POPM implements an efficient mechanism for arbitrating and multiplexing I/O access among multiple queries on the same or separate compute nodes.
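The culling step described in the abstract can be pictured as a scan that applies a compound selection to each event and records where the passing events live, so that the full events can be extracted in a later pass. The following is a minimal C++ sketch under stated assumptions: `EventSummary`, its fields, `Cut`, `require_all`, and `cull` are hypothetical names, not part of POPM, and the abstract does not say how POPM expresses its query criteria.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical per-event summary; the fields are illustrative only.
struct EventSummary {
    std::uint64_t address;  // where the full event lives in persistent storage
    int n_muons;
    double missing_et_gev;
};

// A selection criterion ("cut") is any predicate over an event summary.
using Cut = std::function<bool(const EventSummary&)>;

// Build a compound criterion that passes only if every individual cut passes.
Cut require_all(std::vector<Cut> cuts) {
    for (const Cut& c : cuts) assert(c && "every cut must be callable");
    return [cuts = std::move(cuts)](const EventSummary& e) {
        for (const Cut& c : cuts)
            if (!c(e)) return false;
        return true;
    };
}

// Scan phase: test every event and keep only the addresses of those that pass.
// Extraction of the full events would happen later, driven by this list.
std::vector<std::uint64_t> cull(const std::vector<EventSummary>& events, const Cut& cut) {
    std::vector<std::uint64_t> selected;
    for (const EventSummary& e : events)
        if (cut(e)) selected.push_back(e.address);
    return selected;
}
```

For example, combining a muon-count cut with a missing-energy cut via `require_all` and passing the result to `cull` returns only the addresses of events that satisfy both; under this assumed design, those addresses would then drive the extraction pass over the much larger data set.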
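The abstract says persistent-object access relies on C++ classes with a few key overloaded operators. One common way to get that effect is a "persistent pointer" that stores an address in the persistent space and resolves it to memory on each dereference. The sketch below assumes that design; `PersistentStore`, `PPtr`, and `Event` are hypothetical names, and the store is only an in-memory byte array that records which pages would have been faulted in from disk, not POPM's actual storage layer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <set>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a persistent address space: one flat byte array
// indexed by 64-bit persistent addresses. A real system would fetch the page
// holding an address from disk on first touch; this one only records the page.
class PersistentStore {
public:
    static constexpr std::uint64_t kPageSize = 4096;

    explicit PersistentStore(std::size_t capacity) : bytes_(capacity) {}

    // Place a new T in the store and return its persistent address.
    template <typename T, typename... Args>
    std::uint64_t allocate(Args&&... args) {
        static_assert(std::is_trivially_copyable_v<T>,
                      "persistent records are assumed to be plain data");
        std::uint64_t addr = (next_ + alignof(T) - 1) / alignof(T) * alignof(T);
        if (addr + sizeof(T) > bytes_.size()) throw std::bad_alloc();
        new (bytes_.data() + addr) T{std::forward<Args>(args)...};
        next_ = addr + sizeof(T);
        return addr;
    }

    // Translate a persistent address into a transient pointer, noting the
    // page that would have to be resident in memory.
    std::byte* resolve(std::uint64_t addr) {
        assert(addr < bytes_.size());
        touched_pages_.insert(addr / kPageSize);
        return bytes_.data() + addr;
    }

    std::size_t pages_touched() const { return touched_pages_.size(); }

private:
    std::vector<std::byte> bytes_;
    std::uint64_t next_ = 0;
    std::set<std::uint64_t> touched_pages_;
};

// Persistent pointer: holds an address rather than a raw pointer and resolves
// it through the store on every dereference via the overloaded operators.
template <typename T>
class PPtr {
public:
    PPtr(PersistentStore* store, std::uint64_t addr) : store_(store), addr_(addr) {}

    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }
    T& operator[](std::size_t i) const { return *(*this + i); }
    PPtr operator+(std::size_t n) const { return PPtr(store_, addr_ + n * sizeof(T)); }

private:
    T* get() const { return std::launder(reinterpret_cast<T*>(store_->resolve(addr_))); }

    PersistentStore* store_;
    std::uint64_t addr_;
};

// Hypothetical event record used only for illustration.
struct Event {
    int run;
    int number;
    double energy_gev;
};
```

Because `operator*`, `operator->`, and `operator[]` all route through `resolve`, analysis code reads like ordinary pointer code (`ev->number`, `ev[1].energy_gev`) while the store decides when data must actually be brought in, which is the "nearly transparent" access the abstract describes.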
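To illustrate the striping idea, the following sketch assumes a simple round-robin layout: the persistent address space is cut into fixed-size stripe units dealt across every disk of every I/O node. The layout parameters, the node-first ordering, and the names `StripeLayout`, `locate`, and `split_request` are assumptions for illustration; the abstract does not describe POPM's actual stripe geometry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical striping layout: fixed-size units dealt round-robin across
// all disks of all I/O nodes, visiting every node before reusing one.
struct StripeLayout {
    std::uint64_t unit_bytes;      // size of one stripe unit (assumed)
    std::uint32_t nodes;           // number of I/O nodes
    std::uint32_t disks_per_node;  // disks attached to each I/O node
};

struct Location {
    std::uint32_t node;
    std::uint32_t disk;
    std::uint64_t offset;  // byte offset within that disk's share of the space
};

// Map one persistent address to the node, disk, and offset that hold it.
Location locate(const StripeLayout& L, std::uint64_t addr) {
    assert(L.unit_bytes > 0 && L.nodes > 0 && L.disks_per_node > 0);
    std::uint64_t unit = addr / L.unit_bytes;
    std::uint64_t width = std::uint64_t(L.nodes) * L.disks_per_node;  // total disks
    std::uint64_t slot = unit % width;  // position in the round-robin
    std::uint64_t row = unit / width;   // number of full stripes before this unit
    return Location{static_cast<std::uint32_t>(slot % L.nodes),
                    static_cast<std::uint32_t>(slot / L.nodes),
                    row * L.unit_bytes + addr % L.unit_bytes};
}

struct Chunk {
    Location where;
    std::uint64_t length;
};

// Split [addr, addr + len) at stripe-unit boundaries and group the pieces by
// I/O node, so each node can service its share concurrently.
std::map<std::uint32_t, std::vector<Chunk>> split_request(const StripeLayout& L,
                                                          std::uint64_t addr,
                                                          std::uint64_t len) {
    std::map<std::uint32_t, std::vector<Chunk>> per_node;
    const std::uint64_t end = addr + len;
    while (addr < end) {
        std::uint64_t unit_end = (addr / L.unit_bytes + 1) * L.unit_bytes;
        std::uint64_t piece = std::min(unit_end, end) - addr;
        Location loc = locate(L, addr);
        per_node[loc.node].push_back(Chunk{loc, piece});
        addr += piece;
    }
    return per_node;
}
```

Dealing consecutive units to different nodes before reusing any node means that even a small read spanning a few units is split across several nodes that can work in parallel, which is the benefit the abstract claims for small single queries.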
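The abstract does not detail POPM's arbitration mechanism. One plausible scheme for multiplexing I/O among concurrent queries is to collect requests per stripe unit, read each requested unit once per round in address order, and deliver the data to every query that asked for it. The `IoArbiter` class below is a hypothetical sketch of that scheme, not POPM's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical arbiter: queries register interest in stripe units; each
// scheduling round reads every requested unit once, in ascending order, and
// fans the result out to all waiting queries.
class IoArbiter {
public:
    struct Delivery {
        std::uint64_t unit;
        std::vector<int> queries;  // every query that receives this unit
    };

    void request(int query_id, std::uint64_t unit) { waiting_[unit].insert(query_id); }

    // Service all pending requests; returns what was read and who received it.
    std::vector<Delivery> service_round() {
        std::vector<Delivery> out;
        // std::map iterates units in ascending order, giving a sequential sweep.
        for (auto& [unit, queries] : waiting_) {
            assert(!queries.empty());
            ++disk_reads_;  // one physical read per unit, however many queries want it
            out.push_back(Delivery{unit, std::vector<int>(queries.begin(), queries.end())});
        }
        waiting_.clear();
        return out;
    }

    std::uint64_t disk_reads() const { return disk_reads_; }

private:
    std::map<std::uint64_t, std::set<int>> waiting_;
    std::uint64_t disk_reads_ = 0;
};
```

With this design, when two queries scan overlapping data the shared unit costs one disk read rather than two, and servicing units in ascending order keeps each device's access pattern sequential.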
- Research Organization: Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
- Sponsoring Organization: USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
- DOE Contract Number: AC02-07CH11359
- OSTI ID: 207477
- Report Number(s): FERMILAB-CONF-96-002; oai:inspirehep.net:415572
- Country of Publication: United States
- Language: English
Similar Records
- Flexible storage services for parallel data mining · Conference · Dec 30, 1996 · OSTI ID: 465727
- Query estimation and order-optimized iteration in very large federations · Conference · May 04, 1998 · OSTI ID: 656716
- ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems · Journal Article · Jan 16, 2020 · Journal of Computer Science and Technology · OSTI ID: 1582374