Title: Grid Collector: Using an event catalog to speed up user analysis in distributed environment

Conference
OSTI ID:882078

Nuclear and high-energy physics experiments such as STAR at BNL generate millions of files holding petabytes of data each year. In most cases, analysis programs have to read every event in a file to find the interesting ones. Because the interesting events may be only a small fraction of the events in a file, a significant portion of computer time is wasted reading unwanted events. To address this issue, we developed a software system called Grid Collector. The core of Grid Collector is an Event Catalog, which can be searched efficiently with compressed bitmap indices. Tests show that Grid Collector can index and search STAR event data much faster than database systems. It is fully integrated with an existing analysis framework, so minimal effort is required to use it. In addition, by taking advantage of existing file catalogs, Storage Resource Managers (SRMs), and GridFTP, Grid Collector automatically downloads the needed files anywhere on the Grid without user intervention. Grid Collector can significantly improve user productivity. For a user who typically performs computation on 50 percent of the events, using Grid Collector could reduce the turnaround time by 30 percent. The improvement is more significant when searching for rare events, because only the small number of events with the appropriate properties are read into memory, and the necessary files are automatically located and downloaded through the best available route.
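
The idea of searching an event catalog with bitmap indices can be illustrated with a minimal sketch. It is not Grid Collector's code or API: the attribute names (`trigger`, `n_tracks_bin`), the sample values, and the helper functions are all hypothetical, and the bitmaps here are plain uncompressed Python integers, whereas the abstract refers to compressed bitmap indices.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Return the positions of the set bits in ascending order."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical catalog: one entry per event, two made-up attributes.
trigger = ["minbias", "central", "minbias", "central", "central", "minbias"]
n_tracks_bin = ["low", "high", "high", "high", "low", "low"]

trigger_index = build_bitmap_index(trigger)
tracks_index = build_bitmap_index(n_tracks_bin)

# Events with trigger == "central" AND n_tracks_bin == "high":
# a single bitwise AND over the two bitmaps, with no scan of the event data.
selected = rows_of(trigger_index["central"] & tracks_index["high"])
print(selected)
```

In this toy setting, only the events listed in `selected` would need to be read from their files; the rest are skipped without being loaded.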

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Director. Office of Science. Office of Advanced Scientific Computing Research
DOE Contract Number:
DE-AC02-05CH11231
OSTI ID:
882078
Report Number(s):
LBNL-58092; R&D Project: KS3310; BnR: KJ0101030; TRN: US200613%%573
Resource Relation:
Conference: Computing in High Energy and Nuclear Physics (CHEP) 2004, Interlaken, Switzerland, 27th September - 1st October 2004
Country of Publication:
United States
Language:
English