
Title: Grid Collector: Using an event catalog to speed up user analysis in distributed environment

Conference
OSTI ID: 882159

Nuclear and High Energy Physics experiments such as STAR at BNL are generating millions of files with petabytes of data each year. In most cases, analysis programs have to read all events in a file in order to find the interesting ones. Since the interesting events may be a small fraction of the events in the file, a significant portion of the computer time is wasted on reading unwanted events. To address this issue, we developed a software system called Grid Collector. The core of Grid Collector is an Event Catalog. This catalog can be efficiently searched with compressed bitmap indices. Tests show that Grid Collector can index and search STAR event data much faster than database systems. It is fully integrated with an existing analysis framework, so minimal effort is required to use Grid Collector. In addition, by taking advantage of existing file catalogs, Storage Resource Managers (SRMs), and GridFTP, Grid Collector automatically downloads the needed files anywhere on the Grid without user intervention. Grid Collector can significantly improve user productivity. For a user who typically performs computation on 50 percent of the events, using Grid Collector could reduce the turnaround time by 30 percent. The improvement is more significant when searching for rare events, because only a small number of events with appropriate properties are read into memory and the necessary files are automatically located and downloaded through the best available route.
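The bitmap-index idea behind the Event Catalog can be illustrated with a minimal Python sketch. This is not Grid Collector's implementation: the attribute names, the values, and the helper functions are hypothetical, and the compression that the abstract mentions is omitted entirely. Each distinct attribute value gets one bitmap (here a Python integer) in which bit i is set when event i has that value, so a multi-condition selection reduces to bitwise operations.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def select_equal(index, wanted):
    """OR together the bitmaps of every wanted value (missing values match nothing)."""
    result = 0
    for value in wanted:
        result |= index.get(value, 0)
    return result


def rows_of(bitmap):
    """Return the row numbers whose bits are set, in ascending order."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical per-event attributes (illustrative only, not STAR data).
trigger = ["minbias", "central", "minbias", "central", "central"]
track_bin = ["low", "high", "high", "high", "low"]

trigger_index = build_bitmap_index(trigger)
track_index = build_bitmap_index(track_bin)

# Events that are "central" AND have a "high" track count: one bitwise AND.
hits = select_equal(trigger_index, ["central"]) & select_equal(track_index, ["high"])
selected_events = rows_of(hits)
```

Only the events listed in `selected_events` would then need to be read from their files, which is the saving the abstract describes for selective analyses.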

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Director. Office of Science. Office of Advanced Scientific Computing Research
DOE Contract Number:
DE-AC02-05CH11231
Report Number(s):
LBNL-58092; R&D Project: KS3310; BnR: KJ0101030
Resource Relation:
Conference: Computing in High Energy and Nuclear Physics (CHEP) 2004, Interlaken, Switzerland, 27th September - 1st October 2004
Country of Publication:
United States
Language:
English