Title: Designing a Multi-Petabyte Database for LSST

Abstract

The 3.2 giga-pixel LSST camera will produce approximately half a petabyte of archive images every month. These data need to be reduced in under a minute to produce real-time transient alerts, and then added to the cumulative catalog for further analysis. The catalog is expected to grow about three hundred terabytes per year. The data volume, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require innovative techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are being evaluated to understand how they perform at these data rates, data volumes, and access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, results to date from evaluating available database technologies against LSST requirements, and the proposed database architecture to meet the data challenges.

Authors:
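Becla, Jacek; Hanushevsky, Andrew; Nikolaev, Sergei; Abdulla, Ghaleb; Szalay, Alex; Nieto-Santisteban, Maria; Thakar, Ani; Gray, Jim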
Publication Date:
2007-01-10
Research Org.:
Stanford Linear Accelerator Center (SLAC)
Sponsoring Org.:
USDOE
OSTI Identifier:
897453
Report Number(s):
SLAC-PUB-12292
TRN: US200705%%250
DOE Contract Number:
AC02-76SF00515
Resource Type:
Conference
Resource Relation:
Conference: To appear in the proceedings of SPIE Conference on Observatory Operations: Strategies, Processes, and Systems, Orlando, Florida, 26-27 May 2006
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; 71 CLASSICAL AND QUANTUM MECHANICS, GENERAL PHYSICS; CAMERAS; TELESCOPES; ASTROPHYSICS; INFORMATION SYSTEMS; DATA BASE MANAGEMENT; DESIGN; Accelerators,ASTRO

Citation Formats

Becla, Jacek, Hanushevsky, Andrew, Nikolaev, Sergei, Abdulla, Ghaleb, Szalay, Alex, Nieto-Santisteban, Maria, Thakar, Ani, and Gray, Jim. Designing a Multi-Petabyte Database for LSST. United States: N. p., 2007. Web.
Becla, Jacek, Hanushevsky, Andrew, Nikolaev, Sergei, Abdulla, Ghaleb, Szalay, Alex, Nieto-Santisteban, Maria, Thakar, Ani, & Gray, Jim. Designing a Multi-Petabyte Database for LSST. United States.
Becla, Jacek, Hanushevsky, Andrew, Nikolaev, Sergei, Abdulla, Ghaleb, Szalay, Alex, Nieto-Santisteban, Maria, Thakar, Ani, and Gray, Jim. 2007. "Designing a Multi-Petabyte Database for LSST". United States. https://www.osti.gov/servlets/purl/897453.
@article{osti_897453,
title = {Designing a Multi-Petabyte Database for LSST},
author = {Becla, Jacek and Hanushevsky, Andrew and Nikolaev, Sergei and Abdulla, Ghaleb and Szalay, Alex and Nieto-Santisteban, Maria and Thakar, Ani and Gray, Jim},
abstractNote = {The 3.2 giga-pixel LSST camera will produce approximately half a petabyte of archive images every month. These data need to be reduced in under a minute to produce real-time transient alerts, and then added to the cumulative catalog for further analysis. The catalog is expected to grow about three hundred terabytes per year. The data volume, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require innovative techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are being evaluated to understand how they perform at these data rates, data volumes, and access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, results to date from evaluating available database technologies against LSST requirements, and the proposed database architecture to meet the data challenges.},
place = {United States},
year = {2007},
month = jan
}

  • The 3.2 giga-pixel LSST camera will produce over half a petabyte of raw images every month. This data needs to be reduced in under a minute to produce real-time transient alerts, and then cataloged and indexed to allow efficient access and simplify further analysis. The indexed catalogs alone are expected to grow at a speed of about 600 terabytes per year. The sheer volume of data, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require cutting-edge techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are being evaluated to understand how they will scale and perform at these data volumes in anticipated LSST access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, and the database architecture that is expected to be adopted in order to meet the data challenges.
  • No abstract prepared.
  • Fermilab provides a multi-Petabyte scale mass storage system for High Energy Physics (HEP) Experiments and other scientific endeavors. We describe the scalability aspects of the hardware and software architecture that were designed into the Mass Storage System to permit us to scale to multiple petabytes of storage capacity, manage tens of terabytes per day in data transfers, support hundreds of users, and maintain data integrity. We discuss in detail how we scale the system over time to meet the ever-increasing needs of the scientific community, and relate our experiences with many of the technical and economic issues related to scaling the system. Since the 2003 MSST conference, the experiments at Fermilab have generated more than 1.9 PB of additional data. We present results on how this system has scaled and performed for the Fermilab CDF and D0 Run II experiments as well as other HEP experiments and scientific endeavors.
  • Petascale systems are in existence today and will become common in the next few years. Such systems are inevitably very complex, highly distributed and heterogeneous. Monitoring a petascale system in real-time and understanding its status at any given moment without impacting its performance is a highly intricate task. Common approaches and off-the-shelf tools are either unusable, do not scale, or severely impact the performance of the monitored servers. This paper describes unobtrusive monitoring software developed at Stanford Linear Accelerator Center (SLAC) for a highly distributed petascale production data set. The paper describes the employed solutions, the lessons learned, the problems still to be addressed, and explains how the system can be reused elsewhere.
  • The Large Synoptic Survey Telescope (LSST) will catalog billions of astronomical objects and trillions of sources, all of which will be stored and managed by a database management system. One of the main challenges is real-time alert generation. To generate alerts, up to 100K new difference detections have to be cross-correlated with the huge historical catalogs, and then further processed to prune false alerts. This paper explains the challenges, the implementation of the LSST Association Pipeline and the database organization strategies we are planning to use to meet the real-time requirements, including data partitioning, parallelization, and pre-loading.