OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Supporting large-scale computational science

Abstract

A study has been carried out to determine the feasibility of using commercial database management systems (DBMSs) to support large-scale computational science. Conventional wisdom in the past has been that DBMSs are too slow for such data. Several events over the past few years have muddied the clarity of this mindset: 1. Several commercial DBMS systems have demonstrated storage and ad-hoc query access to terabyte data sets. 2. Several large-scale science teams, such as EOSDIS [NAS91], high energy physics [MM97] and human genome [Kin93], have adopted (or make frequent use of) commercial DBMS systems as the central part of their data management scheme. 3. Several major DBMS vendors have introduced their first object-relational products (ORDBMSs), which have the potential to support large, array-oriented data. 4. In some cases, performance is a moot issue; this is true in particular if the performance of legacy applications is not reduced while new, albeit slow, capabilities are added to the system. The basic assessment is still that DBMSs do not scale to large computational data. However, many of the reasons have changed, and there is an expiration date attached to that prognosis. This document expands on this conclusion, identifies the advantages and disadvantages of various commercial approaches, and describes the studies carried out in exploring this area. The document is meant to be brief, technical and informative, rather than a motivational pitch. The conclusions within are very likely to become outdated within the next 5-7 years, as market forces will have a significant impact on the state of the art in scientific data management over the next decade.
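As a concrete, hedged illustration of what "ad-hoc query access" to array-oriented simulation data can look like in a relational or object-relational DBMS, the sketch below flattens mesh values into rows of a table and issues a threshold query against them. SQLite stands in for a commercial DBMS, and the schema, table name, and field names are assumptions made purely for illustration, not the systems or layouts evaluated in the report.

```python
# Minimal sketch: array-oriented simulation output stored in a relational
# table, then queried ad hoc. SQLite is a stand-in for a commercial
# DBMS/ORDBMS; the schema and names are illustrative assumptions only.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One row per (timestep, zone, field) value: a flattened view of mesh arrays.
cur.execute("""
    CREATE TABLE mesh_values (
        timestep INTEGER,
        zone_id  INTEGER,
        field    TEXT,   -- e.g. 'pressure', 'temperature'
        value    REAL
    )
""")

# A few synthetic values for two timesteps.
rows = [(t, z, "pressure", 1.0e5 + 10.0 * t + z)
        for t in range(2) for z in range(5)]
cur.executemany("INSERT INTO mesh_values VALUES (?, ?, ?, ?)", rows)

# Ad-hoc query: which zones exceeded a pressure threshold at the last timestep?
cur.execute("""
    SELECT zone_id, value FROM mesh_values
    WHERE field = 'pressure' AND timestep = 1 AND value > 100012.0
    ORDER BY value DESC
""")
for zone_id, value in cur.fetchall():
    print(f"zone {zone_id}: pressure {value:.1f}")

conn.close()
```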

Authors:
Musick, R.
Publication Date:
1 Oct 1998
Research Org.:
Lawrence Livermore National Lab., CA (US)
Sponsoring Org.:
USDOE Office of Defense Programs (DP) (US)
OSTI Identifier:
8429
Report Number(s):
UCRL-ID-129903; DP0101035; TRN: AH200117%%148
DOE Contract Number:
W-7405-ENG-48
Resource Type:
Technical Report
Resource Relation:
Other Information: PBD: 1 Oct 1998
Country of Publication:
United States
Language:
English
Subject:
71 CLASSICAL AND QUANTUM MECHANICS, GENERAL PHYSICS; HIGH ENERGY PHYSICS; MANAGEMENT; MARKET; PERFORMANCE; STORAGE

Citation Formats

Musick, R. Supporting large-scale computational science. United States: N. p., 1998. Web. doi:10.2172/8429.
Musick, R. Supporting large-scale computational science. United States. doi:10.2172/8429.
Musick, R. 1998. "Supporting large-scale computational science". United States. doi:10.2172/8429. https://www.osti.gov/servlets/purl/8429.
@article{osti_8429,
title = {Supporting large-scale computational science},
author = {Musick, R},
abstractNote = {A study has been carried out to determine the feasibility of using commercial database management systems (DBMSs) to support large-scale computational science. Conventional wisdom in the past has been that DBMSs are too slow for such data. Several events over the past few years have muddied the clarity of this mindset: 1. Several commercial DBMS systems have demonstrated storage and ad-hoc query access to terabyte data sets. 2. Several large-scale science teams, such as EOSDIS [NAS91], high energy physics [MM97] and human genome [Kin93], have adopted (or make frequent use of) commercial DBMS systems as the central part of their data management scheme. 3. Several major DBMS vendors have introduced their first object-relational products (ORDBMSs), which have the potential to support large, array-oriented data. 4. In some cases, performance is a moot issue; this is true in particular if the performance of legacy applications is not reduced while new, albeit slow, capabilities are added to the system. The basic assessment is still that DBMSs do not scale to large computational data. However, many of the reasons have changed, and there is an expiration date attached to that prognosis. This document expands on this conclusion, identifies the advantages and disadvantages of various commercial approaches, and describes the studies carried out in exploring this area. The document is meant to be brief, technical and informative, rather than a motivational pitch. The conclusions within are very likely to become outdated within the next 5-7 years, as market forces will have a significant impact on the state of the art in scientific data management over the next decade.},
doi = {10.2172/8429},
place = {United States},
year = {1998},
month = {10}
}

Similar Records:
  • Business needs have driven the development of commercial database systems since their inception. As a result, there has been a strong focus on supporting many users, minimizing the potential corruption or loss of data, and maximizing performance metrics like transactions per second, or TPC-C and TPC-D results. It turns out that these optimizations have little to do with the needs of the scientific community, and in particular have little impact on improving the management and use of large-scale, high-dimensional data. At the same time, there is an unanswered need in the scientific community for many of the benefits offered by a robust DBMS. For example, tying an ad-hoc query language such as SQL together with a visualization toolkit would be a powerful enhancement to current capabilities (a minimal sketch of this idea appears after this list). Unfortunately, there has been little emphasis or discussion in the VLDB community on this mismatch over the last decade. The goal of the paper is to identify the specific issues that need to be resolved before large-scale scientific applications can make use of DBMS products. This topic is addressed in the context of an evaluation of commercial DBMS technology applied to the exploration of data generated by the Department of Energy's Accelerated Strategic Computing Initiative (ASCI). The paper describes the data being generated for ASCI as well as current capabilities for interacting with and exploring this data. The attraction of applying standard DBMS technology to this domain is discussed, as well as the technical and business issues that currently make this an infeasible solution.
  • One of the continuing research efforts of the Blast Dynamics Branch at the U.S. Army Ballistic Research Laboratory (BRL) is to simulate the flow that results from nuclear explosions and to test the nuclear survivability of military equipment. When atmospheric nuclear blast tests were banned, chemical explosive tests were designed and conducted to simulate the blast and thermal pulses produced by real nuclear explosions. These full-scale tests provided data for analysis of the nuclear survivability of tactical Army equipment. However, the logistics and expense of full-scale chemical explosive tests meant that an average of only one test could be conducted every two years. A series of computational simulations are performed for comparison to experimental data from a 1/57-scale Large Blast Simulator (LBS) experimental shock tube. The computations simulate experiments with various high-pressure and high-temperature initial driver gas conditions. In addition to temperature and pressure variations, variations in geometry and numerical accuracy are performed and studied. The computations were performed using an axisymmetric, inviscid, time-accurate, finite-volume numerical technique which employs upwind flux differencing with total variation diminishing techniques (a schematic one-dimensional sketch of this class of scheme appears after this list). Computational results are presented in the form of static and stagnation pressure versus time histories and contour plots.
  • The ALICE Memory Snooper is a software application programming interface (API) and library for use in implementing computational steering systems. It allows distributed memory parallel programs to publish variables in the computation that may be accessed over the Internet. In this way, users can examine and even change the variables in their running application remotely. The API and library ensure the consistency of the variables across the distributed memory system. (A generic illustration of the publish-and-inspect idea, not the AMS API itself, appears after this list.)
  • Over the course of the past two decades, quantum mechanical calculations have emerged as a key component of modern materials research. However, the solution of the required quantum mechanical equations is a formidable task, and this has severely limited the range of materials systems which can be investigated by such accurate, quantum mechanical means. The current state of the art for large-scale quantum simulations is the planewave (PW) method, as implemented in the now-ubiquitous VASP, ABINIT, and QBox codes, among many others. However, since the PW method uses a global Fourier basis, with strictly uniform resolution at all points in space, and in which every basis function overlaps every other at every point, it suffers from substantial inefficiencies in calculations involving atoms with localized states, such as first-row and transition-metal atoms, and requires substantial nonlocal communications in parallel implementations, placing critical limits on scalability. In recent years, real-space methods such as finite-differences (FD) and finite-elements (FE) have been developed to address these deficiencies by reformulating the required quantum mechanical equations in a strictly local representation. However, while addressing both resolution and parallel-communications problems, such local real-space approaches have been plagued by one key disadvantage relative to planewaves: excessive degrees of freedom (grid points, basis functions) needed to achieve the required accuracies. And so, despite critical limitations, the PW method remains the standard today. In this work, we show for the first time that this key remaining disadvantage of real-space methods can in fact be overcome: by building known atomic physics into the solution process using modern partition-of-unity (PU) techniques in finite element analysis (the generic form of such an enriched basis is sketched after this list). Indeed, our results show order-of-magnitude reductions in basis size relative to state-of-the-art planewave based methods. The method developed here is completely general, applicable to any crystal symmetry and to both metals and insulators alike. We have developed and implemented a full self-consistent Kohn-Sham method, including both total energies and forces for molecular dynamics, and developed a full MPI parallel implementation for large-scale calculations. We have applied the method to the gamut of physical systems, from simple insulating systems with light atoms to complex d- and f-electron systems, requiring large numbers of atomic-orbital enrichments. In every case, the new PU FE method attained the required accuracies with substantially fewer degrees of freedom, typically by an order of magnitude or more, than the current state-of-the-art PW method. Finally, our initial MPI implementation has shown excellent parallel scaling of the most time-critical parts of the code up to 1728 processors, with clear indications of what will be required to achieve comparable scaling for the rest. Having shown that the key remaining disadvantage of real-space methods can in fact be overcome, the work has attracted significant attention: with sixteen invited talks, both domestic and international, so far; two papers published and another in preparation; and three new university and/or national laboratory collaborations, securing external funding to pursue a number of related research directions. Having demonstrated the proof of principle, work now centers on the necessary extensions and optimizations required to bring the prototype method and code delivered here to production applications.
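The first similar record above suggests tying an ad-hoc query language such as SQL to a visualization toolkit. The following is a minimal sketch of that coupling; SQLite and matplotlib stand in for a commercial DBMS and a scientific visualization package, and the hypothetical probe table with its synthetic pressure trace is invented purely for illustration rather than taken from that work.

```python
# Sketch: pipe the result of an ad-hoc SQL query straight into a plotting
# toolkit. SQLite and matplotlib are stand-ins; the 'probe' table and its
# synthetic pressure trace are invented for this illustration.
import sqlite3
import matplotlib.pyplot as plt

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE probe (time REAL, pressure REAL)")
cur.executemany(
    "INSERT INTO probe VALUES (?, ?)",
    [(0.01 * i, 1.0e5 * (1.0 + 0.5 * (i % 7) / 7.0)) for i in range(200)],
)

# The ad-hoc query selects only the window of the record the analyst cares about.
cur.execute("SELECT time, pressure FROM probe WHERE time BETWEEN 0.5 AND 1.5")
times, pressures = zip(*cur.fetchall())
conn.close()

plt.plot(times, pressures)
plt.xlabel("time (s)")
plt.ylabel("pressure (Pa)")
plt.title("Ad-hoc SQL query result handed to a visualization toolkit")
plt.show()
```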
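The shock-tube record above mentions an upwind, finite-volume scheme with total variation diminishing (TVD) limiting. The sketch below shows the flavor of such a method on the simplest possible problem, one-dimensional linear advection of a square pulse with a minmod flux limiter; it is a schematic illustration under those assumptions, not the axisymmetric production solver described in that work.

```python
# Schematic 1D finite-volume upwind scheme with a minmod flux limiter (TVD),
# applied to linear advection u_t + a u_x = 0 with periodic boundaries.
# This illustrates the class of method, not the actual axisymmetric solver.
import numpy as np

nx, a, cfl, t_end = 400, 1.0, 0.8, 0.5
x = np.linspace(0.0, 1.0, nx, endpoint=False)
dx = x[1] - x[0]
dt = cfl * dx / abs(a)
nu = a * dt / dx                              # Courant number

u = np.where((x > 0.1) & (x < 0.3), 1.0, 0.0) # square pulse initial condition

for _ in range(int(round(t_end / dt))):
    d_dn = np.roll(u, -1) - u                 # downwind difference u_{i+1} - u_i
    d_up = u - np.roll(u, 1)                  # upwind difference   u_i - u_{i-1}

    # Smoothness ratio and minmod flux limiter, phi = max(0, min(1, r)).
    safe = np.where(np.abs(d_dn) > 1e-14, d_dn, 1e-14)
    phi = np.maximum(0.0, np.minimum(1.0, d_up / safe))

    # Upwind flux (a > 0) plus a limited anti-diffusive correction; this is
    # the classic flux-limited TVD form, which keeps the pulse monotone.
    flux = a * (u + 0.5 * (1.0 - nu) * phi * d_dn)

    # Conservative finite-volume update.
    u = u - (dt / dx) * (flux - np.roll(flux, 1))

# The TVD property keeps the solution essentially within its initial bounds.
print("bounds preserved:", u.min() >= -1e-12, u.max() <= 1.0 + 1e-12)
```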
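The ALICE Memory Snooper record above describes publishing variables from a running code so they can be inspected remotely. The following is a generic illustration of that idea in plain Python (an in-process registry served over HTTP); it is not the AMS API, and the publish function, SnoopHandler class, and port number are invented solely for this sketch.

```python
# Conceptual sketch of "publishing" a running computation's variables for
# remote inspection, in the spirit of computational-steering tools such as
# the ALICE Memory Snooper. This is NOT the AMS API; it is a generic,
# single-process illustration using only the Python standard library.
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

published = {}            # name -> current value
lock = threading.Lock()

def publish(name, value):
    """Register or update a variable so remote clients can read it."""
    with lock:
        published[name] = value

class SnoopHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with lock:
            body = json.dumps(published).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 8765), SnoopHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Computation" loop: publish the iteration count and a residual-like value.
for step in range(5):
    publish("step", step)
    publish("residual", 1.0 / (step + 1))
    time.sleep(0.2)       # a client could GET http://127.0.0.1:8765/ here

server.shutdown()
```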
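The last record above rests on partition-of-unity enrichment of a finite-element basis. As a hedged sketch of the general idea, the standard partition-of-unity form of such an enriched trial field is written below; this is the generic formulation, not necessarily the exact one used in that work.

```latex
% Generic partition-of-unity enriched finite-element field (schematic form).
% N_I are standard FE shape functions forming a partition of unity, and
% \varphi_k are known atomic-orbital-like functions built into the basis;
% u_I and a_{Ik} are the unknown coefficients.
\begin{equation}
  u_h(\mathbf{x})
  = \sum_{I} N_I(\mathbf{x})\, u_I
  + \sum_{I} N_I(\mathbf{x}) \sum_{k} \varphi_k(\mathbf{x})\, a_{Ik},
  \qquad \sum_{I} N_I(\mathbf{x}) = 1 .
\end{equation}
```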