OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Data Management, the Victorian era child of the 21st century

Abstract

Do you remember when a gigabyte disk drive was “a lot” of storage in that bygone age of the 20th century? Still in our first decade of the 21st century, major supercomputer sites now speak of storage in terms of petabytes (10^15 bytes), a six-orders-of-magnitude increase in capacity over a gigabyte! Unlike our archaic “big” disk drive, where all the data was in one place, HPC storage is now distributed across many machines and even across the Internet. Collaborative research engages many scientists who need to find and use each other's data, preferably in an automated fashion, which complicates an already muddled problem.
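
A minimal arithmetic sketch in Python (assuming decimal SI units, as the abstract's 10^15 figure implies) of the scale jump from a gigabyte to a petabyte:

    import math

    # Decimal (SI) unit sizes, matching the abstract's figures.
    GIGABYTE = 10**9   # bytes
    PETABYTE = 10**15  # bytes

    ratio = PETABYTE // GIGABYTE
    print(f"1 PB = {ratio:,} GB")                           # 1 PB = 1,000,000 GB
    print(f"Orders of magnitude: {math.log10(ratio):.0f}")  # Orders of magnitude: 6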

Authors:
Farber, Rob
Publication Date:
2007-03-30
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
915689
Report Number(s):
PNNL-SA-53343
TRN: US200816%%26
DOE Contract Number:
AC05-76RL01830
Resource Type:
Journal Article
Resource Relation:
Journal Name: Scientific Computing; Journal Volume: 24; Journal Issue: 4; Page: 12
Country of Publication:
United States
Language:
English
Subject:
97; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; MEMORY DEVICES; CAPACITY; INTERNET; SUPERCOMPUTERS; DATA PROCESSING

Citation Formats

Farber, Rob. Data Management, the Victorian era child of the 21st century. United States: N. p., 2007. Web.
Farber, Rob. Data Management, the Victorian era child of the 21st century. United States.
Farber, Rob. Fri, 30 Mar 2007. "Data Management, the Victorian era child of the 21st century". United States.
@article{osti_915689,
title = {Data Management, the Victorian era child of the 21st century},
author = {Farber, Rob},
abstractNote = {Do you remember when a gigabyte disk drive was “a lot” of storage in that bygone age of the 20th century? Still in our first decade of the 21st century, major supercomputer sites now speak of storage in terms of petabytes (10^15 bytes), a six-orders-of-magnitude increase in capacity over a gigabyte! Unlike our archaic “big” disk drive, where all the data was in one place, HPC storage is now distributed across many machines and even across the Internet. Collaborative research engages many scientists who need to find and use each other's data, preferably in an automated fashion, which complicates an already muddled problem.},
doi = {},
journal = {Scientific Computing},
pages = {12},
number = 4,
volume = 24,
place = {United States},
year = {2007},
month = {mar}
}
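
As a rough, standard-library-only illustration (not an OSTI API; the cite() helper below is hypothetical), the record's metadata can be held in a plain dictionary and rendered into the first citation string shown above; all field values are taken from this record:

    # Hypothetical sketch: this OSTI record's metadata as a dict, rendered as a citation line.
    record = {
        "osti_id": "915689",
        "author": "Farber, Rob",
        "title": "Data Management, the Victorian era child of the 21st century",
        "journal": "Scientific Computing",
        "volume": 24,
        "number": 4,
        "page": 12,
        "year": 2007,
        "place": "United States",
    }

    def cite(rec):
        """Render a simple web-style citation (format follows the first example above)."""
        return (f'{rec["author"]}. {rec["title"]}. '
                f'{rec["place"]}: N. p., {rec["year"]}. Web.')

    print(cite(record))
    # Farber, Rob. Data Management, the Victorian era child of the 21st century. United States: N. p., 2007. Web.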
  • This paper addresses some of the changes in the approach to formatting, acquisition, storage, retrieval, and dissemination of electronic information generated during the design, construction, and operation of a nuclear facility.
  • This paper discusses key principles for the development of materials property information management software systems. There are growing needs for automated materials information management in various organizations. In part these are fuelled by the demands for higher efficiency in material testing, product design and engineering analysis. But equally important, organizations are being driven by the needs for consistency, quality and traceability of data, as well as control of access to proprietary or sensitive information. Further, the use of increasingly sophisticated nonlinear, anisotropic and multi-scale engineering analyses requires both processing of large volumes of test data for development of constitutive models and complex materials data input for Computer-Aided Engineering (CAE) software. And finally, the globalization of economy often generates great needs for sharing a single gold source of materials information between members of global engineering teams in extended supply-chains. Fortunately material property management systems have kept pace with the growing user demands and evolved to versatile data management systems that can be customized to specific user needs. The more sophisticated of these provide facilities for: (i) data management functions such as access, version, and quality controls; (ii) a wide range of data import, export and analysis capabilities; (iii) data pedigree traceability mechanisms; (iv) data searching, reporting and viewing tools; and (v) access to the information via a wide range of interfaces. In this paper the important requirements for advanced material data management systems, future challenges and opportunities such as automated error checking, data quality characterization, identification of gaps in datasets, as well as functionalities and business models to fuel database growth and maintenance are discussed.
  • Editorial for IEEE Computer Special edition on Data Intensive Computing
  • Data from the American Association of Poison Control Centers (AAPCC) and the Cincinnati-based Drug and Poison Information Center (DPIC) were analyzed to determine the incidence and trends of human plant poisonings since the year 2000. Approximately 3.4% of the approximately 4.3 million annual calls to the AAPCC centers involved plants, with a higher fraction (4.5%) for pediatric exposures. Nearly 70% of plant exposures occurred in children under six. Only 8% of cases required treatment in a health-care facility, and only 0.1% (in 2008) were considered severe outcomes. The most prominent groups of plants involved in exposures are those containing oxalates, and the most common symptom is gastroenteritis. The top 12 identified plants (in descending order) nationally were Spathiphyllum species (peace lily), Philodendron species (philodendron), Euphorbia pulcherrima (poinsettia), Ilex species (holly), Phytolacca americana (pokeweed), Toxicodendron radicans (poison ivy), Capsicum (pepper), Ficus (rubber tree, weeping fig), Crassula argentea (jade plant), Dieffenbachia (dumb cane), Epipremnum aureum (pothos) and Schlumbergera bridgesii (Christmas cactus). Broad overlaps between the DPIC and the AAPCC incidence data were noted, with essentially the same plant species in each dataset. The nature of the various toxins, the symptomatology and potential treatments are discussed for the highest ranking plant species.
  • Response to environmental chemicals can vary widely among individuals and between population groups. In human health risk assessment, data on susceptibility can be utilized by deriving risk levels based on a study of a susceptible population and/or an uncertainty factor may be applied to account for the lack of information about susceptibility. Defining genetic susceptibility in response to environmental chemicals across human populations is an area of interest in the NAS' new paradigm of toxicity pathway-based risk assessment. Data from high-throughput/high content (HT/HC), including -omics (e.g., genomics, transcriptomics, proteomics, metabolomics) technologies, have been integral to the identification and characterization of drug target and disease loci, and have been successfully utilized to inform the mechanism of action for numerous environmental chemicals. Large-scale population genotyping studies may help to characterize levels of variability across human populations at identified target loci implicated in response to environmental chemicals. By combining mechanistic data for a given environmental chemical with next generation sequencing data that provides human population variation information, one can begin to characterize differential susceptibility due to genetic variability to environmental chemicals within and across genetically heterogeneous human populations. The integration of such data sources will be informative to human health risk assessment.