OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Exascale Data Analytics for the DOE

Authors:
 Gelb, Anne [1]; Osher, Stan [2]; Bertozzi, Andrea [3]; Wise, Steve [4]; Archibald, Richard K. [5]; Hauck, Cory D. [5]; Webster, Clayton G. [5]; Zhang, Guannan [5]; Tran, Hoang A. [5]; Law, Kody J. [5]
  1. Dartmouth College
  2. University of California, Los Angeles
  3. University of California, Los Angeles
  4. The University of Tennessee, Knoxville
  5. ORNL
Publication Date:
September 2017
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1394389
DOE Contract Number:
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: ASCR PI Meeting - Rockville, Maryland, United States of America - September 11-12, 2017
Country of Publication:
United States
Language:
English

Citation Formats

Gelb, Anne, Osher, Stan, Bertozzi, Andrea, Wise, Steve, Archibald, Richard K., Hauck, Cory D., Webster, Clayton G., Zhang, Guannan, Tran, Hoang A., and Law, Kody J. Exascale Data Analytics for the DOE. United States: N. p., 2017. Web.
Gelb, Anne, Osher, Stan, Bertozzi, Andrea, Wise, Steve, Archibald, Richard K., Hauck, Cory D., Webster, Clayton G., Zhang, Guannan, Tran, Hoang A., & Law, Kody J. Exascale Data Analytics for the DOE. United States.
Gelb, Anne, Osher, Stan, Bertozzi, Andrea, Wise, Steve, Archibald, Richard K., Hauck, Cory D., Webster, Clayton G., Zhang, Guannan, Tran, Hoang A., and Law, Kody J. 2017. "Exascale Data Analytics for the DOE". United States. https://www.osti.gov/servlets/purl/1394389.
@inproceedings{osti_1394389,
title = {Exascale Data Analytics for the DOE},
author = {Gelb, Anne and Osher, Stan and Bertozzi, Andrea and Wise, Steve and Archibald, Richard K. and Hauck, Cory D. and Webster, Clayton G. and Zhang, Guannan and Tran, Hoang A. and Law, Kody J.},
booktitle = {ASCR PI Meeting},
place = {United States},
year = 2017,
month = 9,
url = {https://www.osti.gov/servlets/purl/1394389}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar records:
  • This project developed a generic, optimized set of core data analytics functions that consolidate a broad range of high-performance analytical pipelines. As the architectures of emerging HPC systems become inherently heterogeneous, data analysis kernels must be designed and accelerated for hybrid multi-node, multi-core HPC architectures composed of a mix of CPUs, GPUs, and SSDs. In addition, the trend toward power-aware computing drives our performance-energy tradeoff analysis framework, which parameterizes our data analysis kernel algorithms and software so that users can choose the power-performance optimizations that suit them.
  • The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics collected by the Fermilab Data Center on the organization, movement, and consumption of High Energy Physics (HEP) data. The project will study the analysis patterns and data organization used by the NOvA, MicroBooNE, MINERvA, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structure, which are prerequisites for mapping and optimizing modern HEP analysis chains to run on the next generation of leadership-class exascale computing facilities. We address the use of the SciDAC-Data distributions, acquired from the Fermilab Data Center’s analysis workflows and corresponding to around 71,000 HEP jobs, as input to detailed queuing simulations that model the expected data consumption and caching behavior of work running in HPC environments. In particular, we describe in detail how the Sequential Access via Metadata (SAM) data handling system, in combination with the dCache/Enstore-based data archive facilities, has been analyzed to develop radically different models of HEP data analysis. We also show how the simulation can be used to analyze the impact of design choices in archive facilities.
  • Many challenges in systems biology involve analyzing data within the framework of molecular phenomena and cellular pathways. How does this relate to the thermodynamics that we know governs the behavior of molecules? Progress in relating data analysis to thermodynamics is essential in systems biology if we are to build predictive models that enable the field of synthetic biology. This report discusses work at the crossroads of thermodynamics and data analysis and demonstrates that the statistical-mechanical free energy is a multinomial log-likelihood. Applications to systems biology are presented.
  • Compendium of Papers from the 38th Workshop on Geothermal Reservoir Engineering, Stanford University, Stanford, California, February 11-13, 2013. The National Geothermal Data System (NGDS) is a distributed, interoperable network of data collected from state geological surveys across all fifty states and from the nation’s leading academic geothermal centers. The system serves as a platform for sharing consistent, reliable, geothermal-relevant technical data with users of all types, while supplying tools relevant to their work. As aggregated data supports new scientific findings, this content-rich linked data broadens the pool of knowledge available to promote the discovery and development of commercial-scale geothermal energy production. Most of the up-front risks associated with geothermal development stem from exploration and characterization of subsurface resources; wider access to distributed data will therefore result in lower costs for geothermal development. NGDS is on track to become fully operational by 2014 and will provide a platform for custom applications that access geothermal-relevant data in the U.S. and abroad. It is being built on the U.S. Geoscience Information Network (USGIN) data integration framework to promote interoperability across the Earth sciences community, and its basic structure employs state-of-the-art informatics to advance geothermal knowledge. The following four papers, which make up this Open-File Report, are a compendium of presentations from the 38th Annual Workshop on Geothermal Reservoir Engineering, held February 11-13, 2013, at Stanford University, Stanford, California. “NGDS Geothermal Data Domain: Assessment of Geothermal Community Data Needs” outlines the efforts of a set of nationwide data providers to supply data for the NGDS. In particular, it discusses data acquisition, delivery, and methodology.
The paper addresses the various types of data and metadata required, and why simple links to existing data are insufficient for promoting geothermal exploration. Authors of this paper are Arlene Anderson, US DOE Geothermal Technologies Office; David Blackwell, Southern Methodist University (SMU); Cathy Chickering (SMU); Toni Boyd, Oregon Institute of Technology’s GeoHeat Center; Roland Horne, Stanford University; Matthew MacKenzie, Uberity; Joe Moore, University of Utah; Duane Nickull, Uberity; Stephen Richard, Arizona Geological Survey; and Lisa Shevenell, University of Nevada, Reno. “NGDS User Centered Design: Meeting the Needs of the Geothermal Community” discusses the user-centered design approach taken in developing a user interface for the NGDS. The development process is research-based and highly collaborative, and it incorporates state-of-the-art practices to ensure a quality user interface of the widest and greatest utility. Authors of this paper are Harold Blackman, Boise State University; Suzanne Boyd, Anthro-Tech; Kim Patten, Arizona Geological Survey; and Sam Zheng, Siemens Corporate Research. “Fueling Innovation and Adoption by Sharing Data on the DOE Geothermal Data Repository Node on the National Geothermal Data System” describes the motivation behind the development of the Geothermal Data Repository (GDR) and its role in the NGDS, including the benefits of using the GDR to share geothermal data of all types and DOE’s data submission process. Authors of this paper are Jon Weers, National Renewable Energy Laboratory, and Arlene Anderson, US DOE Geothermal Technologies Office. Finally, “Developing the NGDS Adoption of CKAN for Domestic & International Data Deployment” provides an overview of the “Node-In-A-Box” software package, designed to give data consumers a highly functional interface to the system and to ease the burden on data providers who wish to publish data in it.
It is important to note that this software package is a reference implementation and that the NGDS architecture is based on open standards, so other server software can make resources available and other client applications can use NGDS data. Authors of this paper are Ryan Clark, Arizona Geological Survey (AZGS); Christoph Kuhmuench, Siemens Corporate Research; and Stephen Richard, AZGS.
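The record on core data analytics functions describes a performance-energy tradeoff framework that parameterizes kernels so users can choose a power-performance setting. The following is a minimal sketch of one common way to expose such a choice, assuming each kernel configuration has already been measured for runtime and energy: keep only the Pareto-optimal configurations, then pick the fastest one that fits a user-supplied energy budget. The `Config` fields, the configuration names, and the numbers are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One measured kernel configuration (hypothetical fields)."""
    name: str
    runtime_s: float  # wall-clock time in seconds
    energy_j: float   # energy consumed in joules


def pareto_front(configs):
    """Return configurations not dominated in (runtime, energy), fastest first.

    A config is dominated if some other config is no worse in both
    metrics and strictly better in at least one.
    """
    front = []
    for c in configs:
        dominated = any(
            o.runtime_s <= c.runtime_s and o.energy_j <= c.energy_j
            and (o.runtime_s < c.runtime_s or o.energy_j < c.energy_j)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.runtime_s)


def choose(configs, energy_budget_j):
    """Pick the fastest Pareto-optimal config within the energy budget, or None."""
    feasible = [c for c in pareto_front(configs) if c.energy_j <= energy_budget_j]
    return min(feasible, key=lambda c: c.runtime_s) if feasible else None


if __name__ == "__main__":
    measured = [  # hypothetical measurements
        Config("cpu_only", 12.0, 900.0),
        Config("gpu_full_clock", 3.0, 1500.0),
        Config("gpu_low_clock", 4.5, 1000.0),
        Config("cpu_gpu_hybrid", 5.0, 1200.0),
    ]
    for c in pareto_front(measured):
        print(c.name, c.runtime_s, c.energy_j)
    print(choose(measured, energy_budget_j=1100.0).name)  # gpu_low_clock
```

Here `cpu_gpu_hybrid` never appears as a choice because `gpu_low_clock` is both faster and cheaper; reducing the options to the Pareto front is what lets a single budget parameter select a sensible operating point.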
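The SciDAC-Data record uses job-level access distributions as input to queuing simulations of data consumption and caching. As a much simpler illustration of the caching side only, the sketch below replays a sequence of file accesses through a fixed-capacity least-recently-used (LRU) cache and counts hits, misses, and data staged from the archive. The LRU policy, the trace format, and the `simulate_lru` name are assumptions for illustration; SAM and dCache/Enstore use their own policies, which this sketch does not model.

```python
from collections import OrderedDict


def simulate_lru(trace, sizes, capacity):
    """Replay file accesses through an LRU cache (hypothetical model).

    trace: iterable of file names in access order.
    sizes: dict mapping file name -> size (any consistent unit).
    capacity: total cache capacity in the same unit.
    Returns (hits, misses, amount_fetched_from_archive).
    """
    cache = OrderedDict()  # file name -> size, least recently used first
    used = 0
    hits = misses = fetched = 0
    for name in trace:
        size = sizes[name]
        if name in cache:
            hits += 1
            cache.move_to_end(name)  # mark as most recently used
            continue
        misses += 1
        fetched += size  # file must be staged from the archive
        if size > capacity:
            continue  # too large to cache; assume it is streamed directly
        # Evict least recently used files until the new one fits.
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[name] = size
        used += size
    return hits, misses, fetched


if __name__ == "__main__":
    sizes = {"f1": 4, "f2": 4, "f3": 4}  # hypothetical sizes
    trace = ["f1", "f2", "f1", "f3", "f2", "f1"]
    hits, misses, fetched = simulate_lru(trace, sizes, capacity=8)
    print(f"hit rate = {hits / (hits + misses):.2f}, fetched = {fetched}")
```

Swapping the trace for accesses sampled from measured distributions, or the eviction rule for another policy, is the kind of design-choice comparison the record describes, at a far smaller scale.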
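The systems-biology record states that the statistical-mechanical free energy is a multinomial log-likelihood. One standard way to see a connection of this kind is sketched below; it is not necessarily the report's own derivation. It assumes Boltzmann state probabilities p_i = exp(-beta E_i)/Z with beta = 1/(k_B T) and F = -k_B T ln Z, observed counts n_i with N = sum_i n_i, and frequencies f_i = n_i/N, and it writes <E>_f = sum_i f_i E_i, S_f = -k_B sum_i f_i ln f_i, and G[f] = <E>_f - T S_f.

```latex
\begin{align*}
\ln L &= \ln N! - \sum_i \ln n_i! + \sum_i n_i \ln p_i, \\
\sum_i n_i \ln p_i &= -\beta \sum_i n_i E_i - N \ln Z
  = -N\beta\left(\langle E\rangle_f - F\right), \\
\ln N! - \sum_i \ln n_i! &\approx -N \sum_i f_i \ln f_i = \frac{N S_f}{k_B}
  \qquad \text{(Stirling)}, \\
\ln L &\approx -N\beta\left(\langle E\rangle_f - T S_f - F\right)
  = -N\beta\left(G[f] - F\right).
\end{align*}
```

Because G[f] - F = k_B T D_KL(f || p) is nonnegative, the likelihood is largest when the observed frequencies match the Boltzmann distribution, that is, when the free-energy functional G reaches its minimum value F.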
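The NGDS record notes that Node-In-A-Box builds on CKAN and that, because the architecture uses open standards, other client applications can use NGDS data. As a hedged sketch of such a client, the code below builds a query against CKAN's standard Action API endpoint (`/api/3/action/package_search`, with `q` and `rows` parameters) and extracts dataset titles from the JSON response. `NODE_URL` is hypothetical, and whether a particular NGDS node exposes this API is an assumption, not something the record states. Parsing is kept separate from fetching so it can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical node address; substitute a real CKAN-based node.
NODE_URL = "https://ngds-node.example.org"


def search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Extract dataset titles (falling back to names) from a decoded response."""
    if not response.get("success"):
        raise ValueError("CKAN API call did not succeed")
    return [pkg.get("title") or pkg["name"] for pkg in response["result"]["results"]]


def search(base_url, query, rows=10, timeout=30):
    """Fetch and parse one page of search results (requires network access)."""
    with urlopen(search_url(base_url, query, rows), timeout=timeout) as resp:
        return dataset_titles(json.load(resp))


if __name__ == "__main__":
    # Offline example with a hypothetical response in CKAN's envelope format.
    sample = {
        "success": True,
        "result": {"count": 2, "results": [
            {"name": "well-temps-tx", "title": "Well Temperatures, Texas"},
            {"name": "heat-flow-nv", "title": ""},
        ]},
    }
    print(search_url(NODE_URL, "heat flow", rows=5))
    print(dataset_titles(sample))
```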