OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Next-gen tools for big scientific data: ARM data center example

Authors:
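Devarakonda, Ranjeet [1]; Dumas, Kyle K [1]; Beus, Sherman J [1]; Rush III, Everett N [1]; Krishna, Bhargavi [1]; Records, Robert J [1]; Prakash, Giri [1]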
  1. ORNL
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1343550
DOE Contract Number:
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 2016 IEEE International Conference on Big Data, Washington, DC, USA, December 5, 2016
Country of Publication:
United States
Language:
English

Citation Formats

Devarakonda, Ranjeet, Dumas, Kyle K, Beus, Sherman J, Rush III, Everett N, Krishna, Bhargavi, Records, Robert J, and Prakash, Giri. Next-gen tools for big scientific data: ARM data center example. United States: N. p., 2017. Web. doi:10.1109/BigData.2016.7841078.
Devarakonda, Ranjeet, Dumas, Kyle K, Beus, Sherman J, Rush III, Everett N, Krishna, Bhargavi, Records, Robert J, & Prakash, Giri. Next-gen tools for big scientific data: ARM data center example. United States. doi:10.1109/BigData.2016.7841078.
Devarakonda, Ranjeet, Dumas, Kyle K, Beus, Sherman J, Rush III, Everett N, Krishna, Bhargavi, Records, Robert J, and Prakash, Giri. 2017. "Next-gen tools for big scientific data: ARM data center example". United States. doi:10.1109/BigData.2016.7841078.
@article{osti_1343550,
title = {Next-gen tools for big scientific data: ARM data center example},
author = {Devarakonda, Ranjeet and Dumas, Kyle K and Beus, Sherman J and Rush III, Everett N and Krishna, Bhargavi and Records, Robert J and Prakash, Giri},
abstractNote = {},
doi = {10.1109/BigData.2016.7841078},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2017},
month = {1}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

  • A principal tenet of the scientific method is that experiments must be repeatable, which relies on ceteris paribus (i.e., all other things being equal). As a scientific community involved in data sciences, we must investigate ways to establish an environment where experiments can be repeated. We can no longer merely allude to where the data comes from; we must add rigor to the data collection and management process on which our analysis is based. This paper describes a computing environment to support repeatable scientific big data experimentation of world-wide scientific literature, and recommends a system that is housed at the Oak Ridge National Laboratory in order to provide value to investigators from government agencies, academic institutions, and industry entities. The described computing environment also adheres to the recently instituted digital data management plan mandated by multiple US government agencies, which involves all stages of the digital data life cycle including capture, analysis, sharing, and preservation. It particularly focuses on the sharing and preservation of digital research data. The details of this computing environment are explained within the context of cloud services by the three-layer classification of Software as a Service, Platform as a Service, and Infrastructure as a Service.
  • This paper presents our design for an asynchronous object storage system intended for use in scientific and commercial big data workloads. Use cases from the target workload domains are used to motivate the key abstractions used in the application programming interface (API). The architecture of the Scalable Object Store (SOS), a prototype object storage system that supports the API's facilities, is presented. The SOS serves as a vehicle for future research into scalable and resilient big data object storage. We briefly review our research into providing efficient storage servers capable of providing quality of service (QoS) contracts relevant for big data use cases.
  • Data bases are widely available and knowledge-engineering systems have been developed for many practical uses. Not only numerical analysis but also logical calculus can be handled, and these tools are powerful for materials synthesis design and automatic spectral analysis. There still remain serious problems such as knowledge extraction, data structure, recognition level of entities, critical evaluation of data, and consistency of data as well as knowledge. A flexible representation scheme for handling complex data and knowledge such as material information in a single framework is necessary, and EDTM is one such example. Applications are being developed extensively for alloy, glass, and polymer design based on the corresponding data and knowledge bases. Descriptive items and data formats should be standardized or at least should be known by the producers of data bases and knowledge bases. Another challenge is directed at handling patent information, with importance from an industrial point of view as well as a theoretical one.
  • Today's large-scale scientific simulations produce data sets tens to hundreds of terabytes in size. The DataFoundry project is developing querying and analysis tools for these data sets. The Approximate Ad-Hoc Query Engine for Simulation Data (AQSIM) uses a multi-resolution, tree-shaped data structure that allows users to place runtime limits on queries over scientific simulation data. In this AQSIM data hierarchy, each node in the tree contains an abstract model describing all of the information contained in the subtree below that node. AQSIM is able to create the data hierarchy in a single pass. However, the nodes in the hierarchy frequently have low node fanout, which leads to inefficient I/O behavior during query processing. Low node fanout is a common problem in tree-shaped indices. This paper presents a set of one-pass tree "pruning" algorithms that efficiently restructure the data hierarchy by removing inner nodes, thereby increasing node fanout. As our experimental results show, the best approach is a combination of two algorithms, one that focuses on increasing node fanout and one that attempts to reduce the maximum tree height.