U.S. Department of Energy
Office of Scientific and Technical Information

A View from ORNL: Scientific Data Research Opportunities in the Big Data Age

Conference
  1. ORNL
  2. Kitware
  3. Georgia Institute of Technology, Atlanta
  4. Rutgers University

One of the core issues across computer and computational science today is adapting to, managing, and learning from the influx of "Big Data". In the commercial space, this problem has led to a huge investment in new technologies and capabilities that are well adapted to the human-generated logs, videos, texts, and other large-data artifacts being processed, and it has resulted in an explosion of useful platforms and languages (Hadoop, Spark, Pandas, etc.). However, translating this work from the enterprise space to the computational science and HPC community has proven somewhat difficult, in part because of fundamental differences in the type and scale of the data and in the timescales surrounding its generation and use. We describe a forward-looking research and development plan centered on making Input/Output (I/O) intelligent for users in the scientific community, whether they are accessing scalable storage or performing in situ workflow tasks. Much of our work is based on our experience with the Adaptable I/O System (ADIOS 1.X) and with the next-generation version of the software, ADIOS 2.X [1].

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1468120
Resource Relation:
Conference: IEEE 38th International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria, 7/2/2018 to 7/5/2018
Country of Publication:
United States
Language:
English

References (39)

Optimizing checkpoint data placement with guaranteed burst buffer endurance in large-scale hierarchical storage systems February 2017
GoldRush: resource efficient in situ scientific data analytics using fine-grained interference aware execution January 2013
Topology Mapping for Blue Gene/L Supercomputer November 2006
The global version of the gyrokinetic turbulence code GENE August 2011
VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures May 2016
Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems January 2005
Visualization and Analysis Requirements for In Situ Processing for a Large-Scale Fusion Simulation Code November 2016
Hello ADIOS: the challenges and lessons of developing leadership class I/O frameworks August 2013
Handling Failures in Parallel Scientific Workflows Using Clouds (2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC), https://doi.org/10.1109/SC.Companion.2012.28) November 2012
Extending Skel to Support the Development and Optimization of Next Generation I/O Systems September 2017
Gyrokinetic neoclassical study of the bootstrap current in the tokamak edge pedestal with fully non-linear Coulomb collisions April 2016
SODA: Science-Driven Orchestration of Data Analytics August 2015
Event-based systems: opportunities and challenges at exascale January 2009
Service Augmentation for High End Interactive Data Services September 2005
Landrush: Rethinking In-Situ Analysis for GPGPU Workflows May 2016
DataSpaces: an interaction and coordination framework for coupled simulation workflows January 2010
Global and local gyrokinetic simulations of high-performance discharges in view of ITER May 2013
Big data provenance: Challenges, state of the art and opportunities October 2015
Global adjoint tomography: first-generation model September 2016
Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System June 2017
Electron Temperature Gradient Turbulence December 2000
Exacution: Enhancing Scientific Data Management for Exascale June 2017
Moving the Code to the Data - Dynamic Code Deployment Using ActiveSpaces May 2011
TGE: Machine Learning Based Task Graph Embedding for Large-Scale Topology Mapping September 2017
Topology-aware task mapping for reducing communication contention on large parallel machines January 2006
Compressed ion temperature gradient turbulence in diverted tokamak edge May 2009
Machine Learning Predictions of Runtime and IO Traffic on High-End Clusters September 2016
Meteor: a middleware infrastructure for content‐based decoupled interactions in pervasive grid environments November 2007
GPUShare: Fair-Sharing Middleware for GPU Clouds May 2016
I/O performance challenges at leadership scale January 2009
Scientific workflow management and the Kepler system January 2006
Generic topology mapping strategies for large-scale parallel architectures January 2011
Data-intensive applications, challenges, techniques and technologies: A survey on Big Data August 2014
Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales December 2017
Performance Modeling of In Situ Rendering November 2016
24/7 Characterization of petascale I/O workloads August 2009
A Multiplatform Study of I/O Behavior on Petascale Supercomputers January 2015
SSD-optimized workload placement with adaptive learning and classification in HPC environments June 2014
Exascale Storage Systems the SIRIUS Way October 2016
