DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A data ecosystem to support machine learning in materials science

Abstract

Facilitating the application of machine learning to materials science problems requires enhancing the data ecosystem to enable discovery and collection of data from many sources, automated dissemination of new data across the ecosystem, and the connecting of data with materials-specific machine learning models. Here, we present two projects, the Materials Data Facility (MDF) and the Data and Learning Hub for Science (DLHub), that address these needs. We use examples to show how MDF and DLHub capabilities can be leveraged to link data with machine learning models and how users can access those capabilities through web and programmatic interfaces.

Authors:
Blaiszik, Ben [1]; Ward, Logan [1]; Schwarting, Marcus [2]; Gaff, Jonathon [3]; Chard, Ryan [1]; Pike, Daniel [4]; Chard, Kyle [1]; Foster, Ian [1]
  1. Univ. of Chicago, IL (United States); Argonne National Lab. (ANL), Argonne, IL (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)
  3. Univ. of Chicago, IL (United States)
  4. Cornell Univ., Ithaca, NY (United States)
Publication Date:
2019-10-10
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC); National Inst. of Standards and Technology (NIST), Boulder, CO (United States)
OSTI Identifier:
1607645
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Accepted Manuscript
Journal Name:
MRS Communications
Additional Journal Information:
Journal Volume: 9; Journal Issue: 4; Journal ID: ISSN 2159-6859
Publisher:
Materials Research Society - Cambridge University Press
Country of Publication:
United States
Language:
English
Subject:
36 MATERIALS SCIENCE

Citation Formats

Blaiszik, Ben, Ward, Logan, Schwarting, Marcus, Gaff, Jonathon, Chard, Ryan, Pike, Daniel, Chard, Kyle, and Foster, Ian. A data ecosystem to support machine learning in materials science. United States: N. p., 2019. Web. doi:10.1557/mrc.2019.118.
Blaiszik, Ben, Ward, Logan, Schwarting, Marcus, Gaff, Jonathon, Chard, Ryan, Pike, Daniel, Chard, Kyle, & Foster, Ian. A data ecosystem to support machine learning in materials science. United States. https://doi.org/10.1557/mrc.2019.118
Blaiszik, Ben, Ward, Logan, Schwarting, Marcus, Gaff, Jonathon, Chard, Ryan, Pike, Daniel, Chard, Kyle, and Foster, Ian. 2019. "A data ecosystem to support machine learning in materials science". United States. https://doi.org/10.1557/mrc.2019.118. https://www.osti.gov/servlets/purl/1607645.
@article{osti_1607645,
title = {A data ecosystem to support machine learning in materials science},
author = {Blaiszik, Ben and Ward, Logan and Schwarting, Marcus and Gaff, Jonathon and Chard, Ryan and Pike, Daniel and Chard, Kyle and Foster, Ian},
abstractNote = {Facilitating the application of machine learning to materials science problems requires enhancing the data ecosystem to enable discovery and collection of data from many sources, automated dissemination of new data across the ecosystem, and the connecting of data with materials-specific machine learning models. Here, we present two projects, the Materials Data Facility (MDF) and the Data and Learning Hub for Science (DLHub), that address these needs. We use examples to show how MDF and DLHub capabilities can be leveraged to link data with machine learning models and how users can access those capabilities through web and programmatic interfaces.},
doi = {10.1557/mrc.2019.118},
journal = {MRS Communications},
number = 4,
volume = 9,
place = {United States},
year = {2019},
month = {10}
}

Citation Metrics:
Cited by: 71 works
Citation information provided by
Web of Science

Figures / Tables:

Figure 1: Materials Data Facility (MDF) overview. (1) Users submit data to MDF by specifying the data’s location, title, authors, and more. (2) MDF Connect collects data from the specified location and applies materials-specific extractors and transformations to enrich the data. (3) Processed data and metadata are dispatched to any supported community data service(s) specified by the user. Other users can then discover, interact with, and access the data using any of those services.

Works referenced in this record:

The Discovery Cloud: Accelerating and Democratizing Research on a Global Scale
conference, April 2016

  • Foster, Ian; Chard, Kyle; Tuecke, Steven
  • 2016 IEEE International Conference on Cloud Engineering (IC2E)
  • DOI: 10.1109/IC2E.2016.46

AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations
journal, June 2012


Materials Data Infrastructure: A Case Study of the Citrination Platform to Examine Data Import, Storage, and Access
journal, June 2016


Globus Platform Services for Data Publication
conference, January 2018

  • Ananthakrishnan, Rachana; Blaiszik, Ben; Chard, Kyle
  • Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18
  • DOI: 10.1145/3219104.3219127

The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies
journal, December 2015


SchNet – A deep learning architecture for molecules and materials
journal, June 2018

  • Schütt, K. T.; Sauceda, H. E.; Kindermans, P. -J.
  • The Journal of Chemical Physics, Vol. 148, Issue 24
  • DOI: 10.1063/1.5019779

Colorimetric Screening for High-Throughput Discovery of Light Absorbers
journal, January 2015

  • Mitrovic, Slobodan; Soedarmadji, Edwin; Newhouse, Paul F.
  • ACS Combinatorial Science, Vol. 17, Issue 3
  • DOI: 10.1021/co500151u

Real-time coherent diffraction inversion using deep generative networks
journal, November 2018

  • Cherukara, Mathew J.; Nashed, Youssef S. G.; Harder, Ross J.
  • Scientific Reports, Vol. 8, Issue 1
  • DOI: 10.1038/s41598-018-34525-1

Machine learning prediction of accurate atomization energies of organic molecules from low-fidelity quantum chemical calculations
journal, August 2019

  • Ward, Logan; Blaiszik, Ben; Foster, Ian
  • MRS Communications, Vol. 9, Issue 3
  • DOI: 10.1557/mrc.2019.107

Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach
journal, April 2015

  • Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias
  • Journal of Chemical Theory and Computation, Vol. 11, Issue 5
  • DOI: 10.1021/acs.jctc.5b00099

4CeeD: Real-Time Data Acquisition and Analysis Framework for Material-Related Cyber-Physical Environments
conference, May 2017

  • Nguyen, Phuong; Chan, Michael; Mchenry, Kenton
  • 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
  • DOI: 10.1109/CCGRID.2017.51

NOMAD: The FAIR concept for big data-driven materials science
journal, September 2018


Matminer: An open source toolkit for materials data mining
journal, September 2018


Towards a Hybrid Human-Computer Scientific Information Extraction Pipeline
conference, October 2017

  • Tchoua, Roselyne B.; Chard, Kyle; Audus, Debra J.
  • 2017 IEEE 13th International Conference on e-Science (e-Science)
  • DOI: 10.1109/eScience.2017.23

Informatics Infrastructure for the Materials Genome Initiative
journal, July 2016


The Materials Data Facility: Data Services to Advance Materials Science Research
journal, July 2016


Commentary: The Materials Project: A materials genome approach to accelerating materials innovation
journal, July 2013

  • Jain, Anubhav; Ong, Shyue Ping; Hautier, Geoffroy
  • APL Materials, Vol. 1, Issue 1
  • DOI: 10.1063/1.4812323

The Materials Genome Initiative: One year on
journal, August 2012


The Materials Commons: A Collaboration Platform and Information Repository for the Global Materials Community
journal, July 2016


Gaussian-4 theory using reduced order perturbation theory
journal, September 2007

  • Curtiss, Larry A.; Redfern, Paul C.; Raghavachari, Krishnan
  • The Journal of Chemical Physics, Vol. 127, Issue 12
  • DOI: 10.1063/1.2770701

Introducing Parsl: A Python Parallel Scripting Library
text, January 2017


Automated algorithms for band gap analysis from optical absorption spectra
journal, December 2017


Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis
journal, February 2013


Machine learning of optical properties of materials – predicting spectra from images and images from spectra
journal, January 2019

  • Stein, Helge S.; Guevarra, Dan; Newhouse, Paul F.
  • Chemical Science, Vol. 10, Issue 1
  • DOI: 10.1039/C8SC03077D

Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach
text, January 2015

  • Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias
  • American Chemical Society
  • DOI: 10.5451/unibas-ep43348

The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies
text, January 2015

  • Kirklin, Scott; Saal, James E.; Meredig, Bryce
  • London : Nature Publ. Group
  • DOI: 10.34657/7521

SchNet - a deep learning architecture for molecules and materials
text, January 2017


Works referencing / citing this record:

Dredging a data lake: decentralized metadata extraction
conference, December 2019

  • Skluzacek, Tyler J.
  • Middleware '19: 20th International Middleware Conference, Proceedings of the 20th International Middleware Conference Doctoral Symposium
  • DOI: 10.1145/3366624.3368170

Biofilm Rupture by Laser-Induced Stress Waves Increases with Loading Amplitude, Independent of Location
journal, February 2020

  • Kearns, Kaitlyn L.; Boyd, James D.; Grady, Martha E.
  • ACS Applied Bio Materials, Vol. 3, Issue 3
  • DOI: 10.1021/acsabm.9b01085

A Cloud-Based Framework for Machine Learning Workloads and Applications
journal, January 2020


Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.