A data ecosystem to support machine learning in materials science
Abstract
Facilitating the application of machine learning to materials science problems requires enhancing the data ecosystem to enable discovery and collection of data from many sources, automated dissemination of new data across the ecosystem, and the connecting of data with materials-specific machine learning models. Here, we present two projects, the Materials Data Facility (MDF) and the Data and Learning Hub for Science (DLHub), that address these needs. We use examples to show how MDF and DLHub capabilities can be leveraged to link data with machine learning models and how users can access those capabilities through web and programmatic interfaces.
- Authors:
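- Blaiszik, Ben; Ward, Logan; Schwarting, Marcus; Gaff, Jonathon; Chard, Ryan; Pike, Daniel; Chard, Kyle; Foster, Ian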
- Univ. of Chicago, IL (United States); Argonne National Lab. (ANL), Argonne, IL (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Univ. of Chicago, IL (United States)
- Cornell Univ., Ithaca, NY (United States)
- Publication Date:
- 2019-10-10
- Research Org.:
- Argonne National Laboratory (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC); National Inst. of Standards and Technology (NIST), Boulder, CO (United States)
- OSTI Identifier:
- 1607645
- Grant/Contract Number:
- AC02-06CH11357
- Resource Type:
- Accepted Manuscript
- Journal Name:
- MRS Communications
- Additional Journal Information:
- Journal Volume: 9; Journal Issue: 4; Journal ID: ISSN 2159-6859
- Publisher:
- Materials Research Society - Cambridge University Press
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 36 MATERIALS SCIENCE
Citation Formats
Blaiszik, Ben, Ward, Logan, Schwarting, Marcus, Gaff, Jonathon, Chard, Ryan, Pike, Daniel, Chard, Kyle, and Foster, Ian. A data ecosystem to support machine learning in materials science. United States: N. p., 2019. Web. doi:10.1557/mrc.2019.118.
Blaiszik, Ben, Ward, Logan, Schwarting, Marcus, Gaff, Jonathon, Chard, Ryan, Pike, Daniel, Chard, Kyle, & Foster, Ian. A data ecosystem to support machine learning in materials science. United States. https://doi.org/10.1557/mrc.2019.118
Blaiszik, Ben, Ward, Logan, Schwarting, Marcus, Gaff, Jonathon, Chard, Ryan, Pike, Daniel, Chard, Kyle, and Foster, Ian. 2019. "A data ecosystem to support machine learning in materials science". United States. https://doi.org/10.1557/mrc.2019.118. https://www.osti.gov/servlets/purl/1607645.
@article{osti_1607645,
title = {A data ecosystem to support machine learning in materials science},
author = {Blaiszik, Ben and Ward, Logan and Schwarting, Marcus and Gaff, Jonathon and Chard, Ryan and Pike, Daniel and Chard, Kyle and Foster, Ian},
abstractNote = {Facilitating the application of machine learning to materials science problems requires enhancing the data ecosystem to enable discovery and collection of data from many sources, automated dissemination of new data across the ecosystem, and the connecting of data with materials-specific machine learning models. Here, we present two projects, the Materials Data Facility (MDF) and the Data and Learning Hub for Science (DLHub), that address these needs. We use examples to show how MDF and DLHub capabilities can be leveraged to link data with machine learning models and how users can access those capabilities through web and programmatic interfaces.},
doi = {10.1557/mrc.2019.118},
journal = {MRS Communications},
number = 4,
volume = 9,
place = {United States},
year = {2019},
month = {10}
}
Works referenced in this record:
The Discovery Cloud: Accelerating and Democratizing Research on a Global Scale
conference, April 2016
- Foster, Ian; Chard, Kyle; Tuecke, Steven
- 2016 IEEE International Conference on Cloud Engineering (IC2E)
AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations
journal, June 2012
- Curtarolo, Stefano; Setyawan, Wahyu; Wang, Shidong
- Computational Materials Science, Vol. 58
Materials Data Infrastructure: A Case Study of the Citrination Platform to Examine Data Import, Storage, and Access
journal, June 2016
- O’Mara, Jordan; Meredig, Bryce; Michel, Kyle
- JOM, Vol. 68, Issue 8
Globus Platform Services for Data Publication
conference, January 2018
- Ananthakrishnan, Rachana; Blaiszik, Ben; Chard, Kyle
- Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18
The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies
journal, December 2015
- Kirklin, Scott; Saal, James E.; Meredig, Bryce
- npj Computational Materials, Vol. 1, Issue 1
SchNet – A deep learning architecture for molecules and materials
journal, June 2018
- Schütt, K. T.; Sauceda, H. E.; Kindermans, P.-J.
- The Journal of Chemical Physics, Vol. 148, Issue 24
Colorimetric Screening for High-Throughput Discovery of Light Absorbers
journal, January 2015
- Mitrovic, Slobodan; Soedarmadji, Edwin; Newhouse, Paul F.
- ACS Combinatorial Science, Vol. 17, Issue 3
Real-time coherent diffraction inversion using deep generative networks
journal, November 2018
- Cherukara, Mathew J.; Nashed, Youssef S. G.; Harder, Ross J.
- Scientific Reports, Vol. 8, Issue 1
Machine learning prediction of accurate atomization energies of organic molecules from low-fidelity quantum chemical calculations
journal, August 2019
- Ward, Logan; Blaiszik, Ben; Foster, Ian
- MRS Communications, Vol. 9, Issue 3
Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach
journal, April 2015
- Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias
- Journal of Chemical Theory and Computation, Vol. 11, Issue 5
4CeeD: Real-Time Data Acquisition and Analysis Framework for Material-Related Cyber-Physical Environments
conference, May 2017
- Nguyen, Phuong; Chan, Michael; McHenry, Kenton
- 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
NOMAD: The FAIR concept for big data-driven materials science
journal, September 2018
- Draxl, Claudia; Scheffler, Matthias
- MRS Bulletin, Vol. 43, Issue 9
Matminer: An open source toolkit for materials data mining
journal, September 2018
- Ward, Logan; Dunn, Alexander; Faghaninia, Alireza
- Computational Materials Science, Vol. 152
Towards a Hybrid Human-Computer Scientific Information Extraction Pipeline
conference, October 2017
- Tchoua, Roselyne B.; Chard, Kyle; Audus, Debra J.
- 2017 IEEE 13th International Conference on e-Science (e-Science)
Informatics Infrastructure for the Materials Genome Initiative
journal, July 2016
- Dima, Alden; Bhaskarla, Sunil; Becker, Chandler
- JOM, Vol. 68, Issue 8
The Materials Data Facility: Data Services to Advance Materials Science Research
journal, July 2016
- Blaiszik, B.; Chard, K.; Pruyne, J.
- JOM, Vol. 68, Issue 8
Commentary: The Materials Project: A materials genome approach to accelerating materials innovation
journal, July 2013
- Jain, Anubhav; Ong, Shyue Ping; Hautier, Geoffroy
- APL Materials, Vol. 1, Issue 1
The Materials Genome Initiative: One year on
journal, August 2012
- White, Ashley
- MRS Bulletin, Vol. 37, Issue 8
The Materials Commons: A Collaboration Platform and Information Repository for the Global Materials Community
journal, July 2016
- Puchala, Brian; Tarcea, Glenn; Marquis, Emmanuelle A.
- JOM, Vol. 68, Issue 8
Gaussian-4 theory using reduced order perturbation theory
journal, September 2007
- Curtiss, Larry A.; Redfern, Paul C.; Raghavachari, Krishnan
- The Journal of Chemical Physics, Vol. 127, Issue 12
Introducing Parsl: A Python Parallel Scripting Library
text, January 2017
- Babuji, Yadu; Brizius, Alison; Chard, Kyle
- Zenodo
Automated algorithms for band gap analysis from optical absorption spectra
journal, December 2017
- Schwarting, Marcus; Siol, Sebastian; Talley, Kevin
- Materials Discovery, Vol. 10
Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis
journal, February 2013
- Ong, Shyue Ping; Richards, William Davidson; Jain, Anubhav
- Computational Materials Science, Vol. 68
Machine learning of optical properties of materials – predicting spectra from images and images from spectra
journal, January 2019
- Stein, Helge S.; Guevarra, Dan; Newhouse, Paul F.
- Chemical Science, Vol. 10, Issue 1
Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach
text, January 2015
- Ramakrishnan, Raghunathan; Dral, Pavlo O.; Rupp, Matthias
- American Chemical Society
The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies
text, January 2015
- Kirklin, Scott; Saal, James E.; Meredig, Bryce
- London : Nature Publ. Group
SchNet - a deep learning architecture for molecules and materials
text, January 2017
- Schütt, Kristof T.; Sauceda, Huziel E.; Kindermans, Pieter-Jan
- arXiv
Machine Learning Prediction of Accurate Atomization Energies of Organic Molecules from Low-Fidelity Quantum Chemical Calculations
text, January 2019
- Ward, Logan; Blaiszik, Ben; Foster, Ian
- arXiv
Works referencing / citing this record:
Dredging a data lake: decentralized metadata extraction
conference, December 2019
- Skluzacek, Tyler J.
- Middleware '19: 20th International Middleware Conference, Proceedings of the 20th International Middleware Conference Doctoral Symposium
Biofilm Rupture by Laser-Induced Stress Waves Increases with Loading Amplitude, Independent of Location
journal, February 2020
- Kearns, Kaitlyn L.; Boyd, James D.; Grady, Martha E.
- ACS Applied Bio Materials, Vol. 3, Issue 3
A Cloud-Based Framework for Machine Learning Workloads and Applications
journal, January 2020
- Lopez Garcia, Alvaro; De Lucas, Jesus Marco; Antonacci, Marica
- IEEE Access, Vol. 8