U.S. Department of Energy
Office of Scientific and Technical Information

Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices

Journal Article · Chemistry of Materials
This Methods/Protocols article is intended for materials scientists interested in performing machine learning-centered research. Herein, we cover broad guidelines and best practices for obtaining and treating data, feature engineering, model training, validation, evaluation and comparison, popular repositories for materials data and benchmarking data sets, model and architecture sharing, and finally publication. In addition, we include interactive Jupyter notebooks with example Python code to demonstrate some of the concepts, workflows, and best practices discussed. Overall, the data-driven methods and machine learning workflows and considerations are presented in a simple way, allowing interested readers to guide their machine learning research more intelligently using the suggested references, best practices, and their own materials domain expertise.
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
National Science Foundation (NSF); USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Science (SC), Basic Energy Sciences (BES); Welch Foundation
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1766496
Journal Information:
Chemistry of Materials, Vol. 32, Issue 12; ISSN 0897-4756
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English
