U.S. Department of Energy
Office of Scientific and Technical Information

Matminer: An open source toolkit for materials data mining

Journal Article · Computational Materials Science
 [1];  [2];  [3];  [3];  [4];  [3];  [3];  [5];  [6];  [7];  [1];  [6];  [3];  [7];  [1];  [3]
  1. Univ. of Chicago, IL (United States); Argonne National Lab. (ANL), Argonne, IL (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Citrine Informatics, Redwood City, CA (United States)
  5. Univ. of Illinois, Urbana, IL (United States)
  6. Univ. of California, Berkeley, CA (United States)
  7. Northwestern Univ., Evanston, IL (United States)
As materials data sets grow in size and scope, data mining and statistical learning methods are becoming increasingly important for analyzing these data sets and building predictive models. This manuscript introduces matminer, an open-source, Python-based software platform that facilitates data-driven methods of analyzing and predicting materials properties. Matminer provides modules for retrieving large data sets from external databases such as the Materials Project, Citrination, Materials Data Facility, and Materials Platform for Data Science. It also provides implementations of an extensive library of feature extraction routines developed by the materials community, with 47 featurization classes that can generate thousands of individual descriptors and combine them into mathematical functions. Finally, matminer provides a visualization module for producing interactive, shareable plots. These functions are designed to integrate closely with machine learning and data analysis packages already developed and in use by the Python data science community. We explain the structure and logic of matminer, describe its various modules, and showcase several examples of how matminer can be used to collect data, reproduce data mining studies reported in the literature, and test new methodologies.
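The abstract describes featurizers that turn a material's composition into numerical descriptors and then combine those descriptors into mathematical functions. The plain-Python sketch below illustrates that idea under stated assumptions; it is not matminer's implementation or API. The class `ElementStatsFeaturizer`, the helpers `parse_formula` and `expand_features`, the small element table, and the chosen functions (square, reciprocal, pairwise product) are all hypothetical. Only the general pattern of a featurizer object exposing `featurize` and `feature_labels` methods is loosely modeled on matminer's design.

```python
import re
from itertools import combinations

# Illustrative element data: Pauling electronegativity ("X") and atomic
# number ("Z"). This small table is a hypothetical stand-in for the full
# elemental databases a real featurizer library would draw on.
ELEMENT_DATA = {
    "O": {"X": 3.44, "Z": 8},
    "Na": {"X": 0.93, "Z": 11},
    "Cl": {"X": 3.16, "Z": 17},
    "Fe": {"X": 1.83, "Z": 26},
}


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into {element: amount}.

    Parentheses, hydrates, and charges are not handled in this sketch.
    """
    amounts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        # An absent count (empty string) means one atom of that element.
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


class ElementStatsFeaturizer:
    """Hypothetical featurizer: fraction-weighted mean and range of each
    elemental property. Raises KeyError for elements missing from the table."""

    def __init__(self, properties=("X", "Z")):
        self.properties = properties

    def feature_labels(self):
        # One label per returned value, in the same order as featurize().
        labels = []
        for p in self.properties:
            labels += [f"mean {p}", f"range {p}"]
        return labels

    def featurize(self, formula):
        amounts = parse_formula(formula)
        total = sum(amounts.values())
        features = []
        for p in self.properties:
            values = [ELEMENT_DATA[el][p] for el in amounts]
            weights = [amounts[el] / total for el in amounts]
            features.append(sum(w * v for w, v in zip(weights, values)))
            features.append(max(values) - min(values))
        return features


def expand_features(labels, values):
    """Combine base descriptors into simple mathematical functions: each
    descriptor's square and reciprocal, plus every pairwise product.
    Reciprocals of zero are recorded as NaN so the labels stay fixed."""
    out_labels, out_values = list(labels), list(values)
    for lab, v in zip(labels, values):
        out_labels.append(f"({lab})^2")
        out_values.append(v * v)
        out_labels.append(f"1/({lab})")
        out_values.append(1.0 / v if v != 0 else float("nan"))
    for (la, va), (lb, vb) in combinations(zip(labels, values), 2):
        out_labels.append(f"({la})*({lb})")
        out_values.append(va * vb)
    return out_labels, out_values


if __name__ == "__main__":
    featurizer = ElementStatsFeaturizer()
    for formula in ["NaCl", "Fe2O3"]:
        labels, values = expand_features(featurizer.feature_labels(),
                                         featurizer.featurize(formula))
        print(formula, dict(zip(labels, values)))
```

With two properties, each formula yields 18 descriptors: four base statistics, their squares and reciprocals, and six pairwise products. Returning labels alongside values keeps the columns aligned across materials, which is what lets this kind of output be collected into a table and passed to standard Python machine learning packages.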
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC); USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
AC02-05CH11231; AC02-06CH11357
OSTI ID:
1532339
Alternate ID(s):
OSTI ID: 1693876
Journal Information:
Computational Materials Science, Vol. 152, Issue C; ISSN 0927-0256
Publisher:
Elsevier
Country of Publication:
United States
Language:
English

Cited By (31)

Machine Learning-Based Prediction of Crystal Systems and Space Groups from Inorganic Materials Compositions journal February 2020
Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning journal November 2019
A transferable machine-learning framework linking interstice distribution and plastic heterogeneity in metallic glasses text January 2019
Materials Datasets with 273 compositional and structural features extracted from Matminer dataset January 2023
Feature Engineering of Solid‐State Crystalline Lattices for Machine Learning journal December 2019
Data‐Driven Materials Science: Status, Challenges, and Perspectives journal September 2019
Data‐Driven Materials Science: Status, Challenges, and Perspectives journal November 2019
A Critical Review of Machine Learning of Energy Materials journal January 2020
A set of Jupyter notebooks for the analysis of transport phenomena and reaction in porous catalyst pellet journal January 2019
Virtual Materials Intelligence for Design and Discovery of Advanced Electrocatalysts journal November 2019
A transferable machine-learning framework linking interstice distribution and plastic heterogeneity in metallic glasses journal December 2019
Recent advances and applications of machine learning in solid-state materials science journal August 2019
Comment on “A simple constrained machine learning model for predicting high-pressure-hydrogen-compressor materials” by Hattrick-Simpers et al., Molecular Systems Design & Engineering, 2018, 3, 509 journal January 2020
Predicting structure/property relationships in multi-dimensional nanoparticle data using t-distributed stochastic neighbour embedding and machine learning journal January 2019
Nanoinformatics, and the big challenges for the science of small things journal January 2019
Local structure order parameters and site fingerprints for quantification of coordination environment and crystal structure similarity journal January 2020
Inertial effective mass as an effective descriptor for thermoelectrics via data-driven evaluation journal January 2019
From DFT to machine learning: recent approaches to materials science–a review journal May 2019
Rocketsled: a software library for optimizing high-throughput computational searches journal April 2019
Visualising multi-dimensional structure/property relationships with machine learning journal April 2019
Machine learning for parameter auto-tuning in molecular dynamics simulations: Efficient dynamics of ions near polarizable nanoparticles journal January 2020
Machine Learning for Parameter Auto-tuning in Molecular Dynamics Simulations: Efficient Dynamics of Ions near Polarizable Nanoparticles text January 2019
A data ecosystem to support machine learning in materials science journal October 2019
Robocrystallographer: automated crystal structure text descriptions and analysis journal July 2019
Materials science in the artificial intelligence age: high-throughput library generation, machine learning, and a pathway from correlations to the underpinning physics journal July 2019
Growing field of materials informatics: databases and artificial intelligence journal January 2020
Artificial intelligence for materials discovery journal July 2019
SMACT: Semiconducting Materials by Analogy and Chemical Theory journal June 2019
Computational Screening of New Perovskite Materials Using Transfer Learning and Deep Learning journal December 2019
Convolutional Neural Networks for Crystal Material Property Prediction Using Hybrid Orbital-Field Matrix and Magpie Descriptors journal April 2019
Data-driven materials science: status, challenges and perspectives text January 2019

Similar Records

BLDAP Intro to Python/Data Science Curriculum v1
Software · Sun Oct 26 20:00:00 EDT 2025 · OSTI ID:code-168675

Open Reproducible Electron Microscopy Data Analysis
Technical Report · Sun Mar 06 23:00:00 EST 2022 · OSTI ID:1847929
