Aggregation and Structuring of Materials and Chemicals Data from Diverse Sources

Tassone, Christopher; Mehta, Apurva

doi:10.2172/1630122


Abstract

The overarching goal of this work was to demonstrate how data-driven materials discovery that uses artificial intelligence can be applied by both the industrial and academic sectors. Our objectives were fourfold. First, to build the necessary experimental and computational tools to demonstrate how data-driven methods accelerate materials discovery in challenging design spaces, for technologically important materials that lack a theoretical framework relating processing, structure, and properties. Second, to apply these tools to make breakthrough discoveries in two materials classes, namely wear-resistant alloys and highly selective catalysts. Third, to develop the guiding principles and provide the foundational software frameworks that can be used with minimal effort in novel application areas outside the scope of the proposed work. Fourth, to demonstrate how to translate machine learning models, which have previously been regarded as ‘black boxes’, into new human-readable physicochemical insights. To this end we used high-throughput experimentation coupled to data-driven feedback to investigate synthesis routes for two complex material systems that show immediate industrial potential but are held back by engineering bottlenecks: wear-resistant metallic glasses and crystalline nanoparticle catalysts for hydrocarbon conversion. In both of these systems, current theoretical frameworks have fallen short of providing sufficiently robust predictions for candidate materials and the synthetic routes to creating them.

Authors:

Tassone, Christopher [1]; Mehta, Apurva [1]

[1] SLAC National Accelerator Lab., Menlo Park, CA (United States). Stanford Synchrotron Radiation Lightsource (SSRL)

Publication Date: 2019-12-31

Research Org.: SLAC National Accelerator Lab., Menlo Park, CA (United States)

Sponsoring Org.: USDOE Office of Science (SC)

Contributing Org.: Citrine Informatics, Inc., Redwood City, CA (United States)

OSTI Identifier: 1630122

Report Number(s): SLAC-R-1140; FWP 100250; TRN: US2106547

DOE Contract Number: AC02-76SF00515

Resource Type: Technical Report

Country of Publication: United States

Language: English

Subject: 36 MATERIALS SCIENCE; 97 MATHEMATICS AND COMPUTING

Citation Formats


Tassone, Christopher, and Mehta, Apurva. Aggregation and Structuring of Materials and Chemicals Data from Diverse Sources. United States: N. p., 2019. Web. doi:10.2172/1630122.

Tassone, Christopher, & Mehta, Apurva. Aggregation and Structuring of Materials and Chemicals Data from Diverse Sources. United States. https://doi.org/10.2172/1630122

Tassone, Christopher, and Mehta, Apurva. 2019. "Aggregation and Structuring of Materials and Chemicals Data from Diverse Sources". United States. https://doi.org/10.2172/1630122. https://www.osti.gov/servlets/purl/1630122.

                    
@techreport{osti_1630122,

  title        = {Aggregation and Structuring of Materials and Chemicals Data from Diverse Sources},

  author       = {Tassone, Christopher and Mehta, Apurva},

  abstractNote = {The overarching goal of this work was to demonstrate how data-driven materials discovery that uses artificial intelligence can be applied by both the industrial and academic sectors. Our objectives were fourfold. First, to build the necessary experimental and computational tools to demonstrate how data-driven methods accelerate materials discovery in challenging design spaces, for technologically important materials that lack a theoretical framework relating processing, structure, and properties. Second, to apply these tools to make breakthrough discoveries in two materials classes, namely wear-resistant alloys and highly selective catalysts. Third, to develop the guiding principles and provide the foundational software frameworks that can be used with minimal effort in novel application areas outside the scope of the proposed work. Fourth, to demonstrate how to translate machine learning models, which have previously been regarded as ‘black boxes’, into new human-readable physicochemical insights. To this end we used high-throughput experimentation coupled to data-driven feedback to investigate synthesis routes for two complex material systems that show immediate industrial potential but are held back by engineering bottlenecks: wear-resistant metallic glasses and crystalline nanoparticle catalysts for hydrocarbon conversion. In both of these systems, current theoretical frameworks have fallen short of providing sufficiently robust predictions for candidate materials and the synthetic routes to creating them. By leveraging massively scalable data-driven models to inform a cycle of synthesis, measurement, and model building, we have compressed the timeline for breakthroughs in these systems by at least an order of magnitude.
Taking advantage of the synchrotron-based measurement capabilities at SSRL and the massively scalable data-driven modeling platform of Citrine Informatics (Citrination), we performed high-throughput experimentation informed by adaptive machine learning models (i.e., sequential learning) that scale from tens to tens of thousands of related experiments. This approach augments traditional combinatorial experiments by including a sequential learning algorithm in the loop, so that large design spaces are searched as efficiently as possible using all prior experiments. The common design loop, when these sequential learning algorithms are incorporated into traditional high-throughput combinatorial experiments, is as follows. A target set of material properties or structures is specified. A traditional combinatorial experiment is performed to generate an initial data set. A machine learning algorithm is trained on this initial data set. The algorithm is then queried for candidate materials that are either likely to hit the specified target (i.e., exploit) or likely to significantly improve the predictive power of the algorithm (i.e., explore). New experiments are performed based on these predictions, and the resulting data are contributed back to the model, which is re-queried for the next set of candidate materials. This loop has several advantages over traditional design of experiments coupled to high-throughput combinatorial methods, but the primary one is that the sequential learning algorithm guides successive experiments using all prior data, searching the design space efficiently by sampling only the most impactful regions. The result is significantly accelerated discovery.
In this work we demonstrated that in two years we were able to double the number of metallic glasses discovered over decades of research, and to reduce the time to discover nanocrystal synthetic routes from 1 year to 3 days. This represents a significant increase in the efficiency of R&D in challenging design spaces. Additionally, the use of these tools resulted in the discovery of a FeBNb alloy with the wear resistance and Young’s modulus of 404 hardened stainless steel and the hardness of silicon carbide, as well as a catalyst that converts propane to propylene with 100% selectivity and negligible degradation in activity after 20 hours on stream. The catalysts discovered are significant because the propane dehydrogenation reaction has been identified as a potential target reaction that can see a 20% reduction in energy inputs if separation is less energy intensive. The high selectivity for the intended product, accompanied by less frequent catalyst regeneration cycles, has the potential to result in significant energy savings. Demonstrating sequential learning for data-driven discovery required the development of algorithms and software. This included both the continued development of the Citrination platform and the development of freely available open source software. The Citrination platform was augmented to include newly integrated data sets and new workflows for the uptake of x-ray scattering data. Additionally, the Citrination API was further developed so that it can be accessed through a Python interface. To facilitate the demonstration of autonomous closed-loop synthetic discovery, two software packages were developed. Xrsdkit (https://github.com/scattering-central/xrsdkit) uses a set of machine learning algorithms to completely automate the interpretation of x-ray scattering data and is extensible to any x-ray scattering data.
The platform for automated workflows by SSRL, paws (https://github.com/slaclab/paws), handles all the machine control for the closed-loop, sequential-learning-driven synthesis. It includes a set of sequential learning algorithms for the design of experiments, as well as a client for communication with Citrination that allows the built-in sequential learning algorithm in paws to be replaced with Citrine’s design tool. Together these pieces of software ensure that other DOE projects or industry researchers that would benefit from sequential learning or automated materials discovery can adopt them with minimal effort on the part of the project performers. These tools will function out of the box for a wide array of applications in which the discovery of higher performing materials is the current bottleneck. Additionally, the Citrination platform has seen broad adoption by industry users over the course of the project period, with a demonstrated track record of significantly compressing R&D timetables and reducing associated costs. The tools and results of this project provide a powerful demonstration of how machine learning can be used to significantly accelerate R&D cycles for the discovery of high performing materials. In both cases these tools either significantly increased the rate of discovery or significantly reduced the time to discovery. Furthermore, we demonstrated the ability to perform closed-loop automated discovery of complex materials, which to our knowledge is the first example beyond small-molecule drug discovery. Given these successes, we strongly recommend that future projects that include materials discovery as part of the proposed work use these methods in the research design. We have endeavored to build foundational tools that are applicable across application spaces, to ensure that they are accessible to researchers and industry alike.},

  doi          = {10.2172/1630122},

  url          = {https://www.osti.gov/biblio/1630122},
  institution  = {SLAC National Accelerator Lab., Menlo Park, CA (United States)},

  number       = {SLAC-R-1140},

  place        = {United States},

  year         = {2019},

  month        = {12}

}
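The sequential learning loop described in the abstract (specify a target, run an initial combinatorial experiment, train a model, query it for exploit or explore candidates, run the new experiments, and feed the data back) can be sketched as follows. This is a minimal illustration under stated assumptions and not the method implemented in paws or Citrination: the one-dimensional design space, the toy `run_experiment` objective, the kernel-weighted surrogate in `predict`, and the scoring rule (predicted value plus weighted uncertainty) are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import math


def run_experiment(x):
    """Hypothetical stand-in for one synthesis + measurement step.

    Returns a toy 'property' value that peaks at x = 0.7.
    """
    return math.exp(-((x - 0.7) ** 2) / 0.02)


def predict(x, data, length=0.1):
    """Toy surrogate model (an assumption, not the report's model).

    Returns a kernel-weighted mean of prior results and a crude
    uncertainty that is 0 on top of a prior experiment and approaches
    1 far away from all of them.
    """
    weights = [math.exp(-((x - xi) ** 2) / (2 * length ** 2)) for xi, _ in data]
    total = sum(weights)
    if total > 0:
        mean = sum(w * yi for w, (_, yi) in zip(weights, data)) / total
    else:
        mean = 0.0
    uncertainty = 1.0 - max(weights)
    return mean, uncertainty


def sequential_learning(candidates, initial_step=25, n_rounds=10, explore_weight=1.0):
    """Run the design loop and return all (x, result) pairs collected."""
    # Initial data set: a coarse grid over the design space, standing in
    # for the traditional combinatorial experiment.
    tested = set(candidates[::initial_step])
    data = [(x, run_experiment(x)) for x in sorted(tested)]

    for _ in range(n_rounds):
        untested = [x for x in candidates if x not in tested]
        if not untested:
            break

        # Score = predicted value (exploit) + weighted uncertainty (explore).
        def score(x):
            mean, uncertainty = predict(x, data)
            return mean + explore_weight * uncertainty

        best = max(untested, key=score)
        # "Perform" the suggested experiment and feed the result back.
        tested.add(best)
        data.append((best, run_experiment(best)))

    return data


if __name__ == "__main__":
    design_space = [i / 100 for i in range(101)]
    results = sequential_learning(design_space)
    best_x, best_y = max(results, key=lambda pair: pair[1])
    print(f"experiments run: {len(results)}, best x: {best_x}, best result: {best_y:.3f}")
```

Raising `explore_weight` pushes the loop toward untested regions of the design space, while lowering it concentrates experiments near the best results seen so far.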


Technical Report:

View Technical Report (2.60 MB)

https://doi.org/10.2172/1630122

Similar records in OSTI.GOV collections:

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)

Technical Report Shen, Xipeng

The development of modern processors exhibits two trends that complicate the optimization of modern software. The first is the increasing sensitivity of processors' throughput to irregularities in computation. With more processors produced through massive integration of simple cores, future systems will increasingly favor regular data-level parallel computations and deviate from the needs of applications with complex patterns. Some evidence has already been seen on Graphics Processing Units (GPUs): irregular data accesses (e.g., indirect references A[D[i]]) and conditional branches limit the performance of many GPU applications to a level an order of magnitude below the GPU's peak. The second hardware …
https://doi.org/10.2172/1576175

Full Text Available

Lightweight and Statistical Techniques for Petascale PetaScale Debugging

Technical Report Miller, Barton

This project investigated novel techniques for debugging scientific applications on petascale architectures. In particular, we developed lightweight tools that narrow the problem space when bugs are encountered. We also developed techniques that either limit the number of tasks and the code regions to which a developer must apply a traditional debugger or apply statistical techniques to provide direct suggestions of the location and type of error. We extended previous work on the Stack Trace Analysis Tool (STAT), which has already demonstrated scalability to over one hundred thousand MPI tasks. We also extended statistical techniques developed to isolate programming errors …
https://doi.org/10.2172/1135799

Full Text Available

An Integrated Software Package for Studying Structure-Property-Processing Relationships in Material Systems

Technical Report Liu, Ziye

The identification of structure-property-processing relationships requires dynamical models that can access a multitude of length scales spanning nanometers to microns and timescales spanning picoseconds to seconds. Despite the widespread availability of a variety of open source and commercial codes, as well as their usage in various flavors, the predictive power of molecular dynamics (MD) is severely limited. Ab initio MD (AIMD), although very accurate, is extremely challenging to scale beyond a few hundred atoms (< few nm) and tens of picoseconds. Classical MD, on the other hand, is often limited by the accuracy of the interatomic potentials and is restricted to nanometer length …

NEXTorch: A Design and Bayesian Optimization Toolkit for Chemical Sciences and Engineering

Journal Article Wang, Yifan; Chen, Tai-Ying; Vlachos, Dionisios - Journal of Chemical Information and Modeling

Automation and optimization of chemical systems require well-informed decisions on what experiments to run to reduce time, materials, and/or computations. Data-driven active learning algorithms have emerged as valuable tools to solve such tasks. Bayesian optimization, a sequential global optimization approach, is a popular active-learning framework. Past studies have demonstrated its efficiency in solving chemistry and engineering problems. Here we introduce NEXTorch, a library in Python/PyTorch, to facilitate laboratory or computational design using Bayesian optimization. NEXTorch offers fast predictive modeling, flexible optimization loops, visualization capabilities, easy interfacing with legacy software, and multiple types of parameters and data type conversions. It provides …
https://doi.org/10.1021/acs.jcim.1c00637

Full Text Available

Machine Learning Applied to Zeolite Synthesis: The Missing Link for Realizing High-Throughput Discovery

Journal Article Moliner, Manuel; Román-Leshkov, Yuriy; Corma, Avelino - Accounts of Chemical Research

Zeolites are microporous crystalline materials with well-defined cavities and pores, which can be prepared under different pore topologies and chemical compositions. Their preparation is typically defined by multiple interconnected variables (e.g., reagent sources, molar ratios, aging treatments, reaction time and temperature, among others), but unfortunately their distinctive influence, particularly on the nucleation and crystallization processes, is still far from being understood. Thus, the discovery and/or optimization of specific zeolites is closely related to the exploration of the parametric space through trial-and-error methods, generally by studying the influence of each parameter individually. In the past decade, machine learning (ML) methods have …
https://doi.org/10.1021/acs.accounts.9b00399

Full Text Available
