OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: BioCompoundML: A General Biofuel Property Screening Tool for Biological Molecules Using Random Forest Classifiers

Abstract

Screening a large number of biologically derived molecules for potential fuel compounds without recourse to experimental testing is important in identifying understudied yet valuable molecules. Experimental testing, although a valuable standard for measuring fuel properties, has several major limitations, including the requirement of testably high quantities, considerable expense, and a large amount of time. This paper discusses the development of a general-purpose fuel property tool, using machine learning, whose outcome is to screen molecules for desirable fuel properties. BioCompoundML adopts a general methodology, requiring as input only a list of training compounds (with identifiers and measured values) and a list of testing compounds (with identifiers). For the training data, BioCompoundML collects open data from the National Center for Biotechnology Information, incorporates user-provided features, imputes missing values, performs feature reduction, builds a classifier, and clusters compounds. BioCompoundML then collects data for the testing compounds, predicts class membership, and determines whether compounds are found in the range of variability of the training data set. We demonstrate this tool using three different fuel properties: research octane number (RON), threshold soot index (TSI), and melting point (MP). Here we provide measures of its success with these properties using randomized train/test measurements: average accuracy is 88% in RON, 85% in TSI, and 94% in MP; average precision is 88% in RON, 88% in TSI, and 95% in MP; and average recall is 88% in RON, 82% in TSI, and 97% in MP. The receiver operator characteristics (area under the curve) were estimated at 0.88 in RON, 0.86 in TSI, and 0.87 in MP. We also measured the success of BioCompoundML by sending 16 compounds for direct RON determination. Finally, we provide a screen of 1977 hydrocarbons/oxygenates within the 8696 compounds in MetaCyc, identifying compounds with high predictive strength for high or low RON.
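
The accuracy, precision, and recall figures quoted in the abstract follow the standard definitions for a binary classifier. The sketch below is only an illustration of those definitions, not BioCompoundML's own code: the function name `binary_metrics` and the example labels are hypothetical, and the "high RON" / "low RON" encoding is an assumption made for the example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, and recall for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn

    # Guard each ratio against an empty denominator.
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels (assumption): 1 = "high RON", 0 = "low RON".
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a randomized train/test evaluation such as the one the abstract describes, these values would be computed once per split and then averaged across splits.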

Authors:
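Leanne S. Whitmore [1]; Ryan W. Davis [2]; Robert L. McCormick [3]; John M. Gladden [1]; Blake A. Simmons [4]; Anthe George [1]; Corey M. Hudson [1]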
  1. Sandia National Lab. (SNL-CA), Livermore, CA (United States); Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States)
  2. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
  3. National Renewable Energy Lab. (NREL), Golden, CO (United States)
  4. Sandia National Lab. (SNL-CA), Livermore, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
2016-09-15
Research Org.:
National Renewable Energy Lab. (NREL), Golden, CO (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Vehicle Technologies Office (EE-3V); USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
OSTI Identifier:
1327434
Alternate Identifier(s):
OSTI ID: 1332664; OSTI ID: 1440943
Report Number(s):
NREL/JA-5400-67434
Journal ID: ISSN 0887-0624
Grant/Contract Number:
AC36-08GO28308; AC04-94AL85000; 347AC36-99GO10337; AC02-05CH11231
Resource Type:
Journal Article: Published Article
Journal Name:
Energy and Fuels
Additional Journal Information:
Journal Volume: 30; Journal Issue: 10; Journal ID: ISSN 0887-0624
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English
Subject:
09 BIOMASS FUELS; screening tool; fuel properties; BioCompoundML

Citation Formats

Whitmore, Leanne S., Davis, Ryan W., McCormick, Robert L., Gladden, John M., Simmons, Blake A., George, Anthe, and Hudson, Corey M. BioCompoundML: A General Biofuel Property Screening Tool for Biological Molecules Using Random Forest Classifiers. United States: N. p., 2016. Web. doi:10.1021/acs.energyfuels.6b01952.
Whitmore, Leanne S., Davis, Ryan W., McCormick, Robert L., Gladden, John M., Simmons, Blake A., George, Anthe, & Hudson, Corey M. BioCompoundML: A General Biofuel Property Screening Tool for Biological Molecules Using Random Forest Classifiers. United States. doi:10.1021/acs.energyfuels.6b01952.
Whitmore, Leanne S., Davis, Ryan W., McCormick, Robert L., Gladden, John M., Simmons, Blake A., George, Anthe, and Hudson, Corey M. 2016. "BioCompoundML: A General Biofuel Property Screening Tool for Biological Molecules Using Random Forest Classifiers". United States. doi:10.1021/acs.energyfuels.6b01952.
@article{osti_1327434,
title = {BioCompoundML: A General Biofuel Property Screening Tool for Biological Molecules Using Random Forest Classifiers},
author = {Whitmore, Leanne S. and Davis, Ryan W. and McCormick, Robert L. and Gladden, John M. and Simmons, Blake A. and George, Anthe and Hudson, Corey M.},
doi = {10.1021/acs.energyfuels.6b01952},
journal = {Energy and Fuels},
number = 10,
volume = 30,
place = {United States},
year = {2016},
month = sep
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record at 10.1021/acs.energyfuels.6b01952

Citation Metrics:
Cited by: 1 work
Citation information provided by Web of Science
