Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data
High-throughput analysis of biomass is necessary to ensure consistent and uniform feedstocks for agricultural and bioenergy applications and to inform genomics and systems biology models. Pyrolysis followed by mass spectrometry, such as pyrolysis-molecular beam mass spectrometry (py-MBMS), is becoming increasingly popular for the rapid analysis of biomass cell wall composition and typically requires different data analysis tools depending on the need and application. Here, the authors report the py-MBMS analysis of several types of lignocellulosic biomass to understand spectral patterns and their variation with biomass composition, and use machine learning approaches to classify, differentiate, and predict biomass types on the basis of py-MBMS spectra. Py-MBMS spectra were also corrected for instrumental variance using generalized linear modeling (GLM), with the relative abundances of selected ions serving as spike-in controls. The machine learning classification algorithms used included random forest, k-nearest neighbors (k-NN), decision tree, Gaussian Naïve Bayes, gradient boosting, and multilayer perceptron classifiers. The k-NN classifier generally performed best for classifications using raw spectral data, and the decision tree classifier performed worst. After normalization of spectra to account for instrumental variance, all the classifiers had comparable and generally acceptable performance for predicting the biomass types, although the k-NN and decision tree classifiers were less accurate for prediction of specific sample types. Gaussian Naïve Bayes (GNB) and extreme gradient boosting (XGB) classifiers performed better than the k-NN and decision tree classifiers for the prediction of biomass mixtures.
The data analysis workflow reported here could be applied and extended to compare biomass samples that differ in type, species, phenotype, or genotype, or that were subjected to different treatments or environments. Such comparisons could further elucidate the sources of spectral variance and patterns and allow compositional information to be inferred from spectra, particularly for data analyzed without a priori knowledge of the feedstock composition or identity.
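The two steps the abstract describes, correcting spectra for instrumental variance with control ions and then classifying them, can be illustrated with a minimal sketch. This is not the authors' workflow. It swaps the GLM correction for a simple ratio normalization to the summed control-ion signal and uses a hand-written k-NN classifier. The ion profiles, class labels, control-ion positions, and instrument gain model are all hypothetical values chosen for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical relative ion abundances for two made-up sample classes.
# The last two channels stand in for control ions that are ASSUMED to have
# the same abundance in every sample type.
PROFILES = {
    "hardwood": [5.0, 1.0, 3.0, 2.0, 1.0, 1.0],
    "grass": [2.0, 4.0, 1.0, 3.0, 1.0, 1.0],
}
CONTROL_IDX = (4, 5)


def simulate(profile, rng, noise=0.02):
    """Return one synthetic spectrum with a random instrument gain and noise."""
    gain = rng.uniform(0.5, 2.0)  # assumed run-to-run sensitivity drift
    return [gain * p * (1.0 + rng.gauss(0.0, noise)) for p in profile]


def normalize_to_controls(spectrum, control_idx=CONTROL_IDX):
    """Divide every channel by the summed control-ion signal.

    A simplified stand-in for the GLM-based correction, not the same method.
    """
    ref = sum(spectrum[i] for i in control_idx)
    if ref <= 0:
        raise ValueError("control ions carry no signal")
    return [x / ref for x in spectrum]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training spectra closest to the query."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def make_dataset(n_per_class, rng):
    """Build normalized synthetic spectra and their labels."""
    X, y = [], []
    for label, profile in PROFILES.items():
        for _ in range(n_per_class):
            X.append(normalize_to_controls(simulate(profile, rng)))
            y.append(label)
    return X, y


def evaluate(seed=0, n_train=20, n_test=10, k=3):
    """Train on one synthetic set and report accuracy on another."""
    rng = random.Random(seed)
    train_X, train_y = make_dataset(n_train, rng)
    test_X, test_y = make_dataset(n_test, rng)
    hits = sum(
        knn_predict(train_X, train_y, x, k) == truth
        for x, truth in zip(test_X, test_y)
    )
    return hits / len(test_y)


print(f"held-out accuracy on synthetic spectra: {evaluate():.2f}")
```

Because the data are synthetic and well separated, the accuracy printed here says nothing about real py-MBMS spectra. The sketch only shows how a control-ion correction removes a multiplicative instrument effect before a distance-based classifier is applied.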
- Research Organization:
- National Renewable Energy Laboratory (NREL), Golden, CO (United States); USDOE Bioenergy Research Centers (BRC) (United States). Center for Bioenergy Innovation (CBI)
- Sponsoring Organization:
- USDOE Office of Energy Efficiency and Renewable Energy (EERE); USDOE Office of Energy Efficiency and Renewable Energy (EERE), Transportation Office. Bioenergy Technologies Office; USDOE Office of Science (SC), Biological and Environmental Research (BER)
- Grant/Contract Number:
- AC36-08GO28308; NA
- OSTI ID:
- 1777643
- Alternate ID(s):
- OSTI ID: 1781625
- Report Number(s):
- NREL/JA-2800-79514; IJMCFK; PII: ijms22084107
- Journal Information:
- Journal Name: International Journal of Molecular Sciences (Online); Journal Volume: 22; Journal Issue: 8; ISSN 1422-0067
- Publisher:
- MDPI AG
- Country of Publication:
- Switzerland
- Language:
- English