U.S. Department of Energy
Office of Scientific and Technical Information

Human limits in machine learning: prediction of potato yield and disease using soil microbiome data

Journal Article · BMC Bioinformatics

Abstract

Background

The preservation of soil health is a critical challenge in the 21st century due to its significant impact on agriculture, human health, and biodiversity. We provide one of the first comprehensive investigations into the predictive potential of machine learning models for understanding the connections between soil and biological phenotypes. We investigate an integrative framework that accurately predicts plant performance from biological, chemical, and physical properties of the soil using two machine learning models: a random forest and a Bayesian neural network.

Results

Prediction improves when environmental features, such as soil properties and microbial density, are added to the microbiome data. Comparing preprocessing strategies shows that human decisions significantly affect predictive performance. The naive total sum scaling normalization commonly used in microbiome research is among the best strategies for maximizing predictive power. Accurately defined labels matter more than normalization, taxonomic level, or model characteristics: ML performance is limited when humans cannot classify samples accurately. Lastly, we guide domain scientists, via a full model selection decision tree, in identifying the human choices that optimize model prediction power.
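
For readers unfamiliar with total sum scaling, the following is a minimal sketch of the general technique, not the authors' pipeline. It assumes a count table with one row per soil sample and one column per taxon; the function name `total_sum_scale` and the example counts are hypothetical and chosen only for illustration.

```python
def total_sum_scale(counts):
    """Convert each sample's raw taxon counts to relative abundances.

    `counts` is a list of rows; each row holds the counts for one sample.
    """
    scaled = []
    for sample in counts:
        total = sum(sample)
        if total == 0:
            # An empty sample has no composition; keep zeros instead of dividing by zero.
            scaled.append([0.0 for _ in sample])
        else:
            scaled.append([count / total for count in sample])
    return scaled


# Hypothetical count table: rows are soil samples, columns are taxa.
counts = [
    [10, 30, 60],
    [5, 0, 15],
    [0, 0, 0],
]
relative = total_sum_scale(counts)
```

After scaling, each non-empty row sums to 1, so samples with different sequencing depths become comparable as proportions. How all-zero samples should be handled is a design choice; this sketch simply leaves them as zeros.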

Conclusions

Our study highlights the importance of incorporating diverse environmental features and careful data preprocessing in enhancing the predictive power of machine learning models for soil and biological phenotype connections. This approach can significantly contribute to advancing agricultural practices and soil health management.

Sponsoring Organization:
USDOE
Grant/Contract Number:
SC0021016
OSTI ID:
2478895
Journal Information:
Journal Name: BMC Bioinformatics; Journal Volume: 25; Journal Issue: 1; ISSN 1471-2105
Publisher:
Springer Science + Business Media
Country of Publication:
United Kingdom
Language:
English
