U.S. Department of Energy
Office of Scientific and Technical Information

Machine Learning in Environmental Research: Common Pitfalls and Best Practices

Journal Article · Environmental Science and Technology
 [1];  [2];  [2]
  1. Department of Civil and Environmental Engineering and Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States; OSTI
  2. Department of Civil and Environmental Engineering and Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States

Abstract: Not provided.

Research Organization:
Princeton Univ., NJ (United States)
Sponsoring Organization:
USDOE Office of Energy Efficiency and Renewable Energy (EERE)
DOE Contract Number:
EE0009269
OSTI ID:
2418770
Journal Information:
Environmental Science and Technology, Vol. 57, Issue 46; ISSN 0013-936X
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English


Similar Records

Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices
Journal Article · 2020 · Chemistry of Materials · OSTI ID:1766496

Common principles and best practices for engineering microbiomes
Journal Article · 2019 · Nature Reviews Microbiology · OSTI ID:1579360