DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Comparison of Machine Learning Methods to Forecast Tropospheric Ozone Levels in Delhi

Journal Article · Atmosphere (Basel)

Ground-level ozone is a pollutant that harms urban populations, particularly in developing countries, where it is present in significant quantities. It greatly increases the risk of heart and lung diseases and damages agricultural crops. This study hypothesized that, because ground-level ozone is a secondary pollutant, it is amenable to 24 h forecasting from measurements of weather conditions and primary pollutants such as nitrogen oxides and volatile organic compounds. We developed software to analyze hourly records of 12 air pollutants and 5 weather variables over the course of one year in Delhi, India. To determine the best predictive model, eight machine learning algorithms were tuned, trained, tested, and compared using cross-validation on hourly data for a full year. Ranked by R², the algorithms were XGBoost (0.61), Random Forest (0.61), K-Nearest Neighbor Regression (0.55), Support Vector Regression (0.48), Decision Trees (0.43), AdaBoost (0.39), and linear regression (0.39). When the models were trained separately on each season across five years, the predictive capability of all models increased, with a maximum R² of 0.75 during winter. Bidirectional Long Short-Term Memory, the eighth algorithm, was the least accurate model for annual training but produced some of the best predictions for seasonal training. Out of five air quality index categories, the XGBoost model predicted the correct category 24 h in advance 90% of the time when trained with full-year data. Separated by season, winter is considerably more predictable (97.3%), followed by post-monsoon (92.8%), monsoon (90.3%), and summer (88.9%). These results show the importance of training machine learning methods with season-specific data sets and of comparing a large number of methods for specific applications.
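
The evaluation summarized above rests on a few simple calculations: pairing the predictors observed at hour t with the ozone value 24 h later, scoring forecasts with R², scoring the share of hours whose predicted air quality category matches the observed one, and grouping data by season for season-specific training. The Python sketch below illustrates these steps without any machine learning library. It is not the study's software; the helper names, the five-category breakpoints, and the month-to-season mapping are hypothetical placeholders chosen for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL breakpoints splitting ozone values into five categories.
# Illustrative placeholders only, not the study's or any official AQI thresholds.
HYPOTHETICAL_BREAKPOINTS: Tuple[float, ...] = (50.0, 100.0, 170.0, 210.0)

# ASSUMED month-to-season mapping (a common convention for northern India);
# the study's exact season boundaries may differ.
SEASON_OF_MONTH: Dict[int, str] = {
    12: "winter", 1: "winter", 2: "winter",
    3: "summer", 4: "summer", 5: "summer",
    6: "monsoon", 7: "monsoon", 8: "monsoon", 9: "monsoon",
    10: "post-monsoon", 11: "post-monsoon",
}


def make_24h_ahead_pairs(
    features: Sequence[Sequence[float]],
    ozone: Sequence[float],
    horizon: int = 24,
) -> Tuple[List[List[float]], List[float]]:
    """Pair predictors observed at hour t with ozone at hour t + horizon."""
    if len(features) != len(ozone):
        raise ValueError("features and ozone must cover the same hours")
    n = max(len(ozone) - horizon, 0)
    X = [list(features[t]) for t in range(n)]
    y = [ozone[t + horizon] for t in range(n)]
    return X, y


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def to_category(value: float,
                breakpoints: Sequence[float] = HYPOTHETICAL_BREAKPOINTS) -> int:
    """Map a concentration to a category index 0..len(breakpoints).

    A value equal to a breakpoint falls into the higher category.
    """
    return bisect_right(breakpoints, value)


def category_accuracy(y_true: Sequence[float], y_pred: Sequence[float],
                      breakpoints: Sequence[float] = HYPOTHETICAL_BREAKPOINTS) -> float:
    """Fraction of hours whose predicted category matches the observed one."""
    hits = sum(
        to_category(t, breakpoints) == to_category(p, breakpoints)
        for t, p in zip(y_true, y_pred)
    )
    return hits / len(y_true)


def group_by_season(months: Sequence[int],
                    values: Sequence[float]) -> Dict[str, List[float]]:
    """Split hourly values by the ASSUMED season of their calendar month."""
    groups: Dict[str, List[float]] = {}
    for month, value in zip(months, values):
        groups.setdefault(SEASON_OF_MONTH[month], []).append(value)
    return groups


if __name__ == "__main__":
    observed = [40.0, 120.0, 180.0, 300.0]
    forecast = [45.0, 90.0, 200.0, 250.0]
    print("R²:", r2_score(observed, forecast))
    print("category accuracy:", category_accuracy(observed, forecast))
```

In this sketch any regressor (XGBoost, Random Forest, and so on) would be trained on the `(X, y)` pairs, and its forecasts would be passed to `r2_score` and `category_accuracy`; season-specific training would repeat that on each group returned by `group_by_season`.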

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
89233218CNA000001
OSTI ID:
1853933
Report Number(s):
LA-UR-21-31571
Journal Information:
Atmosphere (Basel), Vol. 13, Issue 1; ISSN 2073-4433
Publisher:
MDPI
Country of Publication:
United States
Language:
English
