A Deep Learning-based PM2.5 Forecasting Model for Pittsburgh Using GEOS-CF Based Atmospheric Composition Data

ABHISHEK ANAND, Albert Presto, Amir Barati Farimani, Carnegie Mellon University

     Abstract Number: 463
     Working Group: Aerosol Exposure

Abstract
Exposure to PM2.5 causes around 4.3 million premature deaths globally. Exposure is even higher during high-pollution episodes caused by boundary layer inversions. Reduced vertical mixing during these inversions limits pollutant dispersion and traps pollutants close to the surface, raising concentrations and further promoting the formation of secondary pollutants. It is therefore important to predict these high-pollution episodes locally. We build several PM2.5 forecast models for Pittsburgh (Allegheny County), including multi-linear regression, machine learning (ML), and deep learning models, and compare their performance to identify the best forecast model. The models use (i) 5 years of hourly pollutant data (PM2.5, CO, CO2, NO2) from a low-cost sensor network of 50 sites in Allegheny County, (ii) land use parameters (static covariates), (iii) PM2.5 and CO from regulatory monitors, (iv) dynamic covariates (temperature, wind, and precipitation), and (v) 5-day forecasts of O3, CO, NO2, SO2, and PM2.5 from the Goddard Earth Observing System Composition Forecasting (GEOS-CF) system at 25 km × 25 km grid resolution. The best-performing ML model (gradient boosting) outperforms the multi-linear regression model, with R2 of ~0.74 and root mean square error (RMSE) of ~4.2 µg m-3 compared with R2 of ~0.52 and RMSE of ~5.3 µg m-3. Similarly, the deep learning model (a convolutional neural network, CNN) improves on the ML model (R2: 0.81; RMSE: 2.95 µg m-3). Different feature reduction techniques were used for the three types of models to identify key features. Adding GEOS-CF pollutant forecasts as predictor variables improved the performance of each forecast model.
The deep learning-based PM2.5 forecast model can help predict high-pollution episodes caused by boundary layer inversions, which can help policymakers respond to these episodes and reduce exposure.
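
The models above are compared using R2 and RMSE. As a minimal sketch of how these two metrics are commonly computed, the Python snippet below evaluates a forecast against observations. The abstract does not state which R2 definition was used; the coefficient of determination (1 - SS_res / SS_tot) is assumed here. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs (e.g. µg m-3)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when observations have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up hourly PM2.5 values (µg m-3), for illustration only.
observed = [8.0, 12.0, 20.0, 35.0, 15.0]
forecast = [9.0, 11.0, 22.0, 31.0, 17.0]

print(f"RMSE: {rmse(observed, forecast):.2f} µg m-3")
print(f"R2:   {r_squared(observed, forecast):.2f}")
```

Applying the same two functions to each model's predictions on a shared held-out set gives figures that can be compared directly, which is the kind of comparison reported above for the regression, gradient boosting, and CNN models.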