Ensemble-Based Deep Learning for Estimating PM2.5 over California with Multi-Source Big Data Including Wildfire Smoke

Lianfa Li, Mariam Girguis, Frederick Lurmann, Nathan Pavlovic, Crystal McClure, Meredith Franklin, Jun Wu, Luke Oman, Carrie Breton, Frank Gilliland, Rima Habre, University of Southern California

Abstract Number: 136
Working Group: Satellite-Data and Environmental Health Applications

Abstract
Introduction: Estimating PM2.5 concentrations and their prediction uncertainties at high spatiotemporal resolution is important for studies of the health effects of air pollution. This is particularly challenging in California, which has highly variable natural and anthropogenic emissions, meteorology, topography, and land use.

Methods: Using ensemble-based deep learning with big data fused from multiple sources, we developed a PM2.5 prediction model with uncertainty estimates at high spatial (1 km × 1 km) and temporal (weekly) resolution over a 10-year period (2008-2017). Data sources included remote sensing data (MAIAC aerosol optical depth (AOD), normalized difference vegetation index, and impervious surface), MERRA-2 GMI Replay Simulation (M2GMI) output, wildfire smoke plume dispersion estimates (HYSPLIT and MODIS fire radiative power), meteorology, land cover, traffic, elevation, and spatiotemporal trends (geo-coordinates, temporal basis functions, and time index). Missing MAIAC AOD observations were imputed and adjusted for relative humidity and vertical distribution.
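
As a rough illustration of how an ensemble can yield both a prediction and an uncertainty estimate, the sketch below assumes bootstrap aggregation (bagging), which the abstract does not specify. A one-predictor least-squares line stands in for each deep network, and the AOD and PM2.5 values are synthetic placeholders, not study data. The names `fit_line` and `bagged_predict` are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for one deep network)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def bagged_predict(xs, ys, x_new, n_members=50, seed=0):
    """Train each member on a bootstrap resample; return ensemble mean and spread."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_members:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: a slope cannot be fitted
            continue
        by = [ys[i] for i in idx]
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
    # The mean is the point prediction; the standard deviation across members
    # serves as the uncertainty estimate.
    return statistics.fmean(preds), statistics.stdev(preds)

# Synthetic placeholder data (assumption): AOD as the single predictor of PM2.5.
aod = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
pm25 = [5.2, 7.1, 8.8, 11.3, 12.9, 15.1]
mean_pred, sd_pred = bagged_predict(aod, pm25, x_new=0.35)
```

In this sketch, the spread across members reflects only the variability introduced by resampling the training data; a real model would use many predictors and a far more flexible base learner.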

Results: The ensemble deep learning model achieved an overall mean training RMSE of 1.54 μg/m³ (R²: 0.94) and a test RMSE of 2.29 μg/m³ (R²: 0.87). The top predictors included the M2GMI carbon monoxide mixing ratio in the bottom layer, temporal basis functions, spatial location, air temperature, MAIAC AOD, and PM2.5 sea salt mass concentration. In an independent test using three long-term AQS sites and one short-term non-AQS site, our model achieved a high correlation (>0.8) and a low RMSE (<3 μg/m³). Statewide predictions indicated that our model can capture the spatial distribution and temporal peaks of wildfire-related PM2.5. The coefficient of variation indicated the highest uncertainty over deciduous forest, mixed forest, and open water land cover.
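
For readers less familiar with the reported metrics, the following minimal sketch shows one common way to compute RMSE, R², and the coefficient of variation. It assumes R² is defined as 1 − SS_res/SS_tot and that the coefficient of variation is the standard deviation of ensemble member predictions divided by their mean; the abstract does not state which definitions were used, and the numbers are illustrative only.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error, in the units of the inputs (e.g. μg/m³)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def coefficient_of_variation(member_predictions):
    """Sample std / mean of ensemble member predictions for one cell and week."""
    n = len(member_predictions)
    mean = sum(member_predictions) / n
    var = sum((p - mean) ** 2 for p in member_predictions) / (n - 1)
    return math.sqrt(var) / mean

# Illustrative values only (not study data).
observed = [8.0, 12.0, 10.0, 14.0]
predicted = [9.0, 11.0, 10.0, 13.0]
print(round(rmse(observed, predicted), 3))                     # 0.866
print(round(r_squared(observed, predicted), 2))                # 0.85
print(round(coefficient_of_variation([10.0, 12.0, 11.0]), 3))  # 0.091
```

Because the coefficient of variation is scaled by the mean, it can be large where predicted concentrations are low even if the absolute spread is modest.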

Conclusion: Our method can be generalized to other similarly complex regions. Prediction uncertainty estimates can also inform further model development and measurement error evaluations in exposure and health studies.