Leveraging Satellite Measurements, Surface Monitors, and Machine Learning for Estimating 20 Years of High-Resolution Gridded PM2.5 in Ghana
ABHISHEK ANAND, Joe Adabouk Amooli, Daniel Westervelt, Columbia University
Abstract Number: 486
Working Group: Advancing Aerosol Science through Data Analysis Tools
Abstract
Exposure to air pollution is the second leading cause of death worldwide. This risk is dominated by exposure to fine particulate matter (PM2.5). The risk is more critical in low- and middle-income countries, such as those in sub-Saharan Africa, where a lack of air pollution data hinders effective policymaking. This data scarcity stems from a lack of resources for research-grade equipment. Our work addresses this critical gap in understanding air pollution and its health impacts in Ghana. We use freely available retrievals of aerosol optical depth, gas-phase pollutants, atmospheric optical properties and weather parameters from NASA and European satellites to train several machine learning (ML) models that estimate 20 years (2005-2024) of daily surface PM2.5 concentrations at 1 km × 1 km spatial resolution. The estimated PM2.5 values are validated against measurements (2018-2024) from a well-calibrated network of 106 reference monitors and low-cost sensors spread across Ghana, established through years of active efforts and collaborations with local partners. We found that a fully connected neural network (RMSE ~ 6.4 µg/m3, R2 ~ 0.84) outperforms notable linear and ML models, including LASSO (RMSE ~ 8.2 µg/m3, R2 ~ 0.69) and eXtreme gradient boosting (RMSE ~ 6.8 µg/m3, R2 ~ 0.78), in estimating high-resolution gridded PM2.5 in Ghana. The SHapley Additive exPlanations (SHAP) analysis identifies wind speed and total precipitation from European ReAnalysis 5 (ERA5), together with absorbing aerosol index and ozone from the TROPOspheric Monitoring Instrument (TROPOMI), as the key features contributing to model accuracy. Our study is among the first in Africa to use advanced modeling techniques to generate actionable pollution data, with potential for expansion to other countries in sub-Saharan Africa.
The dataset will be available through an open-access platform for local stakeholders to guide policy decisions targeting air pollution-related health and socioeconomic inequities for underserved populations.
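
The model comparison above rests on two validation metrics, RMSE and R2. The following is a minimal sketch of how such scores could be computed from paired monitor observations and model estimates. It assumes that R2 denotes the coefficient of determination, which the abstract does not state explicitly. The function names and sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. µg/m3)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition of R2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily PM2.5 values (µg/m3), invented purely for illustration.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE = {rmse(observed, predicted):.2f} µg/m3")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

A lower RMSE and a higher R2 on held-out monitor data indicate closer agreement between gridded estimates and surface measurements, which is the sense in which one model is said to outperform another here.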