Using Colocations from 25+ Cities to Create a Global Correction for Optical Low-Cost PM2.5 Sensors
GARIMA RAHEJA, Daniel Westervelt, Columbia University
Abstract Number: 434
Working Group: Urban Aerosols
Abstract
Air pollution is a leading cause of premature mortality worldwide. Traditional reference-grade monitors are expensive, technically challenging to operate, and inaccessible for many low-income, marginalized communities. Advances in low-cost sensors (LCS) are helping bridge the data gap left by these monitors. Novel data science techniques are being used to develop correction factors for LCS, but these studies generally (1) rely on colocations with expensive reference-grade monitors, (2) use temperature, humidity, and other measurements to account for variation in hygroscopicity and optical properties, and (3) are often local in scope, limited to one city or metro area.
Can correction factors developed in one community be used in another? We use colocations contributed by 25 community projects and regulatory studies (including in NYC, Ohio, Accra, Lomé, Kinshasa, London, and Kolkata), spanning a range of climatologies, to assess the performance of four machine learning techniques and compare them to correction factors in the literature. Additionally, we develop a Global Gaussian Mixture Regression (GMR) machine learning model trained on colocations from communities in the Clean Air Monitoring and Solutions Network (CAMS-Net). GMR has proven successful for correcting LCS data: in Kinshasa, GMR-corrected PurpleAir data achieved R² = 0.88 when compared to the MetOne BAM1020, and in Accra, GMR lowered the Mean Absolute Error of Clarity data from 7.51 µg/m³ to 1.93 µg/m³.
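As a rough illustration of how a Gaussian Mixture Regression correction can work in general, the sketch below fits a Gaussian mixture to the joint distribution of sensor features and reference PM2.5, then predicts the reference value as the responsibility-weighted conditional mean. This is a minimal sketch under stated assumptions, not the model described in this abstract: the feature set (raw PM2.5 and relative humidity), the number of mixture components, the synthetic data, and the helper names `fit_gmr` and `predict_gmr` are all hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture


def fit_gmr(features, reference, n_components=3, seed=0):
    """Fit a Gaussian mixture to the joint vector [features, reference]."""
    joint = np.column_stack([features, reference])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    return gmm.fit(joint)


def predict_gmr(gmm, features):
    """Return E[reference | features] under the fitted joint mixture."""
    features = np.atleast_2d(features)
    n, d = features.shape
    k = gmm.n_components
    log_resp = np.empty((n, k))
    cond_means = np.empty((n, k))
    for j in range(k):
        mu_x, mu_y = gmm.means_[j, :d], gmm.means_[j, d]
        s_xx = gmm.covariances_[j][:d, :d]   # feature covariance
        s_yx = gmm.covariances_[j][d, :d]    # reference-feature covariance
        # Unnormalized log responsibility of component j given the features
        log_resp[:, j] = np.log(gmm.weights_[j]) + multivariate_normal.logpdf(
            features, mean=mu_x, cov=s_xx)
        # Conditional mean of the reference for component j (linear in x)
        cond_means[:, j] = mu_y + (features - mu_x) @ np.linalg.solve(s_xx, s_yx)
    resp = np.exp(log_resp - logsumexp(log_resp, axis=1, keepdims=True))
    return (resp * cond_means).sum(axis=1)


# Synthetic colocation (assumption: the sensor over-reads as humidity rises).
rng = np.random.default_rng(0)
true_pm = rng.uniform(5.0, 80.0, size=2000)    # "reference" PM2.5, ug/m3
rh = rng.uniform(30.0, 90.0, size=2000)        # relative humidity, %
raw_pm = true_pm * (1.0 + 0.008 * rh) + rng.normal(0.0, 2.0, size=2000)

X = np.column_stack([raw_pm, rh])
train, test = slice(0, 1500), slice(1500, 2000)

model = fit_gmr(X[train], true_pm[train])
corrected = predict_gmr(model, X[test])

mae_raw = np.mean(np.abs(raw_pm[test] - true_pm[test]))
mae_gmr = np.mean(np.abs(corrected - true_pm[test]))
print(f"MAE raw: {mae_raw:.2f}  MAE corrected: {mae_gmr:.2f}")
```

In this sketch, each mixture component contributes a locally linear correction, and the responsibilities blend those corrections smoothly across pollution and humidity regimes. That blending is one plausible reason a single mixture trained on pooled colocations might transfer across sites, though the abstract itself does not specify the model's inputs or configuration.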
We find that in most cases, the Global Gaussian Mixture Regression model performs 57–96% as well as a local correction model, which means that this model could provide high levels of accuracy in a community using LCS without the need for a $100,000+ reference monitor colocation. In some cases, such as Nairobi, the Global Gaussian Mixture Regression model actually performs 1.2 times better than a correction developed with a local colocation.
Global GMR is greater than the sum of its parts: contributions from some communities have enabled progress in many more. We present an open-source dashboard that enables the correction of data from 20,000+ PurpleAir and Clarity sensors around the world without a reference monitor colocation, and that has allowed community groups, regulators, and policymakers around the world to make the most of their LCS data.