Correcting 20,000+ Low-Cost Sensors Without a Colocation: Development of a Global Gaussian Mixture Regression Model for Optical PM2.5 Sensors

GARIMA RAHEJA, Daniel Westervelt, Columbia University

     Abstract Number: 346
     Working Group: Aerosols Spanning Spatial Scales: Measurement Networks to Models and Satellites

Abstract
Air pollution is a leading cause of premature mortality worldwide and is linked to more than 1 million premature deaths in Africa per year. Advancements in low-cost sensors (LCS) are helping bridge the data gap left by historically expensive and technically challenging reference-grade monitors. While novel data science techniques are being used to develop correction factors for LCS, these studies generally (1) use co-locations with expensive reference-grade monitors, (2) rely on temperature, humidity and other measurements to account for variation in hygroscopicity and optical properties, and (3) are often local in scope, limited to one city or metro area.

Can correction factors developed at one location be used at another? We use co-locations from 25 cities (including Palisades, NY; Accra, Ghana; Lomé, Togo; Kinshasa, DRC; Kolkata, India) spanning a range of climatologies and distances to assess the performance of several machine learning techniques (including Multiple Linear Regression, Random Forest, XGBoost and Gaussian Mixture Regression), and compare them to correction factors published in the literature. Additionally, we develop a Global Gaussian Mixture Regression (GMR) machine learning model trained on co-locations from cities in the Clean Air Monitoring and Solutions Network (CAMS-Net). GMR has proven successful for correcting LCS data: in Kinshasa, GMR-corrected PurpleAir data achieved R² = 0.88 when compared to the MetOne BAM-1020, and in Accra, GMR lowered the Mean Absolute Error of Clarity data from 7.51 μg/m³ to 1.93 μg/m³.
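For readers unfamiliar with Gaussian Mixture Regression, the idea can be sketched as follows: fit a Gaussian mixture to the joint distribution of sensor inputs and reference PM2.5, then predict the reference value as the conditional expectation given the inputs. This is a minimal illustration, not the model described in this abstract. The feature set (raw PM2.5, temperature, relative humidity), the number of mixture components, the helper names `fit_gmr` and `predict_gmr`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture


def fit_gmr(X, y, n_components=3, seed=0):
    """Fit a full-covariance Gaussian mixture to the joint [X, y] data."""
    joint = np.column_stack([X, y])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full",
                          random_state=seed)
    return gmm.fit(joint)


def predict_gmr(gmm, X):
    """Return E[y | x] under the fitted joint mixture for each row of X."""
    X = np.atleast_2d(X)
    n, d = X.shape
    K = gmm.n_components
    log_resp = np.empty((n, K))    # log of unnormalised component weights given x
    cond_means = np.empty((n, K))  # per-component conditional mean of y given x
    for k in range(K):
        mu_x, mu_y = gmm.means_[k, :d], gmm.means_[k, d]
        S_xx = gmm.covariances_[k][:d, :d]
        S_yx = gmm.covariances_[k][d, :d]
        log_resp[:, k] = (np.log(gmm.weights_[k])
                          + multivariate_normal.logpdf(X, mean=mu_x, cov=S_xx))
        # mu_y + S_yx S_xx^{-1} (x - mu_x), evaluated for every row of X
        cond_means[:, k] = mu_y + (X - mu_x) @ np.linalg.solve(S_xx, S_yx)
    # Normalise the component weights in a numerically stable way
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)
    return (resp * cond_means).sum(axis=1)


# Synthetic co-location data (invented for illustration, not real measurements):
# the simulated sensor over-reads more strongly at higher relative humidity.
rng = np.random.default_rng(0)
n = 2000
true_pm = rng.gamma(shape=2.0, scale=15.0, size=n)   # "reference" PM2.5
rh = rng.uniform(20.0, 90.0, size=n)                 # relative humidity, %
temp = rng.uniform(15.0, 35.0, size=n)               # temperature, deg C
raw_pm = true_pm * (1.0 + 0.01 * rh) + rng.normal(0.0, 2.0, size=n)

X = np.column_stack([raw_pm, temp, rh])
train, test = slice(0, 1500), slice(1500, n)

gmm = fit_gmr(X[train], true_pm[train])
corrected = predict_gmr(gmm, X[test])

mae_raw = np.mean(np.abs(raw_pm[test] - true_pm[test]))
mae_corrected = np.mean(np.abs(corrected - true_pm[test]))
print(f"MAE raw: {mae_raw:.2f}, MAE corrected: {mae_corrected:.2f}")
```

With a single component this reduces to ordinary linear regression; additional components let the correction vary across regimes (for example, humid versus dry conditions) without hand-specified breakpoints.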

We find that in most cases the Global GMR model performs 57-96% as well as a local correction model, which makes it promising for cases where a local co-location is unavailable or infeasible. In some cases, such as Nairobi, the Global GMR model performs 1.2x better than a local model.

The breadth of the Global GMR allows LCS data to be corrected without a local co-location. We present an open-source dashboard that enables the correction of data from 20,000+ PurpleAir and Clarity sensors around the world without a reference monitor co-location.