Optimizing PurpleAir PM2.5 Data in a South Asian Megacity: One-Year Machine Learning Calibration and Correction Factor Development in Dhaka, Bangladesh

James Nimo, Md. Aynul Bari, Lauren Riviere, Kaylia Morgan, Abhijeet Sarkar, Chowdhury Arafat Hossain, Sanjeev Delwar, Abdus Salam, Provat Saha, University at Albany, State University of New York

     Abstract Number: 285
     Working Group: Instrumentation and Methods

Abstract
Low-cost PurpleAir PM2.5 sensors offer a scalable solution for air quality monitoring in South Asian megacities such as Dhaka, Bangladesh, but they require robust local calibration. This study presents a year-long field evaluation of four PurpleAir sensors, focusing on improving data accuracy with machine learning (ML) models. Under the National Science Foundation (NSF) International Research Experiences for Students (IRES) program, the PurpleAir monitors were co-located with a reference-grade instrument at a continuous air quality monitoring station (CAMS) in Dhaka hosted by the Department of Environment. PurpleAir's native PM2.5 estimation algorithms (ATM and CF-1) were assessed, and several ML calibration models (including Multiple Linear Regression, XGBoost, Random Forest, and LightGBM) were developed for both data streams. The study also evaluated long-term drift across the four PurpleAir units to develop practical guidance on re-calibration strategies in the South Asian context. The extended monitoring period enabled a thorough investigation of seasonal variations in performance and of the effect of relative humidity (RH), analyzed across three bins (20-40%, 40-60%, and 60-100%). The performance of these locally tuned ML models was compared with that of established external calibration equations. Results indicate that, although raw PurpleAir outputs show considerable error, particularly under high RH and in certain seasons, the developed ML models significantly improve data accuracy. Notably, the Dhaka-specific models outperformed widely used external calibration approaches, highlighting the need for localized strategies that account for regional atmospheric characteristics and seasonal dynamics.
This research provides critical insights and practical methodologies for optimizing low-cost sensor deployment and data quality in humid, resource-constrained environments, ultimately supporting more accurate air quality assessment and informed decision-making in South Asian countries.
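To make the RH-binned evaluation concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that the Multiple Linear Regression model regresses reference PM2.5 on raw PurpleAir PM2.5 and RH (the abstract does not list the predictors). It also assumes half-open RH bins, with the top bin including 100% and readings below 20% left unbinned. The function names (`rh_bin`, `fit_mlr`, `rmse_by_bin`) are hypothetical, and the input values are synthetic, generated from known coefficients purely so the fit can be checked. The tree-based models (XGBoost, Random Forest, LightGBM) are not shown.

```python
from math import sqrt

# Assumed bin edges for the three RH bins named in the abstract.
# Readings below 20% RH fall outside every bin and map to None.
RH_BINS = [(20.0, 40.0, "20-40%"), (40.0, 60.0, "40-60%"), (60.0, 100.0, "60-100%")]


def rh_bin(rh):
    """Return the RH bin label for a reading, or None if it is out of range."""
    for lo, hi, label in RH_BINS:
        # Bins are half-open [lo, hi), except that 100% belongs to the top bin.
        if lo <= rh < hi or (hi == 100.0 and rh == 100.0):
            return label
    return None


def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_mlr(pa_pm, rh, ref_pm):
    """Fit ref = b0 + b1*pa + b2*rh by ordinary least squares (normal equations)."""
    rows = [[1.0, p, h] for p, h in zip(pa_pm, rh)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ref_pm)) for i in range(3)]
    return solve3(xtx, xty)


def predict(coef, pa, h):
    """Apply the fitted linear correction to one PurpleAir reading."""
    b0, b1, b2 = coef
    return b0 + b1 * pa + b2 * h


def rmse_by_bin(pa_pm, rh, ref_pm, coef):
    """RMSE of corrected PM2.5 against the reference, grouped by RH bin."""
    sq = {}
    for p, h, y in zip(pa_pm, rh, ref_pm):
        label = rh_bin(h)
        if label is None:
            continue  # skip readings outside the three bins
        sq.setdefault(label, []).append((predict(coef, p, h) - y) ** 2)
    return {label: sqrt(sum(v) / len(v)) for label, v in sq.items()}


# Synthetic illustration only: "reference" values are generated from known
# coefficients (2.0, 0.5, -0.1), so the fit should recover them.
pa = [30.0, 50.0, 80.0, 120.0, 150.0, 200.0, 60.0, 90.0]
rh = [25.0, 35.0, 45.0, 55.0, 65.0, 85.0, 75.0, 30.0]
ref = [2.0 + 0.5 * p - 0.1 * h for p, h in zip(pa, rh)]

coef = fit_mlr(pa, rh, ref)
print([round(c, 4) for c in coef])
print(rmse_by_bin(pa, rh, ref, coef))
```

With real co-located data, the per-bin RMSE would be computed on held-out observations and would not be zero. Reporting it separately for each RH bin is what exposes the humidity-dependent error that a single pooled statistic can hide.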