Developing Non-linear Machine Learning Models for Predicting PM2.5 Oxidative Potential Using Its Chemical Components Data in Various Environments

SEUNGHYE LEE, Minhan Park, Ma. Cristine Faye Denna, Dahye Oh, Jiho Jang, Kihong Park, Gwangju Institute of Science and Technology

     Abstract Number: 665
     Working Group: Health-Related Aerosols

Abstract
Oxidative potential (OP) is an effective indicator of the oxidative stress induced by inhaled fine particulate matter (PM2.5). This study aimed to develop a more generalized OP prediction model, applicable across diverse environments, by exploring various machine learning models. Ambient PM2.5 samples were collected at urban (Beijing, China, and Gwangju, Korea) and agricultural (Gimje, Korea) sites during both summer and winter. The samples were analyzed for OP and major chemical components, with OP measured using a dithiothreitol (DTT) assay. Volume-normalized OP (OPv) differed significantly among the sites: it was highest at the Beijing site, followed by the Gwangju and Gimje sites. Both site-specific and site-generalized feature selection identified Mn, Cu, Zn, Pb, and water-soluble organic carbon (WSOC) as significant influences on OPv that held across sites. A linear regression model built on the selected features overfit: it failed to generalize its OPv predictions and performed especially poorly at the Gimje site. In contrast, tree-based ensemble models, namely random forest and gradient boosting, predicted OPv effectively, explaining 75% of its variability in the testing data. These models captured the distinct characteristics of each site by appropriately weighting the importance of each chemical component, modeling non-linear relationships between chemical components and OPv, and accounting for interaction effects among components. Furthermore, global model-agnostic interpretation methods revealed that the impact of Cu on OP is constrained and depends on the WSOC concentration. This finding underscores the antagonistic effects between Cu and organic components on OP and their crucial role in predicting OPv.
Overall, this study provides insights into the development of generalized OP prediction models by identifying key factors influencing the OP of ambient PM2.5.
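The abstract does not name the global model-agnostic interpretation method used. Partial dependence is one widely used option, and a two-way partial dependence over Cu and WSOC is one way a Cu effect that is "contingent on WSOC" can be made visible. The sketch below is a minimal, stdlib-only illustration under stated assumptions: `toy_model` is a hypothetical hand-written function standing in for a fitted random forest or gradient boosting model, the feature scales are arbitrary, and the samples are randomly generated, not measurements from this study. The helper names `partial_dependence_2d` and `cu_effect_at` are likewise hypothetical.

```python
import random
from statistics import mean


def toy_model(row):
    """Hypothetical stand-in for a fitted tree ensemble (illustration only).

    Cu's contribution shrinks as WSOC grows, mimicking an antagonistic
    Cu-WSOC interaction. Coefficients are arbitrary, not from the study.
    """
    return 0.5 * row["Mn"] + row["Cu"] / (1.0 + row["WSOC"]) + 0.2 * row["WSOC"]


def partial_dependence_2d(model, data, f1, v1, f2, v2):
    """Average prediction with feature f1 fixed at v1 and f2 fixed at v2.

    The remaining features keep their observed values, so the result is
    averaged over their empirical distribution (classic partial dependence).
    """
    preds = []
    for row in data:
        modified = dict(row)  # copy so the original sample is untouched
        modified[f1] = v1
        modified[f2] = v2
        preds.append(model(modified))
    return mean(preds)


def cu_effect_at(model, data, wsoc, cu_low, cu_high):
    """Change in partial dependence when Cu moves from cu_low to cu_high at fixed WSOC."""
    hi = partial_dependence_2d(model, data, "Cu", cu_high, "WSOC", wsoc)
    lo = partial_dependence_2d(model, data, "Cu", cu_low, "WSOC", wsoc)
    return hi - lo


# Synthetic samples (randomly generated, NOT data from the study).
rng = random.Random(0)
data = [
    {"Mn": rng.uniform(0, 2), "Cu": rng.uniform(0, 2), "WSOC": rng.uniform(0, 5)}
    for _ in range(200)
]

effect_low_wsoc = cu_effect_at(toy_model, data, wsoc=1.0, cu_low=0.0, cu_high=2.0)
effect_high_wsoc = cu_effect_at(toy_model, data, wsoc=4.0, cu_low=0.0, cu_high=2.0)
print(f"Cu effect at low WSOC:  {effect_low_wsoc:.3f}")
print(f"Cu effect at high WSOC: {effect_high_wsoc:.3f}")
```

For this toy model the Cu effect is 2 / (1 + WSOC) by construction, so it is about 1.0 at WSOC = 1 and 0.4 at WSOC = 4; a shrinking Cu effect at higher WSOC is the kind of pattern a two-way partial dependence would expose in a fitted model. With real data, one would substitute a trained ensemble (for example scikit-learn's `RandomForestRegressor`) and could use `sklearn.inspection.partial_dependence` instead of the hand-rolled loop.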