Developing Non-linear Machine Learning Models for Predicting PM2.5 Oxidative Potential Using Its Chemical Components Data in Various Environments
SEUNGHYE LEE, Minhan Park, Ma. Cristine Faye Denna, Dahye Oh, Jiho Jang, Kihong Park, Gwangju Institute of Science and Technology
Abstract Number: 665
Working Group: Health-Related Aerosols
Abstract
Oxidative potential (OP) is an effective indicator of the oxidative stress induced by inhaled fine particulate matter (PM2.5). This study aimed to develop a more generalized OP prediction model applicable to diverse environments by exploring various machine learning models. Ambient PM2.5 samples were collected from urban (Beijing, China and Gwangju, Korea) and agricultural (Gimje, Korea) sites during both summer and winter. The samples were analyzed for OP and major chemical components, with OP measured using a dithiothreitol (DTT) assay. Significant differences in volume-normalized OP (OPv) were observed among the sites: the Beijing site showed the highest OPv, followed by the Gwangju and Gimje sites. Site-specific and site-generalized feature selections identified significant and generalized influences of Mn, Cu, Zn, Pb, and water-soluble organic carbon (WSOC) on OPv. A linear regression model developed from the selected features exhibited an overfitting problem: it failed to yield a generalized prediction of OPv and performed especially poorly at the Gimje site. In contrast, tree-based ensemble models such as random forest and gradient boosting predicted OPv effectively, explaining 75% of the variability of OPv in the testing data. These models captured the distinct characteristics of each site by appropriately weighting the importance of each chemical component, capturing non-linear relationships between chemical components and OPv, and accounting for interaction effects among chemical components. Furthermore, this study revealed a constrained impact of Cu on OP that is contingent on the concentration of WSOC. This observation, derived from global model-agnostic interpretation methods, emphasizes the significance of antagonistic effects between Cu and organic components on OP and highlights their crucial role in predicting OPv. Overall, this study provides insights into the development of generalized OP prediction models by identifying key factors influencing the OP of ambient PM2.5.
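The statement that the tree-based models explain 75% of the variability of OPv in the testing data corresponds to a coefficient of determination (R^2) of about 0.75 on held-out samples. The sketch below shows how that metric is conventionally computed. It assumes the standard definition, R^2 = 1 - SS_res / SS_tot; the function name and the example numbers are hypothetical illustrations and are not data or code from this study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    observed  -- measured values (e.g. OPv of test samples)
    predicted -- model predictions for the same samples
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("at least two samples are required")

    mean_obs = sum(observed) / len(observed)
    # Total variability of the measurements around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variability left unexplained by the model.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only (not measurements from the study).
example_observed = [1.0, 2.0, 3.0, 4.0]
example_predicted = [1.5, 1.5, 3.5, 3.5]
print(round(r_squared(example_observed, example_predicted), 3))
```

Under this definition, a value of 0.75 means that the squared prediction error on the test samples is one quarter of the total variance of the measured OPv. A model that overfits can score well on training data yet show a much lower, or even negative, value on held-out data.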