Machine Learning Models to Predict Oxidative Potential of Ambient PM2.5: A Midwestern US Case Study

TAHSINA ALAM, Hannah Horowitz, Lei Zhao, Vishal Verma, University of Illinois Urbana-Champaign

     Abstract Number: 297
     Working Group: Health-Related Aerosols

Abstract
This study aims to use machine learning to develop models that predict the oxidative potential (OP) of ambient fine particulate matter (PM2.5) from its chemical composition. OP is an intrinsic property of particulate matter that measures the ability of the particles to consume cellular antioxidants or generate reactive oxygen species (ROS), which can oxidize key cellular components. OP has been found to be more strongly associated with the health effects of PM than routinely measured PM mass. We are generating the training datasets by measuring OP endpoints of chemical mixtures containing varying concentrations of inorganic and organic aerosol components found in PM2.5, such as ammonium sulfate ((NH4)2SO4), ammonium nitrate (NH4NO3), humic-like substances (HULIS), hydrocarbon-like and oxygenated organic aerosols (HOA and OOA), and metals [Cu(II), Mn(II) and Fe(II)]. The test concentrations of these model species were chosen based on their typical ambient concentrations and conventional filter extraction protocols. OP is being measured using two endpoints: the dithiothreitol assay (OPDTT) and the consumption of reduced glutathione, or GSH (OPGSH). A total of 972 samples will be analyzed to cover all possible mixtures, and binary mixtures will be used to quantify any synergistic or antagonistic interactions between the components. The measurements are being carried out using a Semi-Automated Multi-Endpoint ROS Activity Analyzer (SAMERA). The experimental data will be modeled using six algorithms: multiple linear regression, support vector machine, artificial neural network, random forest, Gaussian process, and XGBoost. The models will be validated against OP measurements of ambient PM samples collected at different locations in the Midwestern US. Sample and data analysis are currently underway. The best-performing model will be selected based on the coefficient of determination (R2) and the root mean square error (RMSE).
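The model-selection step can be sketched as follows. Each candidate regressor is ranked by R2 and RMSE on held-out OP measurements. This is a minimal Python sketch under stated assumptions, not the authors' pipeline:

- The helper names (`r_squared`, `rmse`, `select_model`) are hypothetical.
- R2 is taken as 1 - SS_res/SS_tot on the validation data, rather than the squared Pearson correlation.
- Each model's predictions are assumed to have been produced elsewhere by an already trained model.

```python
import math
from typing import Dict, Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Hypothetical helper: coefficient of determination, 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_obs = sum(y_true) / len(y_true)
    # Residual sum of squares: observed OP minus predicted OP.
    ss_res = sum((o - p) ** 2 for o, p in zip(y_true, y_pred))
    # Total sum of squares: spread of observed OP around its mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in y_true)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance; R2 is undefined")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Hypothetical helper: root mean square error, in the units of the OP data."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    mse = sum((o - p) ** 2 for o, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def select_model(
    y_true: Sequence[float], predictions: Dict[str, Sequence[float]]
) -> Tuple[str, Dict[str, Tuple[float, float]]]:
    """Hypothetical helper: score every model and return the best one.

    `predictions` maps a model name (e.g. "random_forest") to its
    predictions for the same validation samples as `y_true`.
    """
    scores = {
        name: (r_squared(y_true, pred), rmse(y_true, pred))
        for name, pred in predictions.items()
    }
    # Highest R2 wins; on one shared validation set this is the same
    # ranking as lowest RMSE.
    best = max(scores, key=lambda name: scores[name][0])
    return best, scores
```

On a single shared validation set the two criteria cannot disagree, because R2 = 1 - n*RMSE^2/SS_tot and SS_tot is fixed by the observations. Reporting both is still useful: RMSE stays in the units of the OP measurement, while R2 is dimensionless.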
Our final goal is to integrate the model with a community atmospheric chemistry model (e.g., GEOS-Chem) to forecast OP across the Midwestern US.