Employing Machine Learning for New Particle Formation Identification and Mechanistic Analysis: Insights from the Six-Year Observation at the Southern Great Plains

WEIXING HAO, Fan Mei, Tirthankar Chakraborty, Yang Wang, University of Miami

     Abstract Number: 105
     Working Group: Remote and Regional Atmospheric Aerosol

Abstract
New particle formation (NPF) is a key factor influencing air quality and climate. Recognizing the important role of NPF in atmospheric processes, this research addresses the gap in automated identification and mechanistic analysis of NPF events. This study uses machine learning (ML) techniques to identify key atmospheric variables that influence NPF events, including relative humidity (RH), temperature, aerosol surface area, wind direction, wind speed, total organics, sulfur dioxide, and other gas precursors. We aim to elucidate the relationship between these environmental variables and NPF events using atmospheric and meteorological data. We used a comprehensive dataset collected over six years by the Atmospheric Radiation Measurement (ARM) facility at the Southern Great Plains (SGP) site in Oklahoma, USA, and analyzed it by season and by year.
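
As an illustration of the seasonal breakdown, the following minimal sketch computes the percentage of classified days with an NPF event in each season. The function name `seasonal_npf_frequency`, the record layout, and the meteorological season convention (December to February as winter, and so on) are assumptions made for this example; the abstract does not state which convention was used, and the toy records are not SGP data.

```python
from collections import defaultdict
from datetime import date

# Assumption: meteorological seasons (DJF, MAM, JJA, SON).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_npf_frequency(days):
    """Return {season: percent of classified days with an NPF event}.

    `days` is an iterable of (date, is_npf_event) pairs, one per classified day.
    Seasons with no classified days are omitted from the result.
    """
    totals = defaultdict(int)
    events = defaultdict(int)
    for day, is_event in days:
        season = SEASON_BY_MONTH[day.month]
        totals[season] += 1
        events[season] += int(is_event)
    return {season: 100.0 * events[season] / totals[season] for season in totals}


# Hypothetical toy records for illustration only.
toy_days = [
    (date(2020, 1, 5), True),
    (date(2020, 1, 6), False),
    (date(2020, 4, 10), True),
    (date(2020, 7, 1), False),
    (date(2020, 7, 2), False),
]
frequencies = seasonal_npf_frequency(toy_days)
```

Grouping by `(day.year, season)` instead of by season alone would give the same breakdown per year.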

We identified NPF occurrence from surface measurements and developed random forest algorithms to determine the relevance of atmospheric variables to the occurrence of NPF events within this data-driven framework. This approach predicted these events with an accuracy of 90% to 95%. Our results indicate that temperature, RH, and first boundary layer height are the main factors associated with the occurrence of NPF, with normalized variable importance values of 0.32, 0.19, and 0.18, respectively. Overall, NPF occurs more frequently at the SGP in spring (35.45%) and winter (42.14%) and less frequently in summer (4.01%). This study advances the predictive modeling of NPF events for future airborne missions and highlights the effectiveness of ML for studying atmospheric aerosol processes.
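
The general workflow of fitting a random forest classifier and reading off normalized variable importances can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the configuration used in this study: the feature names, the synthetic labeling rule, the train/test split, and the hyperparameters are all assumptions. Note that scikit-learn's `feature_importances_` is an impurity-based measure normalized to sum to 1; the abstract does not specify which importance measure was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical predictor names, standing in for daily atmospheric variables.
feature_names = [
    "temperature",
    "relative_humidity",
    "boundary_layer_height",
    "wind_speed",
]
n_days = 2000
X = rng.normal(size=(n_days, len(feature_names)))

# Synthetic labeling rule (an assumption for illustration only): event days
# depend on the first two columns plus noise; the other columns are irrelevant.
score = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.3 * rng.normal(size=n_days)
y = (score > 0).astype(int)  # 1 = NPF event day, 0 = non-event day

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Held-out classification accuracy.
accuracy = accuracy_score(y_test, model.predict(X_test))

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

for name, value in ranked:
    print(f"{name}: {value:.2f}")
print(f"accuracy: {accuracy:.2f}")
```

Impurity-based importances can be biased toward high-cardinality features, so permutation importance on held-out data is a common cross-check when interpreting such rankings.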