Optimization of Liquid Level in a Condensation-based Bioaerosol Sampler for Efficient Sampling using Machine Learning

Xiaohan Li, Mohammad Washeem, Amin Shirkhani, Matthew D. Jansen, William B. Vass, Xing He, Bian Jiang, Z. Hugh Fan, Chang-Yu Wu, University of Miami

     Abstract Number: 656
     Working Group: Instrumentation and Methods

Abstract
The Viable Virus Aerosol Sampler (VIVAS) enlarges airborne viruses via water vapor condensation, facilitating their efficient collection in a viability-conserving liquid medium. However, reliable prediction of its performance across diverse environmental conditions remains challenging because of its many tunable parameters. Field observations revealed that the collection dish liquid level, which significantly affects preservation of virus viability and VIVAS efficiency, fluctuates with ambient conditions and sampler settings. To address this, we propose an extensible machine learning pipeline designed to identify optimal conditions for a target liquid level and to reliably predict whether a given set of experimental parameters will yield acceptable outcomes. The pipeline, which can be iteratively improved as additional data become available, begins by stratifying 148 data points into training and testing subsets at an 80/20 ratio. We used Python's Lazy Predict to identify the optimal classification threshold for liquid levels, enhancing class separation, and to select the most suitable classification algorithm. We further refined the model through hyperparameter tuning with five-fold cross-validation. As one of the top-performing algorithms, XGBoost achieved a Macro F1 score above 76% and, ranked by gain, identified the three most important features for liquid level to be (1) moderator temperature, (2) particle concentration, and (3) the temperature difference between the moderator and the nozzle. The model demonstrates reasonable performance despite the complexity of these experiments and the limited data available, a consequence of how time- and labor-intensive they are. As illustrated, the pipeline also retains the flexibility to prioritize alternative metrics, accommodating the specific requirements of diverse scenarios. In future work, we aim to further improve the predictions by collecting additional data and by feature engineering, i.e., transforming raw measurements into more informative model inputs. Additionally, the primary evaluation metric will be selected to balance cost and return across various experimental designs.
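
The sketch below illustrates the pipeline structure described above: a stratified 80/20 split, Lazy Predict screening of candidate classifiers, five-fold cross-validated tuning of XGBoost on Macro F1, and feature ranking by gain. The column names, liquid-level threshold, and hyperparameter grid are hypothetical placeholders rather than the settings used in this study.

# Minimal sketch of the described pipeline (assumed column names and values).
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import f1_score
from lazypredict.Supervised import LazyClassifier
from xgboost import XGBClassifier

df = pd.read_csv("vivas_runs.csv")  # hypothetical log of sampler runs
features = ["moderator_temp", "nozzle_temp", "particle_conc",
            "ambient_temp", "ambient_rh", "flow_rate"]          # assumed columns
df["temp_diff"] = df["moderator_temp"] - df["nozzle_temp"]      # engineered feature
X = df[features + ["temp_diff"]]
y = (df["liquid_level_ml"] >= 1.0).astype(int)  # illustrative threshold for "acceptable" level

# Stratified 80/20 split preserves the class balance in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Screen many off-the-shelf classifiers at once with Lazy Predict.
screen = LazyClassifier(verbose=0, ignore_warnings=True)
leaderboard, _ = screen.fit(X_train, X_test, y_train, y_test)
print(leaderboard.head())

# Refine the selected model (XGBoost here) via five-fold CV on Macro F1.
param_grid = {"max_depth": [3, 5, 7],
              "learning_rate": [0.05, 0.1, 0.3],
              "n_estimators": [100, 300]}
search = GridSearchCV(XGBClassifier(eval_metric="logloss", random_state=42),
                      param_grid, scoring="f1_macro", cv=5)
search.fit(X_train, y_train)

best = search.best_estimator_
print("Test Macro F1:", f1_score(y_test, best.predict(X_test), average="macro"))

# Rank features by gain, as reported for the top three predictors.
gain = best.get_booster().get_score(importance_type="gain")
print(sorted(gain.items(), key=lambda kv: kv[1], reverse=True)[:3])

The scoring argument passed to GridSearchCV can be swapped for another metric (e.g., "balanced_accuracy") when a different evaluation criterion better matches the experimental design, reflecting the flexibility noted above.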