Using Machine Learning Models to Enhance Chemical Characterization from ToF-ACSM Mass Spectra

NA MAO, Manjula Canagaratna, Nga Lee Ng, Satoshi Takahama, Ann M. Dillner, University of California, Davis

     Abstract Number: 39
     Working Group: Instrumentation and Methods

Abstract
Characterizing organic aerosols is critical to understanding the atmospheric processes that affect climate change, air quality, and human health. The Time-of-Flight Aerosol Chemical Speciation Monitor (ToF-ACSM) has been used for a decade to quantify inorganic and organic aerosols with high time resolution. However, the combined effects of thermal decomposition and fragmentation from electron ionization interfere with the characterization of organic aerosol (OA) composition, and estimating functional groups such as carboxylic acids and alcohols from ToF-ACSM mass spectra remains challenging. This work aims to enhance ToF-ACSM organic characterization by qualitatively exploring the relationship between ACSM fragment ions and the fractions of functional groups (acids and alcohols) present in various representative compounds, and by assessing what type of model is best suited for prediction. Here, we systematically explore the relationships between fragment ions (m/z 29, 43, 44, 55, 57, 60, 69, 71, 73) from individual compounds and lab-generated mixtures and their functional group content (acid mass fraction and OH mass fraction) using a statistical model (Partial Least Squares Regression, PLSR) and machine learning techniques (Random Forest, RF, and Support Vector Machine, SVM). We selected 22 pure compounds and 20 lab-generated mixtures that cover a wide range of carboxylic acids, alcohols, and multifunctional compounds with different O:C ratios and sources, for instance hydrocarbon-like OA, biomass-burning OA, cooking OA, and oxidized OA. The fragment ions from the lab-generated mixtures are used as the test set to evaluate the acid and OH mass fractions predicted by the three models, which are based on mathematical mixtures of the single-compound spectra. Overall, the OH mass fraction is predicted better than the acid mass fraction, and the RF model shows better prediction metrics (R², bias, errors, etc.) than the other models.
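The mathematical mixing and the evaluation metrics mentioned above can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes that a mathematical mixture is a mass-weighted linear combination of normalized single-compound spectra and of their functional group mass fractions, and that all compounds are detected with equal sensitivity. The function names (`normalize`, `mix`, `metrics`) and every number in the example are hypothetical placeholders, not measured data.

```python
# Ion channels listed in the abstract (m/z values).
MZ = [29, 43, 44, 55, 57, 60, 69, 71, 73]

def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]

def mix(spectra, targets, mass_weights):
    """Assumed mixing rule: mass-weighted linear combination.

    spectra      -- normalized single-compound spectra (one list per compound)
    targets      -- functional group mass fraction of each compound
    mass_weights -- relative mass of each compound in the mixture
    """
    w = normalize(mass_weights)
    n_ions = len(spectra[0])
    mixed_spectrum = [
        sum(wi * sp[k] for wi, sp in zip(w, spectra)) for k in range(n_ions)
    ]
    mixed_target = sum(wi * t for wi, t in zip(w, targets))
    return mixed_spectrum, mixed_target

def metrics(y_true, y_pred):
    """R2, mean bias, and root-mean-square error of a set of predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "bias": sum(p - t for t, p in zip(y_true, y_pred)) / n,
        "rmse": (ss_res / n) ** 0.5,
    }

# Made-up spectra for two hypothetical compounds (NOT measured data).
compound_a = normalize([1, 2, 6, 1, 0, 0, 0, 0, 0])
compound_b = normalize([4, 3, 1, 1, 1, 0, 0, 0, 0])
acid_fractions = [0.5, 0.0]  # placeholder acid mass fractions

# A 3:1 (by mass) mathematical mixture of the two compounds.
spectrum, acid = mix([compound_a, compound_b], acid_fractions, [3.0, 1.0])
```

Pairs of mixed spectra and mixed mass fractions generated this way could serve as training data for a regression model, for example scikit-learn's `PLSRegression`, `RandomForestRegressor`, or `SVR`. `metrics` would then be applied to that model's predictions for the lab-generated mixtures.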