10th International Aerosol Conference
September 2 - September 7, 2018
America's Center Convention Complex
St. Louis, Missouri, USA


Multivariate Statistical Analysis Methods as a Tool to Study Complex Mass Spectrometry Data Sets

SINI ISOKÄÄNTÄ, Eetu Kari, Angela Buchholz, Annele Virtanen, Santtu Mikkonen, University of Eastern Finland

     Abstract Number: 667
     Working Group: Aerosol Chemistry

Abstract
Mass spectrometer measurements produce complex data with a large number of variables. In this work, we used different statistical dimension reduction techniques to compress the information in the data into a small number of factors, which can then be interpreted further. Different variations of Positive Matrix Factorization (PMF), Exploratory Factor Analysis (EFA) and Principal Component Analysis (PCA) were applied to multivariate car exhaust emission data measured by proton-transfer-reaction time-of-flight mass spectrometry (PTR-ToF-MS). In the experiments, diluted gasoline car exhaust was fed into an environmental chamber and then photo-oxidized to clarify the possible reactions of gasoline car exhaust with OH radicals in the atmosphere.

This work showed that the different statistical methods produced similar results, but EFA created the factors with the most plausible physical interpretation for the experiments. EFA separated the car exhaust data measured by PTR-ToF-MS into four factors: a feeding factor, a secondary organic aerosol (SOA) precursor factor, a reaction side product factor, and a reaction product factor.

The benefit of EFA and PCA compared to PMF is that these methods scale the variables, so only the relative changes in concentrations are taken into account. Generally, this allows these methods to detect very small changes in the time series of the variables even when the concentrations are low. The advantage of PMF is that it calculates the results in data units, whereas EFA/PCA results have arbitrary units. In our work, however, we also calculated the EFA/PCA factors in data units by multiplying the original data by the loading values (i.e. the contribution of a variable to a factor) acquired from EFA/PCA. This allowed us to compare the factor time series from the different methods reliably. This work demonstrated that all the statistical methods tested are valuable tools for analyzing complex data sets. EFA was particularly useful in this specific case because it identified the most interpretable factors from the experiments.
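
The data-unit conversion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "multiplying the original data with the loading values" means a plain matrix product of the unscaled data matrix (time steps by variables) with a PCA loading matrix (variables by factors), and it uses PCA on standardized variables only, without the rotation step that EFA normally involves. The function names and the synthetic data are hypothetical and exist only for illustration.

```python
import numpy as np


def pca_loadings(X, n_factors):
    """Loadings (variables x factors) from PCA on standardized variables.

    Assumption: loadings are eigenvectors of the correlation matrix scaled
    by the square root of their eigenvalues, i.e. the correlation between
    each variable and each component.
    """
    # Scale every variable to zero mean and unit variance so that only
    # relative changes matter, regardless of concentration level.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data; rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)
    return Vt[:n_factors].T * np.sqrt(eigvals[:n_factors])


def factors_in_data_units(X, loadings):
    """Factor time series in data units: unscaled data times loadings."""
    return X @ loadings


# Synthetic example (not measurement data): a decaying and a growing
# signal mixed into six hypothetical variables, plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
decaying = np.exp(-3.0 * t)
growing = 1.0 - np.exp(-3.0 * t)
profiles = rng.uniform(0.1, 1.0, size=(2, 6))
X = (np.outer(decaying, profiles[0])
     + np.outer(growing, profiles[1])
     + 0.01 * rng.normal(size=(200, 6)))

loadings = pca_loadings(X, n_factors=2)
factor_ts = factors_in_data_units(X, loadings)
print(loadings.shape, factor_ts.shape)
```

Because the resulting time series are weighted sums of the unscaled signals, they carry the units of the original data, which is what makes a comparison with PMF factor time series possible.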