Predicting Glass Transition Temperature and Viscosity of Organic Molecules via Machine Learning and Molecular Embeddings
TOMMASO GALEAZZO, Manabu Shiraiwa, University of California, Irvine
Abstract Number: 149
Working Group: Aerosol Standards
Abstract
Modeling secondary organic aerosols (SOA) relies on an accurate representation of the physical properties of the semi-volatile organic species that compose atmospheric particles. Notably, the partitioning of SOA species between the gas and particle phases is strongly influenced by particle phase state and viscosity. SOA viscosity can be estimated from the glass transition temperature (Tg) of the constituent compounds, which can be predicted from the elemental composition of individual organic molecules. For more accurate predictions in models of complex SOA mixtures, information on molecular structure and functional groups also needs to be considered.
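The abstract does not state which relation is used to convert Tg into viscosity. One form commonly used in the SOA literature is the Vogel-Tammann-Fulcher (VTF) equation with a fragility parameter D. The sketch below assumes that form, with D = 10 as an assumed default; the function name and arguments are hypothetical and serve only as illustration.

```python
def vtf_log10_viscosity(temperature_k, tg_k, fragility=10.0):
    """Return log10 of viscosity (Pa s) from Tg using a VTF-type relation.

    Assumed form (not taken from the abstract):
        log10(eta) = -5 + 0.434 * T0 * D / (T - T0)
        T0 = 39.17 * Tg / (D + 39.17)
    where D is the fragility parameter and T0 is the Vogel temperature.
    """
    vogel_t = 39.17 * tg_k / (fragility + 39.17)
    if temperature_k <= vogel_t:
        # The VTF expression diverges at T0 and is not defined below it.
        raise ValueError("temperature must be above the Vogel temperature T0")
    return -5.0 + 0.434 * vogel_t * fragility / (temperature_k - vogel_t)


# Example: a compound with Tg = 280 K, evaluated at room temperature.
log_eta = vtf_log10_viscosity(298.15, 280.0)
print(f"log10(viscosity / Pa s) = {log_eta:.2f}")
```

With these constants, evaluating at T = Tg gives a viscosity of about 10^12 Pa s, the conventional value at the glass transition. An error in predicted Tg therefore carries over directly into the predicted viscosity.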
Here, we introduce a new Tg prediction method powered by a machine learning (ML) model and by “molecular embeddings”, recently developed high-dimensional descriptors of chemical species. Molecular embeddings are built by applying word2vec to Morgan algorithm representations of chemical species extracted from SMILES (i.e. mol2vec). We trained state-of-the-art ML models on a large database of experimental Tg data for pure organic species and their corresponding molecular embeddings, and compared different algorithms for accuracy in predicting Tg. The final Tg prediction method is built on an Extreme Gradient Boosting (XGB) model and substantially outperforms previous Tg parametrizations. The new ML-powered model has a mean absolute error of 19.0 K and an R² of 0.97. It accounts for atom connectivity within molecules and can predict different Tg values for compositional isomers. The model can also reproduce experimental viscosity data and quantify how the number and location of functional groups within a molecule influence the viscosity of pure compounds. This new ML-powered Tg model can be used to predict viscosity in numerical models involving organic species, with future applications that go beyond aerosol chemistry and extend to the modeling of organic molecules in general.
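To illustrate why structure-based descriptors can separate compositional isomers when elemental composition cannot, the following is a simplified, Morgan-style sketch in plain Python. It is not the RDKit or mol2vec implementation: the graph encoding, the function name, and the tuple-based identifiers are assumptions made for illustration. In mol2vec, identifiers of this kind play the role of "words" that word2vec then embeds.

```python
from collections import Counter


def substructure_words(atoms, bonds, radius=1):
    """Count Morgan-style substructure identifiers of a heavy-atom graph.

    atoms: list of element symbols; bonds: list of (i, j) atom index pairs.
    Simplified illustration only (no bond orders, charges, or hashing).
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: describe each atom by its element and number of neighbors.
    ids = {i: (atoms[i], len(neighbors[i])) for i in neighbors}
    words = Counter(ids.values())

    # Each extra radius folds the neighbors' identifiers into the atom's own.
    for _ in range(radius):
        ids = {
            i: (ids[i], tuple(sorted(ids[j] for j in neighbors[i])))
            for i in neighbors
        }
        words.update(ids.values())
    return words


# Two C3H8O isomers, written as heavy-atom graphs (hydrogens implicit).
atoms = ["C", "C", "C", "O"]
propan_1_ol = substructure_words(atoms, [(0, 1), (1, 2), (2, 3)])
propan_2_ol = substructure_words(atoms, [(0, 1), (1, 2), (1, 3)])

# Same elemental composition, but different substructure "words".
print(propan_1_ol == propan_2_ol)
```

Both isomers share the same atom list, so a composition-only parametrization receives identical input for them, while the connectivity-based identifiers differ and can be mapped to different Tg values by a downstream model.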