AAAR 37th Annual Conference October 14 - October 18, 2019 Oregon Convention Center Portland, Oregon, USA
Spatial Variation of Air Pollutants Using Machine Learning Models
JIAJUN GU, Gaurav Bang, Abhijeet Guha Roy, Michael Brauer, Benjamin Barratt, Martha Lee, K. Max Zhang, Cornell University
Abstract Number: 483 Working Group: Urban Aerosols
Abstract
Urban air pollution is characterized by significant spatial heterogeneity as a result of multiple emission sources, complex urban morphology, and dynamic meteorology. In this study, we compared four machine learning models, namely land-use (linear) regression (LUR), random forest (RF), Gaussian process (GP), and artificial neural networks (ANN), for predicting the spatial variations of nitrogen dioxide (NO2), nitric oxide (NO), PM2.5, and black carbon (BC) concentrations in Hong Kong. LUR uses a linear predictor function and is easy to interpret, but it has difficulty capturing non-linear relationships and can suffer from variance inflation due to multicollinearity. RF constructs a number of decision trees and outputs the mean prediction of the individual trees; it can capture non-linear relationships, but at the expense of model interpretability. GP takes a stochastic view, with the key idea of imposing a multivariate Gaussian prior distribution over the functions that describe the input-output relations. It offers nonparametric flexibility and can quantify prediction uncertainty, at the cost of computation time. ANN models the interdependence among all the predictors and is able to capture complex relationships. Its main drawback is poor interpretability due to its black-box nature. All the models were evaluated using hold-out evaluation, with different percentages of sites held out. In terms of prediction accuracy, ANN outperformed RF and GP, and LUR ranked last. However, ANN’s black-box nature also made it the least interpretable. We argue that GP is advantageous when prediction accuracy and model interpretation are equally important. In addition, GP’s nonparametric flexibility and capability to quantify uncertainties hold great promise in a variety of air pollution applications.
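As an illustration of the hold-out evaluation mentioned above, the following is a minimal sketch, not the authors' code or data. It fits a one-predictor linear model (a stand-in for LUR) and scores it on held-out sites at several hold-out fractions. The predictor name `road_density`, the synthetic concentrations, the fractions, and all helper functions are assumptions made for this example; the abstract does not state which predictors, fractions, or accuracy metric were used.

```python
import random


def fit_linear(xs, ys):
    """Ordinary least squares with one predictor: y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against observations."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def holdout_r2(xs, ys, frac, rng):
    """Hold out a random fraction of sites, fit on the rest, score on the held-out sites."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = max(2, round(frac * len(idx)))
    test, train = idx[:n_test], idx[n_test:]
    a, b = fit_linear([xs[i] for i in train], [ys[i] for i in train])
    preds = [a + b * xs[i] for i in test]
    return r_squared([ys[i] for i in test], preds)


# Synthetic monitoring sites (assumed values, for illustration only).
rng = random.Random(0)
n_sites = 100
road_density = [rng.uniform(0, 10) for _ in range(n_sites)]
no2 = [20 + 3 * x + rng.gauss(0, 2) for x in road_density]

# Evaluate with different percentages of sites held out.
results = {frac: holdout_r2(road_density, no2, frac, rng) for frac in (0.1, 0.2, 0.3)}
for frac, r2 in results.items():
    print(f"held out {frac:.0%}: R^2 = {r2:.3f}")
```

The same `holdout_r2` loop could wrap any of the four model types, so that each model is scored on identical held-out sites; only `fit_linear` and the prediction line would change.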