10th International Aerosol Conference
September 2 - September 7, 2018
America's Center Convention Complex
St. Louis, Missouri, USA



How Uncertainties in Measurements and Choice of Regression Method Affect Inference from Data

SANTTU MIKKONEN, Mikko Pitkänen, Tuomo Nieminen, Antti Lipponen, Antti Arola, Kari Lehtinen, University of Eastern Finland

     Abstract Number: 849
     Working Group: Aerosol Physics

Abstract
When analysing dependencies between two or more measured variables, regression models are usually applied. A regression model can be linear or nonlinear, depending on the data. Standard regression models assume that the independent variables (predictors) have been measured without error, and they account only for errors in the dependent variables (responses). When the predictor measurements contain error, estimates from the standard methods, usually ordinary least squares (OLS), do not converge to the true parameter values, not even asymptotically. In linear models the slope is biased towards zero (attenuated), so a positive slope is underestimated; in nonlinear models the bias is likely to be more complicated.
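For a simple linear model the size of this bias has a known closed form: if the observed predictor is x = ξ + u, with error u independent of the true value ξ, the OLS slope converges to λβ, where the reliability ratio λ = var(ξ)/(var(ξ) + var(u)) is below 1. The minimal sketch below illustrates this with the Python standard library only; the slope, standard deviations and sample size are arbitrary illustrative assumptions, not values from this study.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(beta=2.0, sd_true=1.0, sd_err=1.0, n=100_000, seed=1):
    """Fit OLS to data whose predictor is observed with additive noise.

    Returns (estimated OLS slope, theoretical attenuated slope beta * lambda).
    """
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, sd_true) for _ in range(n)]     # true (unobserved) predictor
    x = [v + rng.gauss(0.0, sd_err) for v in xi]         # predictor measured with error
    y = [beta * v + rng.gauss(0.0, 0.5) for v in xi]     # response with its own error
    reliability = sd_true**2 / (sd_true**2 + sd_err**2)  # lambda
    return ols_slope(x, y), beta * reliability


est, expected = simulate_attenuation()
print(f"true slope 2.0, OLS slope {est:.3f}, predicted attenuated slope {expected:.3f}")
```

With equal true and error variances (λ = 0.5) the OLS slope lands near half the true slope, and adding more data does not remove the gap.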

Measurement errors in every variable of the model must therefore be taken into account in the regression analysis by applying an errors-in-variables (EIV) regression method. In this study, we compared OLS regression results with six regression methods known to account for errors in variables and to provide (at least asymptotically) unbiased estimates: Deming regression, principal component analysis (PCA) regression, orthogonal regression, Bayesian EIV regression and two different bivariate regression methods.
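Deming regression, one of the methods listed above, has a closed-form slope once the ratio δ of the error variances (error in y over error in x) is fixed; δ = 1 gives orthogonal regression. The sketch below is a plain-Python illustration of the textbook estimator, assuming δ is known; `deming_fit` is a hypothetical helper name and this is not the implementation used in the study.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Deming regression of y on x, assuming a known error-variance ratio.

    delta = var(error in y) / var(error in x); delta = 1 is orthogonal regression.
    slope = (syy - delta*sxx + sqrt((syy - delta*sxx)**2 + 4*delta*sxy**2)) / (2*sxy)
    Returns (slope, intercept). Undefined when the sample covariance sxy is zero.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    return slope, my - slope * mx


# Points lying exactly on y = 2x + 1 are recovered exactly.
print(deming_fit([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]))  # (2.0, 1.0)
```

As δ grows very large (x treated as error-free) the estimate approaches the OLS slope, so OLS can be read as the limiting case that ignores predictor error entirely.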

We simulated a dataset mimicking the new particle formation rate (J3) and sulphuric acid concentration (H2SO4) reported from CLOUD chamber measurements at CERN. Both variables are known to have substantial measurement error, which makes them good test variables for this study. Moreover, the relationship between the logarithms of these variables is often described with linear OLS regression, so inferences drawn from it may be flawed.

The data were simulated from the model log10(J3) = β*log10(H2SO4) + α with the “true slope” β = 3.3 and with measurement errors comparable to those in real measurements. OLS regression gave a slope of 1.7, whereas the EIV methods gave slopes in the range 2.48 to 4.05, depending on how each procedure accounts for the measurement error. The biases and sensitivity of the regression methods were compared in simulations with different error levels and sample sizes. All methods had high uncertainty with samples of fewer than 10 observations, but with larger samples all methods converged rapidly to their own characteristic values. When the error variance of H2SO4 was increased, the OLS slope decreased drastically, whereas the decrease in the EIV slopes was moderate or negligible.
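The simulation design described above can be mimicked in a few lines. This sketch generates data from log10(J3) = β*log10(H2SO4) + α with β = 3.3 and compares the OLS slope with a Deming slope as the error in log10(H2SO4) grows. The intercept α = -20, the uniform range 6 to 9 for the true log10(H2SO4), the error standard deviations and the sample size are illustrative assumptions rather than values from the study, and Deming regression stands in here for the wider set of EIV methods that were compared.

```python
import math
import random


def ols_and_deming(x, y, delta):
    """Return (OLS slope, Deming slope) of y on x; delta = var(y error) / var(x error)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    d = syy - delta * sxx
    deming = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    return sxy / sxx, deming


def simulate(sd_x_err, n=5000, beta=3.3, alpha=-20.0, sd_y_err=0.3, seed=0):
    """Simulate log10(J3) = beta*log10(H2SO4) + alpha with noise in both variables.

    alpha, the range of true log10(H2SO4), the error SDs and n are illustrative
    assumptions. sd_x_err must be > 0 so that the error ratio delta is defined.
    """
    rng = random.Random(seed)
    true_x = [rng.uniform(6.0, 9.0) for _ in range(n)]                 # true log10(H2SO4)
    x = [v + rng.gauss(0.0, sd_x_err) for v in true_x]                 # measured log10(H2SO4)
    y = [beta * v + alpha + rng.gauss(0.0, sd_y_err) for v in true_x]  # measured log10(J3)
    delta = sd_y_err**2 / sd_x_err**2                                  # true error-variance ratio
    return ols_and_deming(x, y, delta)


for sd in (0.1, 0.3, 0.5, 0.8):
    ols, dem = simulate(sd)
    print(f"x-error SD {sd:.1f}: OLS slope {ols:.2f}, Deming slope {dem:.2f}")
```

Here Deming regression is handed the true error-variance ratio δ; with real data δ must be estimated or assumed, and a misspecified δ also biases the Deming slope, which is one way different EIV methods can end up with different slopes. Rerunning `simulate` with a small n (for example n = 8) and several seeds gives a feel for the small-sample scatter.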