A marriage of machine learning methods and causal inference parameters to derive counterfactual dose-response curves for air pollution and respiratory outcomes
EKATERINA ELISEEVA (1), Alan Hubbard (1), Kathleen Hammond (1), Ira Tager (1)
(1) University of California, Berkeley School of Public Health
Abstract Number: 102
Last modified: November 2, 2009
Preference: Poster Presentation
Working Group: sq2
Abstract
Background: To simplify the analysis of associations between air pollutants and health, air pollution concentrations are sometimes converted into discrete categories so that the risk of health outcomes can be estimated at different levels of exposure. This is particularly true when relatively new methods of analysis, collectively referred to as “causal inference methods”, are used. These methods yield marginal (population-level) estimates that are not always available from more conventional statistical methods. Because they are easy to understand and implement, inverse probability weighted estimating equation approaches are used most frequently. In practice, such approaches require discrete exposure variables. This is a major limitation, since it is of obvious interest to assess the health impacts of air pollution by examining the exposure-response relations between health outcomes and exposures measured on a pseudo-continuous scale. Ideally, this should be done as nonparametrically as possible, to avoid unsupportable modeling assumptions that can lead to misspecification of the exposure-response relation of interest.
Methods: Fortunately, there exist methods that harness the strength of machine learning to reduce bias in the estimation of causal parameters, e.g., the parameters of marginal structural models (MSM; Robins, 2000). In this paper, we use G-computation with extremely flexible machine learning to estimate the so-called counterfactual exposure-response curve. The idea is to estimate, over a fine grid of exposure values a, the parameter $\theta(a) \equiv E_W\{E(Y \mid A=a, W)\}$, where Y is the outcome, A is the exposure of interest, and W are the potential confounders. Under assumptions, $E(Y_a) = E_W\{E(Y \mid A=a, W)\}$, where $Y_a$ is the counterfactual representing the outcome for an individual if, possibly counter to fact, s/he were exposed to A=a (i.e., a specific level of air pollution). If the model for $Q(A,W) \equiv E(Y \mid A, W)$ is estimated data-adaptively and very flexibly, then one should obtain an asymptotically unbiased estimate of $\theta(a)$ and, thus, of the true exposure-response curve. If, furthermore, W contains all the confounders, and there is a positive probability of observing each of the possible exposures, A, for every covariate group defined by W, then we also have an unbiased estimate of the model, m(a), of $E(Y_a)$, i.e., the counterfactual mean as a (semi-)continuous function of the level of exposure, a. Finally, if the empirical form of m(a) can be described by a simple model (e.g., linear in a), then we also derive information for the parameterization of an MSM. Thus, this approach is a powerful method to derive the unconfounded exposure-response relationship.
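The G-computation step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the analysis used in the study: the data are simulated (a hypothetical binary confounder W, continuous exposure A, and binary outcome Y), and a stratum-specific Gaussian kernel smoother stands in for the flexible machine learning estimate of Q(A,W). All function names and parameter values are hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, seed=0):
    """Simulated (hypothetical) data: tuples (w, a, y)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0        # binary confounder
        a = rng.gauss(2.0 + 0.5 * w, 1.0)         # exposure depends on W
        p = expit(-2.0 + 0.6 * a + 1.0 * w)       # outcome depends on A and W
        y = 1 if rng.random() < p else 0
        data.append((w, a, y))
    return data


def q_hat(a, w, data, bandwidth=0.5):
    """Kernel-smoothed estimate of Q(a, w) = E(Y | A=a, W=w).

    Assumption: a simple Nadaraya-Watson smoother within the stratum W=w
    stands in for a flexible machine learning fit.
    """
    num = den = 0.0
    for wi, ai, yi in data:
        if wi != w:
            continue
        k = math.exp(-0.5 * ((ai - a) / bandwidth) ** 2)
        num += k * yi
        den += k
    return num / den if den > 0 else float("nan")


def g_computation_curve(data, grid):
    """theta(a) = E_W{ Q(a, W) }: average Q(a, w) over the empirical
    distribution of W, for each exposure value a on the grid."""
    n = len(data)
    p_w1 = sum(w for w, _, _ in data) / n
    return [
        (1.0 - p_w1) * q_hat(a, 0, data) + p_w1 * q_hat(a, 1, data)
        for a in grid
    ]


data = simulate(2000)
grid = [1.0 + 0.25 * i for i in range(9)]   # exposure grid from 1.0 to 3.0
curve = g_computation_curve(data, grid)
for a, theta in zip(grid, curve):
    print(f"a = {a:.2f}  theta_hat(a) = {theta:.3f}")
```

The grid is kept inside the range where both strata of W have observed exposures, which mirrors the positivity condition stated above; outside that range the smoother would be extrapolating.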
Results: We illustrate the technique with data from the FACES study, estimating the probability of wheezing as a function of NO2 measured at different lags and moving averages. We derive point-wise inference using the bootstrap and discuss methods for curve-wise inference using resampling-based multiple testing methods (the quantile-transformation method).
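Point-wise bootstrap inference for such a curve might look like the following sketch. It is an assumption-laden illustration: the data are simulated rather than taken from FACES, a kernel smoother again stands in for the flexible learner, percentile intervals are one of several possible bootstrap intervals, and all names are hypothetical. It does not cover the curve-wise (quantile-transformation) procedure.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, rng):
    """Simulated (hypothetical) data: tuples (w, a, y)."""
    data = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = rng.gauss(2.0 + 0.5 * w, 1.0)
        y = 1 if rng.random() < expit(-2.0 + 0.6 * a + 1.0 * w) else 0
        data.append((w, a, y))
    return data


def theta_hat(a, data, bandwidth=0.5):
    """G-computation estimate at exposure a: stratum-specific kernel
    regressions of Y on A, averaged over the empirical distribution of W."""
    n = len(data)
    total = 0.0
    for w in (0, 1):
        num = den = 0.0
        n_w = 0
        for wi, ai, yi in data:
            if wi == w:
                n_w += 1
                k = math.exp(-0.5 * ((ai - a) / bandwidth) ** 2)
                num += k * yi
                den += k
        # A stratum with no usable observations is skipped (sketch-level handling).
        if n_w > 0 and den > 0:
            total += (n_w / n) * (num / den)
    return total


def bootstrap_ci(data, grid, n_boot=200, alpha=0.05, seed=1):
    """Percentile bootstrap interval for theta(a) at each grid point."""
    rng = random.Random(seed)
    n = len(data)
    draws = [[] for _ in grid]
    for _ in range(n_boot):
        # Resample individuals with replacement, then re-estimate the curve.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        for j, a in enumerate(grid):
            draws[j].append(theta_hat(a, sample))
    intervals = []
    for d in draws:
        d.sort()
        lo = d[int((alpha / 2) * (n_boot - 1))]
        hi = d[int(math.ceil((1 - alpha / 2) * (n_boot - 1)))]
        intervals.append((lo, hi))
    return intervals


data = simulate(500, random.Random(0))
grid = [1.5, 2.0, 2.5]
point = [theta_hat(a, data) for a in grid]
intervals = bootstrap_ci(data, grid)
for a, est, (lo, hi) in zip(grid, point, intervals):
    print(f"a = {a:.1f}  theta_hat = {est:.3f}  95% CI = ({lo:.3f}, {hi:.3f})")
```

These intervals are valid only one grid point at a time; they do not by themselves give simultaneous coverage over the whole curve.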
Conclusions: As better (more flexible) machine learning tools are developed and as larger studies become available, these methods will lead towards efficient, unbiased estimates of counterfactual dose-response curves for air pollution as well as other environmental exposures.