
Case Study: Causal inference for observational data using modelbased
Source: vignettes/practical_causality.Rmd
This vignette demonstrates how to perform causal inference with observational data using the modelbased package. While the examples below use the terms “treatment” and “control” groups, these labels are arbitrary and interchangeable. Causal inference can be applied to any scenario where observations are assigned to different conditions, regardless of whether a “treatment” in the medical sense is involved.
The lack of randomization
Randomized controlled trials (RCTs) are frequently employed to estimate the average treatment effect (ATE), which represents the average difference in outcomes between individuals in the treatment and control groups. In the context of RCTs, the ATE can be interpreted as a causal effect.
Estimating causal effects with observational data, where treatment and control group assignments are not randomized, presents a greater challenge due to the potential for confounding bias. In observational studies, individual characteristics associated with the outcome (e.g., age, gender, education, health status) are likely unequally distributed between the treatment and control groups. This imbalance violates a key assumption for causal effect estimation, an assumption typically satisfied by randomization in RCTs but often unmet in observational data.
Propensity scores and G-computation
Two primary methods exist for addressing the lack of randomization in observational data: propensity scores and g-computation. Regarding propensity scores, this vignette focuses on inverse probability weighting (IPW), a common technique based on estimated propensity scores (Chatton and Rohrer 2024; Gabriel et al. 2024).
IPW assigns weights to individual observations so that, after weighting, the distribution of confounders is comparable between groups, approximating the exchangeability that randomization would provide. When specific characteristics are over-represented in the treatment group, IPW assigns lower weights to the corresponding individuals, thereby adjusting for the imbalanced distribution of confounders between treatment and control groups.
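In formula terms (the notation here is added for clarity: $p_i$ is the estimated propensity score of observation $i$ and $T_i$ its binary treatment indicator), the ATE weights computed later in this vignette are

$$
w_i = \frac{T_i}{p_i} + \frac{1 - T_i}{1 - p_i},
$$

i.e. $1/p_i$ for treated observations and $1/(1 - p_i)$ for controls.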
G-computation, rooted in the traditional methods of stratification and standardization, addresses confounding in observational studies by partitioning the study population into strata, calculating stratum-specific outcomes, and then weighting these outcomes to represent a target population (e.g., the general population). In modelbased, g-computation can be implemented by setting estimate = "population". This approach directly models counterfactual scenarios by creating copies of each observation, effectively assigning each individual to both treatment and control conditions. The final estimate is then derived by averaging predictions across these counterfactual observations (Dickerman and Hernán 2020).
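To make the counterfactual logic concrete, here is a minimal sketch of g-computation done “by hand”, assuming a fitted (non-mixed) outcome model m estimated on a data frame dat that contains a binary factor treatment (these names are placeholders, not objects used later in this vignette). The estimate_contrasts(..., estimate = "population") calls shown below perform this kind of computation, with appropriate uncertainty estimates.
# illustrative sketch, not the vignette's code: manual g-computation
# `m`, `dat` and `treatment` are placeholders for a fitted model, its data,
# and a binary treatment factor coded "0"/"1"
dat_control <- dat_treated <- dat
dat_control$treatment <- factor("0", levels = levels(dat$treatment))
dat_treated$treatment <- factor("1", levels = levels(dat$treatment))

# predict the counterfactual outcome for every observation under both conditions
pred_control <- predict(m, newdata = dat_control, type = "response")
pred_treated <- predict(m, newdata = dat_treated, type = "response")

# the ATE is the average of the individual counterfactual differences
mean(pred_treated - pred_control)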
Both propensity score and g-computation methods offer distinct advantages and disadvantages. IPW, for instance, can be directly incorporated as weights within regression models, facilitating a straightforward interpretation of the treatment coefficient as the average treatment effect (ATE). However, deriving the ATE becomes more complex in models incorporating interaction terms or other complexities. Conversely, modelbased g-computation readily estimates the ATE even within complex model specifications. A robust approach often involves combining IPW and g-computation to achieve “doubly robust” estimation of the ATE (Chatton and Rohrer 2024; Gabriel et al. 2024). Illustrative examples will follow.
Calculating inverse probability weights
First, we create some toy data with a treatment effect: on average, the quality of life in the treatment group is roughly 9 points higher than in the control group. The example data set has health-related quality of life (QoL) as outcome, which was measured for 188 cancer patients at three time points (baseline and two follow-ups). The treatment variable is simulated and not part of the original data set. Since we have repeated measurements, we use the glmmTMB package to fit a linear mixed model to the data.
# example
library(modelbased)
library(datawizard)
library(parameters)

data(qol_cancer, package = "parameters")

# sort and group data by patient ID, then assign each patient either to
# the treatment or control condition, with higher educated patients having
# a higher chance of belonging to the treatment group
set.seed(1234)
d <- qol_cancer |>
  data_arrange("ID") |>
  data_group("ID") |>
  data_modify(treatment = rbinom(1, 1, ifelse(education == "high", 0.7, 0.4))) |>
  data_ungroup()

# create a treatment effect that increases over time,
# with more improvement for higher educated patients
d$QoL <- d$QoL + rnorm(
  nrow(d),
  (d$treatment * d$time * 5) + ifelse(d$education == "high", 5, 0),
  sd = 2
)

# convert to factors
d <- to_factor(d, c("treatment", "time"))
We first look at the simple effect of treatment, which is about 9.3.
# simple pooled effect
m1 <- glmmTMB::glmmTMB(
  QoL ~ treatment + time + education + hospital + age + phq4 + (1 | ID),
  data = d
)
# we don't need the full summary table,
# only the coefficient for treatment
model_parameters(m1, keep = "^treatment")
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI | z | p
#> ------------------------------------------------------------------
#> treatment [1] | 9.27 | 1.96 | [5.43, 13.11] | 4.73 | < .001
Next, inverse probability weights are calculated. This involves fitting a regression model with the predictor of interest, treatment, as the outcome variable. This model estimates the probability of treatment assignment given the observed confounders. For binary predictors, such as the treatment variable in this analysis, a logistic regression model is used. Potential confounders serve as independent variables in this model, generating a predicted probability for each observation’s membership in either the treatment or control group. Based on these probabilities, the IPW is calculated.
# logistic regression model
m_ipw <- glm(
  treatment ~ time + hospital + phq4 + education + age,
  data = d,
  family = binomial()
)
# add predictions, i.e. the predicted probability of belonging to the
# treatment group, for each patient in the sample
d$predictions <- predict(m_ipw, newdata = d, type = "response")
# calculating the IPW
d$ipw <- ifelse(d$treatment == 1, 1 / d$predictions, 1 / (1 - d$predictions))
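Before using the weights, a quick sanity check (not part of the original vignette) can be useful to make sure no observation receives an extreme weight, which would signal near-violations of the positivity assumption:
# optional diagnostic: inspect the distribution of the weights
summary(d$ipw)

# count extreme weights (the threshold of 10 is an arbitrary rule of thumb)
sum(d$ipw > 10)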
Next, we run the same model again, including the weights. We see that the treatment effect, after weighting, is about 9 points, i.e. the treatment on average results in a 9-point higher score for QoL.
m2 <- glmmTMB::glmmTMB(
  QoL ~ treatment + time + education + hospital + age + phq4 + (1 | ID),
  weights = ipw,
  data = d
)
model_parameters(m2, keep = "^treatment")
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI | z | p
#> ------------------------------------------------------------------
#> treatment [1] | 9.04 | 1.99 | [5.13, 12.95] | 4.53 | < .001
Given the simplicity of the model, g-computation offers no significant advantage in calculating contrasts between treatment levels, as the results are equivalent to a simpler approach.
estimate_contrasts(m2, "treatment", estimate = "population")
#> Counterfactual Contrasts Analysis (G-computation)
#>
#> Level1 | Level2 | Difference (CI) | p
#> ---------------------------------------------
#> 1 | 0 | 9.04 (5.13, 12.95) | <0.001
#>
#> Variable predicted: QoL
#> Predictors contrasted: treatment
#> Predictors averaged: time, education, hospital (0.95), age (0.22), phq4 (-0.076), ID
#> p-values are uncorrected.
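For reference, the “simpler approach” mentioned above is the default marginal contrast without g-computation; in this model without interaction terms it yields essentially the same estimate as the treatment coefficient shown earlier (call sketched here, output omitted):
# equivalent simpler approach: contrasts without g-computation
estimate_contrasts(m2, "treatment")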
However, we have not properly modelled the longitudinal nature of our data. Since patients’ quality of life (QoL) was measured at three distinct time points, allowing for an examination of treatment effects over time, we need to include an interaction between treatment and time. This reveals a substantially greater increase in QoL over time within the treatment group. In such more complex modeling scenarios, g-computation becomes particularly advantageous.
# interaction terms involved
m3 <- glmmTMB::glmmTMB(
  QoL ~ treatment * time + education + hospital + age + phq4 + (1 | ID),
  weights = ipw,
  data = d
)

# estimated marginal means, to show how QoL develops over time
# in the treatment and control conditions
estimate_means(
  m3,
  c("time", "treatment"),
  estimate = "population"
) |> plot()
We now see that the coefficient table is no longer helpful, because the treatment effect cannot be isolated. If we want to know the ATE of treatment, we need to calculate contrasts for the levels of our treatment predictor.
# ATE no longer visible from coefficient table
model_parameters(m3, keep = "^treatment")
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI | z | p
#> ------------------------------------------------------------------------------
#> treatment [1] | 4.11 | 2.22 | [-0.24, 8.46] | 1.85 | 0.064
#> treatment [1] × time [2] | 5.05 | 1.64 | [ 1.84, 8.26] | 3.08 | 0.002
#> treatment [1] × time [3] | 9.63 | 1.64 | [ 6.42, 12.84] | 5.88 | < .001
# contrasts of treatment levels, using g-computation
estimate_contrasts(m3, "treatment", estimate = "population")
#> Counterfactual Contrasts Analysis (G-computation)
#>
#> Level1 | Level2 | Difference (CI) | p
#> ---------------------------------------------
#> 1 | 0 | 9.00 (5.07, 12.94) | <0.001
#>
#> Variable predicted: QoL
#> Predictors contrasted: treatment
#> Predictors averaged: time, education, hospital (0.95), age (0.22), phq4 (-0.076), ID
#> p-values are uncorrected.
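If the time-specific treatment effects are of interest in addition to the average effect, the same counterfactual contrasts can be computed separately for each level of time. A sketch, assuming the by argument of estimate_contrasts() (output omitted):
# treatment contrasts at each time point (sketch)
estimate_contrasts(m3, "treatment", by = "time", estimate = "population")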
In even more complex scenarios, g-computation remains a valuable tool for generating accurate estimates. This example compares “simple” contrasts derived from an IPW model with contrasts generated from the same model using g-computation. The results clearly demonstrate the increased accuracy achieved by combining IPW with g-computation in such situations.
# more complex model
m4 <- glmmTMB::glmmTMB(
  QoL ~ treatment * time + treatment * education + hospital + age + phq4 + (1 | ID),
  weights = ipw,
  data = d
)
# complex model, no g-computation
estimate_contrasts(m4, "treatment")
#> Marginal Contrasts Analysis
#>
#> Level1 | Level2 | Difference (CI) | p
#> ----------------------------------------------
#> 1 | 0 | 10.97 (6.70, 15.23) | <0.001
#>
#> Variable predicted: QoL
#> Predictors contrasted: treatment
#> Predictors averaged: time, education, hospital (0.95), age (0.22), phq4 (-0.076), ID
#> p-values are uncorrected.
# complex model, with IPW *and* G-computation (double-robust) is accurate!
estimate_contrasts(m4, "treatment", estimate = "population")
#> Counterfactual Contrasts Analysis (G-computation)
#>
#> Level1 | Level2 | Difference (CI) | p
#> ---------------------------------------------
#> 1 | 0 | 8.99 (5.10, 12.88) | <0.001
#>
#> Variable predicted: QoL
#> Predictors contrasted: treatment
#> Predictors averaged: time, education, hospital (0.95), age (0.22), phq4 (-0.076), ID
#> p-values are uncorrected.
The last example completes the “causal inference” topic by showing how to calculate the average treatment effect on the treated (ATT) and the average treatment effect on the untreated (ATU).
The ATT measures how much the treatment changes the outcome for those who already received it. It can help to decide whether a program or treatment should be discontinued for the people currently benefiting from it. Likewise, the ATU estimates how much the treatment would change the outcome if it were given to those who did not receive it, and thus helps to assess the potential benefits of extending a program or treatment to a wider population.
library(insight)
# we need the data used to fit the model - this may not be
# identical to the original data due to case-wise deletion
model_data <- get_data(m4)
# the ATT - apply g-computation only to the subset of the treated
estimate_contrasts(
  m4,
  "treatment",
  newdata = subset(model_data, treatment == 1),
  estimate = "population"
)
#> Counterfactual Contrasts Analysis (G-computation)
#>
#> Level1 | Level2 | Difference (CI) | p
#> ---------------------------------------------
#> 1 | 0 | 9.79 (5.42, 14.15) | <0.001
#>
#> Variable predicted: QoL
#> Predictors contrasted: treatment
#> Predictors averaged: time, education, hospital (0.95), age (0.22), phq4 (-0.076), ID
#> p-values are uncorrected.
# the ATU - apply g-computation only to the subset of the control group
estimate_contrasts(
  m4,
  "treatment",
  newdata = subset(model_data, treatment == 0),
  estimate = "population"
)
#> Counterfactual Contrasts Analysis (G-computation)
#>
#> Level1 | Level2 | Difference (CI) | p
#> ---------------------------------------------
#> 1 | 0 | 8.35 (4.34, 12.37) | <0.001
#>
#> Variable predicted: QoL
#> Predictors contrasted: treatment
#> Predictors averaged: time, education, hospital (0.95), age (0.22), phq4 (-0.076), ID
#> p-values are uncorrected.
Conclusion
This vignette explored the estimation of average treatment effects (ATEs) from observational data, focusing on the application of inverse probability weighting (IPW) and g-computation. While both methods address confounding bias inherent in non-randomized studies, they offer distinct strengths. IPW provides a straightforward approach for incorporating weights into regression models, simplifying ATE interpretation in basic models. However, g-computation excels in more complex scenarios, such as those involving longitudinal data and interaction terms, where isolating treatment effects becomes challenging.
Our examples demonstrated the utility of both methods. Specifically, we showed that while IPW effectively estimates the ATE in a simple model, g-computation becomes essential when modeling the interaction between treatment and time. Furthermore, combining IPW with g-computation, as illustrated in our last examples, offers increased accuracy, highlighting the benefits of this “doubly robust” approach for estimating ATEs in complex models.
Combining IPW and g-computation does not introduce any additional bias and hence has no downsides. While IPW alone can be less accurate than g-computation, the most reliable estimates are achieved by using both methods together. Therefore, we recommend their combined use.