The purpose of check_dag()
is to build, check and visualize
your model based on directed acyclic graphs (DAG). The function checks if a
model is correctly adjusted for identifying specific relationships of
variables, especially directed (maybe also "causal") effects for given
exposures on an outcome. In case of incorrect adjustments, the function
suggests the minimal required variables that should be adjusted for (sometimes
also called "controlled for"), i.e. variables that at least need to be
included in the model. Depending on the goal of the analysis, it is still
possible to add more variables to the model than just the minimally required
adjustment sets.
check_dag()
is a convenient wrapper around ggdag::dagify()
,
dagitty::adjustmentSets()
and dagitty::adjustedNodes()
to check correct
adjustment sets. It returns a dagitty object that can be visualized with
plot()
. as.dag()
is a small convenient function to return the
dagitty-string, which can be used for the online-tool from the
dagitty-website.
Usage
check_dag(
...,
outcome = NULL,
exposure = NULL,
adjusted = NULL,
latent = NULL,
effect = c("all", "total", "direct"),
coords = NULL
)
as.dag(x, ...)
Arguments
- ...
One or more formulas, which are converted into dagitty syntax. First element may also be model object. If a model objects is provided, its formula is used as first formula, and all independent variables will be used for the
adjusted
argument. See 'Details' and 'Examples'.- outcome
Name of the dependent variable (outcome), as character string or as formula. Must be a valid name from the formulas provided in
...
. If not set, the first dependent variable from the formulas is used.- exposure
Name of the exposure variable (as character string or formula), for which the direct and total causal effect on the
outcome
should be checked. Must be a valid name from the formulas provided in...
. If not set, the first independent variable from the formulas is used.- adjusted
A character vector or formula with names of variables that are adjusted for in the model, e.g.
adjusted = c("x1", "x2")
oradjusted = ~ x1 + x2
. If a model object is provided in...
, any values inadjusted
will be overwritten by the model's independent variables.- latent
A character vector with names of latent variables in the model.
- effect
Character string, indicating which effect to check. Can be
"all"
(default),"total"
, or"direct"
.- coords
Coordinates of the variables when plotting the DAG. The coordinates can be provided in three different ways:
a list with two elements,
x
andy
, which both are named vectors of numerics. The names correspond to the variable names in the DAG, and the values forx
andy
indicate the x/y coordinates in the plot.a list with elements that correspond to the variables in the DAG. Each element is a numeric vector of length two with x- and y-coordinate.
a data frame with three columns:
x
,y
andname
(which contains the variable names).
See 'Examples'.
- x
An object of class
check_dag
, as returned bycheck_dag()
.
Value
An object of class check_dag
, which can be visualized with plot()
.
The returned object also inherits from class dagitty
and thus can be used
with all functions from the ggdag and dagitty packages.
Specifying the DAG formulas
The formulas have following syntax:
One-directed paths: On the left-hand-side is the name of the variables where causal effects point to (direction of the arrows, in dagitty-language). On the right-hand-side are all variables where causal effects are assumed to come from. For example, the formula
Y ~ X1 + X2
, paths directed from bothX1
andX2
toY
are assumed.Bi-directed paths: Use
~~
to indicate bi-directed paths. For example,Y ~~ X
indicates that the path betweenY
andX
is bi-directed, and the arrow points in both directions. Bi-directed paths often indicate unmeasured cause, or unmeasured confounding, of the two involved variables.
Minimally required adjustments
The function checks if the model is correctly adjusted for identifying the direct and total effects of the exposure on the outcome. If the model is correctly specified, no adjustment is needed to estimate the direct effect. If the model is not correctly specified, the function suggests the minimally required variables that should be adjusted for. The function distinguishes between direct and total effects, and checks if the model is correctly adjusted for both. If the model is cyclic, the function stops and suggests to remove cycles from the model.
Note that it sometimes could be necessary to try out different combinations
of suggested adjustments, because check_dag()
can not always detect whether
at least one of several variables is required, or whether adjustments should
be done for all listed variables. It can be useful to copy the dagitty-code
(using as.dag()
, which prints the dagitty-string into the console) into
the dagitty-website and play around with different adjustments.
Direct and total effects
The direct effect of an exposure on an outcome is the effect that is not mediated by any other variable in the model. The total effect is the sum of the direct and indirect effects. The function checks if the model is correctly adjusted for identifying the direct and total effects of the exposure on the outcome.
Why are DAGs important - the Table 2 fallacy
Correctly thinking about and identifying the relationships between variables is important when it comes to reporting coefficients from regression models that mutually adjust for "confounders" or include covariates. Different coefficients might have different interpretations, depending on their relationship to other variables in the model. Sometimes, a regression coefficient represents the direct effect of an exposure on an outcome, but sometimes it must be interpreted as total effect, due to the involvement of mediating effects. This problem is also called "Table 2 fallacy" (Westreich and Greenland 2013). DAG helps visualizing and thereby focusing the relationships of variables in a regression model to detect missing adjustments or over-adjustment.
References
Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27–42. doi:10.1177/2515245917745629
Westreich, D., & Greenland, S. (2013). The Table 2 Fallacy: Presenting and Interpreting Confounder and Modifier Coefficients. American Journal of Epidemiology, 177(4), 292–298. doi:10.1093/aje/kws412
Examples
# no adjustment needed
check_dag(
y ~ x + b,
outcome = "y",
exposure = "x"
)
#> # Check for correct adjustment sets
#> - Outcome: y
#> - Exposure: x
#>
#> Identification of direct and total effects
#>
#> Model is correctly specified.
#> No adjustment needed to estimate the direct and total effect of `x` on `y`.
#>
# incorrect adjustment
dag <- check_dag(
y ~ x + b + c,
x ~ b,
outcome = "y",
exposure = "x"
)
dag
#> # Check for correct adjustment sets
#> - Outcome: y
#> - Exposure: x
#>
#> Identification of direct and total effects
#>
#> Incorrectly adjusted!
#> To estimate the direct and total effect, at least adjust for `b`. Currently, the model does not adjust for any variables.
#>
plot(dag)
# After adjusting for `b`, the model is correctly specified
dag <- check_dag(
y ~ x + b + c,
x ~ b,
outcome = "y",
exposure = "x",
adjusted = "b"
)
dag
#> # Check for correct adjustment sets
#> - Outcome: y
#> - Exposure: x
#> - Adjustment: b
#>
#> Identification of direct and total effects
#>
#> Model is correctly specified.
#> All minimal sufficient adjustments to estimate the direct and total effect were done.
#>
# using formula interface for arguments "outcome", "exposure" and "adjusted"
check_dag(
y ~ x + b + c,
x ~ b,
outcome = ~y,
exposure = ~x,
adjusted = ~ b + c
)
#> # Check for correct adjustment sets
#> - Outcome: y
#> - Exposure: x
#> - Adjustments: b and c
#>
#> Identification of direct and total effects
#>
#> Model is correctly specified.
#> All minimal sufficient adjustments to estimate the direct and total effect were done.
#>
# if not provided, "outcome" is taken from first formula, same for "exposure"
# thus, we can simplify the above expression to
check_dag(
y ~ x + b + c,
x ~ b,
adjusted = ~ b + c
)
#> # Check for correct adjustment sets
#> - Outcome: y
#> - Exposure: x
#> - Adjustments: b and c
#>
#> Identification of direct and total effects
#>
#> Model is correctly specified.
#> All minimal sufficient adjustments to estimate the direct and total effect were done.
#>
# use specific layout for the DAG
dag <- check_dag(
score ~ exp + b + c,
exp ~ b,
outcome = "score",
exposure = "exp",
coords = list(
# x-coordinates for all nodes
x = c(score = 5, exp = 4, b = 3, c = 3),
# y-coordinates for all nodes
y = c(score = 3, exp = 3, b = 2, c = 4)
)
)
plot(dag)
# alternative way of providing the coordinates
dag <- check_dag(
score ~ exp + b + c,
exp ~ b,
outcome = "score",
exposure = "exp",
coords = list(
# x/y coordinates for each node
score = c(5, 3),
exp = c(4, 3),
b = c(3, 2),
c = c(3, 4)
)
)
plot(dag)
# Objects returned by `check_dag()` can be used with "ggdag" or "dagitty"
ggdag::ggdag_status(dag)
# Using a model object to extract information about outcome,
# exposure and adjusted variables
data(mtcars)
m <- lm(mpg ~ wt + gear + disp + cyl, data = mtcars)
dag <- check_dag(
m,
wt ~ disp + cyl,
wt ~ am
)
dag
#> # Check for correct adjustment sets
#> - Outcome: mpg
#> - Exposure: wt
#> - Adjustments: cyl, disp and gear
#>
#> Identification of direct and total effects
#>
#> Model is correctly specified.
#> All minimal sufficient adjustments to estimate the direct and total effect were done.
#>
plot(dag)