Robust Estimation of Standard Errors, Confidence Intervals, and p-values
Source:vignettes/model_parameters_robust.Rmd
model_parameters_robust.Rmd
The model_parameters()
function also allows the computation of standard errors, confidence
intervals, and p-values based on various covariance matrices:
heteroskedasticity-consistent, cluster-robust, bootstrap, etc. This
functionality relies on the sandwich
and clubSandwich
packages. This means that all models supported by either of these
packages should work with model_parameters()
.
The computation of robust standard errors is controlled by two arguments:
-
vcov
: accepts 3 types of arguments- A covariance matrix
- A function which returns a covariance matrix (e.g.,
stats::vcov()
) - A string which indicates the kind of uncertainty estimates to
return.
- Heteroskedasticity-consistent:
"vcovHC"
,"HC"
,"HC0"
,"HC1"
,"HC2"
,"HC3"
,"HC4"
,"HC4m"
,"HC5"
- Cluster-robust:
"vcovCR"
,"CR0"
,"CR1"
,"CR1p"
,"CR1S"
,"CR2"
,"CR3"
- Bootstrap:
"vcovBS"
,"xy"
,"residual"
,"wild"
,"mammen"
,"webb"
- Other
sandwich
functions:"vcovHAC"
,"vcovPC"
,"vcovCL"
,"vcovPL"
- Heteroskedasticity-consistent:
-
vcov_args
: list of arguments passed to the sandwich or clubSandwich function used to compute the covariance matrix. See for example?sandwich::vcovHAC
Linear Regression Models
Robust Covariance Matrix Estimation from Model Parameters
Let us start with a simple example, which uses a
heteroskedasticity-consistent covariance matrix estimation with
estimation-type “HC3” (i.e.
sandwich::vcovHC(type = "HC3")
).
First let’s create a simple linear regression model, which we know violates homoscedasticity assumption, and thus robust estimation methods are to be considered.
data(cars)
model <- lm(dist ~ speed, data = cars)
library(performance)
check_heteroscedasticity(model)
#> Warning: Heteroscedasticity (non-constant error variance) detected (p = 0.031).
We would extract model parameters both with and without robust estimation to highlight the difference it makes to standard errors, confidence intervals, t-statistic, and p-values. Also, note that the coefficient estimate remains unchanged.
# model parameters, where SE, CI, and p-values are *not* based on robust estimation
model_parameters(model)
#> Parameter | Coefficient | SE | 95% CI | t(48) | p
#> -------------------------------------------------------------------
#> (Intercept) | -17.58 | 6.76 | [-31.17, -3.99] | -2.60 | 0.012
#> speed | 3.93 | 0.42 | [ 3.10, 4.77] | 9.46 | < .001
# model parameters, where SE, CI, and p-values are based on robust estimation
mp <- model_parameters(model, vcov = "HC3")
mp
#> Parameter | Coefficient | SE | 95% CI | t(48) | p
#> -------------------------------------------------------------------
#> (Intercept) | -17.58 | 5.93 | [-29.51, -5.65] | -2.96 | 0.005
#> speed | 3.93 | 0.43 | [ 3.07, 4.79] | 9.20 | < .001
# compare standard errors to result from sandwich-package
mp$SE
#> [1] 5.93 0.43
unname(sqrt(diag(sandwich::vcovHC(model))))
#> [1] 5.93 0.43
Cluster-Robust Covariance Matrix Estimation (sandwich)
If a different type of covariance matrix estimation is required, use
the vcov
-argument. This argument accepts the name of a
function from the sandwich or clubSandwich packages as
a string, such as "vcovCL"
(or just its suffix
"CL"
). parameters will then call the corresponding
function with the content of vcov_args
as arguments.
The specific estimation type can be controlled by passing a
type
argument via vcov_args
. See
?sandwich::vcovCL
for information about the different types
of covariance matrices that this function can produce (HC0
to HC3
). In the next example, we use a clustered covariance
matrix estimation with HC1
-estimation type.
# let's create a more complicated model
data(iris)
model <- lm(Petal.Length ~ Sepal.Length * Species + Sepal.Width, data = iris)
# change estimation-type
mp <- model_parameters(
model,
vcov = "CL", # type of covariance matrix
vcov_args = list(type = "HC1") # type of robust estimation
)
mp
#> Parameter | Coefficient | SE | 95% CI | t(143) | p
#> ------------------------------------------------------------------------------------------
#> (Intercept) | 0.87 | 0.42 | [ 0.03, 1.70] | 2.05 | 0.042
#> Sepal Length | 0.04 | 0.11 | [-0.18, 0.26] | 0.40 | 0.692
#> Species [versicolor] | -0.78 | 0.65 | [-2.07, 0.51] | -1.19 | 0.237
#> Species [virginica] | -0.41 | 0.59 | [-1.57, 0.75] | -0.70 | 0.483
#> Sepal Width | 0.11 | 0.08 | [-0.05, 0.27] | 1.38 | 0.170
#> Sepal Length × Species [versicolor] | 0.61 | 0.12 | [ 0.37, 0.85] | 4.96 | < .001
#> Sepal Length × Species [virginica] | 0.68 | 0.11 | [ 0.46, 0.90] | 6.15 | < .001
# compare standard errors to result from sandwich-package
mp$SE
#> [1] 0.422 0.111 0.653 0.587 0.079 0.123 0.111
unname(sqrt(diag(sandwich::vcovCL(model))))
#> [1] 0.422 0.111 0.653 0.587 0.079 0.123 0.111
Usually, clustered covariance matrix estimation is used when there is
a cluster-structure in the data. The variable indicating the
cluster-structure can be defined in sandwich::vcovCL()
with
the cluster
-argument. In model_parameters()
,
additional arguments that should be passed down to functions from the
sandwich package can be specified in
vcov_args
:
iris$cluster <- factor(rep(LETTERS[1:8], length.out = nrow(iris)))
# change estimation-type, defining additional arguments
mp <- model_parameters(
model,
vcov = "vcovCL",
vcov_args = list(type = "HC1", cluster = iris$cluster)
)
mp
#> Parameter | Coefficient | SE | 95% CI | t(143) | p
#> ------------------------------------------------------------------------------------------
#> (Intercept) | 0.87 | 0.34 | [ 0.20, 1.53] | 2.57 | 0.011
#> Sepal Length | 0.04 | 0.07 | [-0.10, 0.19] | 0.61 | 0.540
#> Species [versicolor] | -0.78 | 0.52 | [-1.80, 0.25] | -1.49 | 0.137
#> Species [virginica] | -0.41 | 0.26 | [-0.94, 0.11] | -1.56 | 0.120
#> Sepal Width | 0.11 | 0.07 | [-0.03, 0.25] | 1.52 | 0.131
#> Sepal Length × Species [versicolor] | 0.61 | 0.10 | [ 0.42, 0.80] | 6.29 | < .001
#> Sepal Length × Species [virginica] | 0.68 | 0.05 | [ 0.58, 0.78] | 13.28 | < .001
# compare standard errors to result from sandwich-package
mp$SE
#> [1] 0.337 0.072 0.519 0.264 0.072 0.097 0.051
unname(sqrt(diag(sandwich::vcovCL(model, cluster = iris$cluster))))
#> [1] 0.337 0.072 0.519 0.264 0.072 0.097 0.051
Cluster-Robust Covariance Matrix Estimation (clubSandwich)
using clubSandwich::vcovCR()
. Thus, when
vcov = "CR"
, the related function from the
clubSandwich
package is called. Note that this function
requires the specification of the
cluster
-argument.
# create fake-cluster-variable, to demonstrate cluster robust standard errors
iris$cluster <- factor(rep(LETTERS[1:8], length.out = nrow(iris)))
# cluster-robust estimation
mp <- model_parameters(
model,
vcov = "vcovCR",
vcov_args = list(type = "CR1", cluster = iris$cluster)
)
mp
#> Parameter | Coefficient | SE | 95% CI | t(143) | p
#> ------------------------------------------------------------------------------------------
#> (Intercept) | 0.87 | 0.33 | [ 0.21, 1.52] | 2.62 | 0.010
#> Sepal Length | 0.04 | 0.07 | [-0.10, 0.18] | 0.63 | 0.531
#> Species [versicolor] | -0.78 | 0.51 | [-1.78, 0.23] | -1.53 | 0.129
#> Species [virginica] | -0.41 | 0.26 | [-0.92, 0.10] | -1.60 | 0.112
#> Sepal Width | 0.11 | 0.07 | [-0.03, 0.25] | 1.55 | 0.123
#> Sepal Length × Species [versicolor] | 0.61 | 0.09 | [ 0.42, 0.79] | 6.42 | < .001
#> Sepal Length × Species [virginica] | 0.68 | 0.05 | [ 0.58, 0.78] | 13.56 | < .001
# compare standard errors to result from clubSsandwich-package
mp$SE
#> [1] 0.330 0.070 0.508 0.259 0.071 0.095 0.050
unname(sqrt(diag(clubSandwich::vcovCR(model, type = "CR1", cluster = iris$cluster))))
#> [1] 0.330 0.070 0.508 0.259 0.071 0.095 0.050
Robust Covariance Matrix Estimation on Standardized Model Parameters
Finally, robust estimation can be combined with standardization.
However, robust covariance matrix estimation only works for
standardize = "refit"
.
# model parameters, robust estimation on standardized model
model_parameters(model, standardize = "refit", vcov = "HC3")
#> Parameter | Coefficient | SE | 95% CI | t(143) | p
#> -------------------------------------------------------------------------------------------
#> (Intercept) | -1.30 | 0.07 | [-1.44, -1.16] | -18.70 | < .001
#> Sepal Length | 0.02 | 0.06 | [-0.09, 0.13] | 0.37 | 0.711
#> Species [versicolor] | 1.57 | 0.09 | [ 1.40, 1.74] | 17.84 | < .001
#> Species [virginica] | 2.02 | 0.09 | [ 1.84, 2.20] | 22.49 | < .001
#> Sepal Width | 0.03 | 0.02 | [-0.01, 0.07] | 1.32 | 0.190
#> Sepal Length × Species [versicolor] | 0.28 | 0.06 | [ 0.16, 0.41] | 4.65 | < .001
#> Sepal Length × Species [virginica] | 0.32 | 0.06 | [ 0.21, 0.43] | 5.75 | < .001
Linear Mixed-Effects Regression Models
Robust Covariance Matrix Estimation for Mixed Models
For linear mixed-effects models, that by definition have a clustered (hierarchical or multilevel) structure in the data, it is also possible to estimate a cluster-robust covariance matrix. This is possible due to the clubSandwich package, thus we need to define the same arguments as in the above example.
library(lme4)
data(iris)
set.seed(1234)
iris$grp <- as.factor(sample(1:3, nrow(iris), replace = TRUE))
# fit example model
model <- lme4::lmer(
Sepal.Length ~ Species * Sepal.Width + Petal.Length + (1 | grp),
data = iris
)
# model parameters without robust estimation
model_parameters(model)
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI | t(141) | p
#> ------------------------------------------------------------------------------------------
#> (Intercept) | 1.55 | 0.40 | [ 0.76, 2.35] | 3.87 | < .001
#> Species [versicolor] | 0.41 | 0.55 | [-0.67, 1.50] | 0.75 | 0.454
#> Species [virginica] | -0.41 | 0.58 | [-1.56, 0.74] | -0.70 | 0.483
#> Sepal Width | 0.66 | 0.11 | [ 0.44, 0.89] | 5.83 | < .001
#> Petal Length | 0.82 | 0.07 | [ 0.69, 0.95] | 12.52 | < .001
#> Species [versicolor] × Sepal Width | -0.48 | 0.19 | [-0.85, -0.12] | -2.60 | 0.010
#> Species [virginica] × Sepal Width | -0.36 | 0.18 | [-0.71, 0.00] | -1.99 | 0.048
#>
#> # Random Effects
#>
#> Parameter | Coefficient | SE | 95% CI
#> -------------------------------------------------------
#> SD (Intercept: grp) | 0.08 | 0.05 | [0.02, 0.29]
#> SD (Residual) | 0.30 | 0.02 | [0.27, 0.33]
# model parameters with cluster robust estimation
model_parameters(
model,
vcov = "vcovCR",
vcov_args = list(type = "CR1", cluster = iris$grp)
)
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI | t(141) | p
#> ------------------------------------------------------------------------------------------
#> (Intercept) | 1.55 | 0.40 | [ 0.76, 2.35] | 3.87 | < .001
#> Species [versicolor] | 0.41 | 0.80 | [-1.17, 1.99] | 0.51 | 0.608
#> Species [virginica] | -0.41 | 0.19 | [-0.78, -0.03] | -2.15 | 0.033
#> Sepal Width | 0.66 | 0.10 | [ 0.46, 0.86] | 6.64 | < .001
#> Petal Length | 0.82 | 0.05 | [ 0.72, 0.91] | 17.27 | < .001
#> Species [versicolor] × Sepal Width | -0.48 | 0.35 | [-1.18, 0.21] | -1.37 | 0.172
#> Species [virginica] × Sepal Width | -0.36 | 0.11 | [-0.57, -0.15] | -3.39 | < .001
#>
#> # Random Effects
#>
#> Parameter | Coefficient | SE | 95% CI
#> -------------------------------------------------------
#> SD (Intercept: grp) | 0.08 | 0.05 | [0.02, 0.29]
#> SD (Residual) | 0.30 | 0.02 | [0.27, 0.33]
Notice that robust estimation returns different standard errors, confidence intervals, test statistic and p-values compared to the standard estimation. Also, note that the coefficient estimate remains unchanged.
Robust Covariance Matrix Estimation on Standardized Mixed Model Parameters
Once again, robust estimation can be combined with standardization
for linear mixed-effects models as well and works only with
standardize = "refit"
.
# model parameters, cluster robust estimation of standardized mixed model
model_parameters(
model,
standardize = "refit",
vcov = "vcovCR",
vcov_args = list(type = "CR1", cluster = iris$grp)
)
#> # Fixed Effects
#>
#> Parameter | Coefficient | SE | 95% CI | t(141) | p
#> ------------------------------------------------------------------------------------------
#> (Intercept) | 0.97 | 0.08 | [ 0.82, 1.12] | 12.55 | < .001
#> Species [versicolor] | -1.29 | 0.33 | [-1.95, -0.63] | -3.85 | < .001
#> Species [virginica] | -1.81 | 0.23 | [-2.26, -1.37] | -8.01 | < .001
#> Sepal Width | 0.35 | 0.05 | [ 0.24, 0.45] | 6.64 | < .001
#> Petal Length | 1.74 | 0.10 | [ 1.54, 1.94] | 17.27 | < .001
#> Species [versicolor] × Sepal Width | -0.25 | 0.19 | [-0.62, 0.11] | -1.37 | 0.172
#> Species [virginica] × Sepal Width | -0.19 | 0.06 | [-0.30, -0.08] | -3.39 | < .001
#>
#> # Random Effects
#>
#> Parameter | Coefficient | SE | 95% CI
#> -------------------------------------------------------
#> SD (Intercept: grp) | 0.10 | 0.06 | [0.03, 0.35]
#> SD (Residual) | 0.36 | 0.02 | [0.32, 0.40]
Notice how drastically some of the p-values change between robust-unstandardized model and robust-standardized model.