This function "pools" (i.e. combines) model parameters in a similar fashion
to mice::pool(). However, this function pools parameters from
parameters_model objects, as returned by model_parameters().
Usage
pool_parameters(
  x,
  exponentiate = FALSE,
  effects = "fixed",
  component = "all",
  verbose = TRUE,
  ...
)
Arguments
- x
A list of parameters_model objects, as returned by model_parameters(), or a list of model objects that are supported by model_parameters().
- exponentiate
Logical, indicating whether or not to exponentiate the coefficients (and related confidence intervals). This is typical for logistic regression, or more generally speaking, for models with log or logit links. It is also recommended to use exponentiate = TRUE for models with log-transformed response values. For models with a log-transformed response variable, when exponentiate = TRUE, a one-unit increase in the predictor is associated with multiplying the outcome by that predictor's exponentiated coefficient. Note: Delta-method standard errors are also computed (by multiplying the standard errors by the transformed coefficients). This is to mimic the behaviour of other software packages, such as Stata, but these standard errors poorly estimate uncertainty for the transformed coefficient. The transformed confidence interval more clearly captures this uncertainty. For compare_parameters(), exponentiate = "nongaussian" will only exponentiate coefficients from non-Gaussian families.
- effects
Should parameters for fixed effects ("fixed"), random effects ("random"), or both ("all") be returned? Only applies to mixed models. May be abbreviated. If the calculation of random effects parameters takes too long, you may use effects = "fixed".
- component
Which type of parameters to return, such as parameters for the conditional model, the zero-inflation part of the model, the dispersion term, or other auxiliary parameters. Applies to models with zero-inflation and/or dispersion formula, or if parameters such as sigma should be included. May be abbreviated. Note that the conditional component is also called count or mean component, depending on the model. There are three convenient shortcuts: component = "all" returns all possible parameters. If component = "location", location parameters such as conditional, zero_inflated, or smooth_terms are returned (everything that is a fixed or random effect, depending on the effects argument, but no auxiliary parameters). For component = "distributional" (or "auxiliary"), components like sigma, dispersion, or beta (and other auxiliary parameters) are returned.
- verbose
Toggle warnings and messages.
- ...
Arguments passed down to model_parameters(), if x is a list of model objects. Can be used, for instance, to specify arguments like ci or ci_method.
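The note on delta-method standard errors under exponentiate can be illustrated with a minimal base-R sketch. It is not the package's internal code: the model (am ~ wt on the built-in mtcars data) and the object names are assumptions chosen for illustration only. The sketch shows that the standard error is the original standard error multiplied by the exponentiated coefficient, while the confidence limits are obtained by exponentiating the limits from the link scale.

```r
# Minimal base-R sketch (not the package's internal code): exponentiating
# logistic-regression coefficients and their uncertainty.
# The model and data are illustrative assumptions.
fit <- glm(am ~ wt, family = binomial, data = mtcars)

est <- coef(fit)              # coefficients on the log-odds scale
se <- sqrt(diag(vcov(fit)))   # standard errors on the log-odds scale
ci <- confint.default(fit)    # Wald intervals on the log-odds scale

exponentiated <- data.frame(
  Parameter = names(est),
  Coefficient = exp(est),     # odds ratios
  SE = exp(est) * se,         # delta-method standard error
  CI_low = exp(ci[, 1]),      # interval limits are transformed directly
  CI_high = exp(ci[, 2]),
  row.names = NULL
)
exponentiated
```

The transformed interval is not symmetric around the exponentiated coefficient, which is why it describes the uncertainty better than an interval built from the delta-method standard error.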
Details
Averaging of parameters follows Rubin's rules (Rubin, 1987, p. 76). The pooled degrees of freedom are based on the Barnard-Rubin adjustment for small samples (Barnard and Rubin, 1999).
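As a rough illustration of the pooling step, Rubin's rules and the Barnard-Rubin adjustment can be sketched for a single coefficient in base R. The helper pool_rubin_sketch() and its argument names are hypothetical and not part of the package; the sketch assumes the estimates differ across imputations (non-zero between-imputation variance) and is not a substitute for pool_parameters().

```r
# Hypothetical helper for illustration only; not part of the parameters package.
# Pools one coefficient across m imputed datasets using Rubin's rules and
# applies the Barnard-Rubin small-sample adjustment to the degrees of freedom.
pool_rubin_sketch <- function(estimates, std_errors, df_complete = Inf) {
  m <- length(estimates)                   # number of imputed datasets
  pooled <- mean(estimates)                # pooled point estimate
  within <- mean(std_errors^2)             # average within-imputation variance
  between <- stats::var(estimates)         # between-imputation variance
  total <- within + (1 + 1 / m) * between  # total variance

  # share of the total variance that is due to missing data
  lambda <- (1 + 1 / m) * between / total
  df_old <- (m - 1) / lambda^2             # degrees of freedom from Rubin (1987)

  # observed-data degrees of freedom; infinite if no complete-data df is given
  df_obs <- if (is.finite(df_complete)) {
    (df_complete + 1) / (df_complete + 3) * df_complete * (1 - lambda)
  } else {
    Inf
  }
  df_adjusted <- 1 / (1 / df_old + 1 / df_obs)  # Barnard-Rubin (1999)

  data.frame(estimate = pooled, se = sqrt(total), df = df_adjusted)
}

# Made-up estimates and standard errors from three imputed datasets
pooled_example <- pool_rubin_sketch(
  estimates = c(1, 2, 3),
  std_errors = c(0.5, 0.5, 0.5),
  df_complete = 20
)
pooled_example
```

With df_complete = Inf, the adjustment drops out and the degrees of freedom reduce to the large-sample formula from Rubin (1987).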
Note
Models with multiple components (for instance, models with zero-inflation,
where predictors appear in the count and zero-inflation part, or models with
a dispersion component) may fail in rare situations. In this case, compute
the pooled parameters for each component separately, using the component
argument.
Some model objects do not return standard errors (e.g. objects of class
htest). For these models, neither pooled confidence intervals nor p-values
are returned.
References
Barnard, J. and Rubin, D.B. (1999). Small sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
Examples
# example for multiple imputed datasets
data("nhanes2", package = "mice")
imp <- mice::mice(nhanes2, printFlag = FALSE)
models <- lapply(1:5, function(i) {
  lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i))
})
pool_parameters(models)
#> # Fixed Effects
#>
#> Parameter   | Coefficient |   SE |          95% CI | Statistic |    df |      p
#> -------------------------------------------------------------------------------
#> (Intercept) |       18.85 | 3.53 | [ 10.94, 26.77] |      5.34 |  9.64 | < .001
#> age [40-59] |       -5.62 | 2.00 | [-10.04, -1.21] |     -2.81 | 10.73 | 0.017
#> age [60-99] |       -7.05 | 2.50 | [-12.72, -1.37] |     -2.82 |  8.69 | 0.021
#> hyp [yes]   |        2.23 | 2.13 | [ -2.54,  6.99] |      1.05 |  9.72 | 0.321
#> chl         |        0.05 | 0.02 | [  0.01,  0.10] |      2.63 |  8.20 | 0.029
#>
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed)
#> computed using a Wald distribution approximation.
# should be identical to:
m <- with(data = imp, expr = lm(bmi ~ age + hyp + chl))
summary(mice::pool(m))
#>          term   estimate std.error statistic        df      p.value
#> 1 (Intercept) 18.8529658 3.5332535  5.335866  9.637102 0.0003741459
#> 2    age40-59 -5.6214485 2.0001740 -2.810480 10.729821 0.0173287192
#> 3    age60-99 -7.0451215 2.4952100 -2.823458  8.693991 0.0206115436
#> 4      hypyes  2.2276804 2.1295899  1.046061  9.722954 0.3208412270
#> 5         chl  0.0531414 0.0201819  2.633122  8.195111 0.0294228344
# For glm, mice uses residual df, while `pool_parameters()` uses `Inf`
nhanes2$hyp <- datawizard::slide(as.numeric(nhanes2$hyp))
imp <- mice::mice(nhanes2, printFlag = FALSE)
models <- lapply(1:5, function(i) {
  glm(hyp ~ age + chl, family = binomial, data = mice::complete(imp, action = i))
})
m <- with(data = imp, expr = glm(hyp ~ age + chl, family = binomial))
# residual df
summary(mice::pool(m))$df
#> [1] 19.248074 19.248074 19.248074 5.431369
# df = Inf
pool_parameters(models)$df_error
#> [1] Inf Inf Inf Inf
# use residual df instead
pool_parameters(models, ci_method = "residual")$df_error
#> [1] 19.248074 19.248074 19.248074 5.431369