Skip to contents

Confidence intervals around predicted values

Usage

get_predicted_ci(x, ...)

# Default S3 method
get_predicted_ci(
  x,
  predictions = NULL,
  data = NULL,
  se = NULL,
  ci = 0.95,
  ci_type = "confidence",
  ci_method = NULL,
  dispersion_method = "sd",
  vcov = NULL,
  vcov_args = NULL,
  verbose = TRUE,
  ...
)

Arguments

x

A statistical model (can also be a data.frame, in which case the second argument has to be a model).

...

Other argument to be passed, for instance to the model's predict() method, or get_predicted_ci().

predictions

A vector of predicted values (as obtained by stats::fitted(), stats::predict() or get_predicted()).

data

An optional data frame in which to look for variables with which to predict. If omitted, the data used to fit the model is used. Visualization matrices can be generated using get_datagrid().

se

Numeric vector of standard error of predicted values. If NULL, standard errors are calculated based on the variance-covariance matrix.

ci

The interval level. Default is NULL, to be fast even for larger models. Set the interval level to an explicit value, e.g. 0.95, for 95% CI).

ci_type

Can be "prediction" or "confidence". Prediction intervals show the range that likely contains the value of a new observation (in what range it would fall), whereas confidence intervals reflect the uncertainty around the estimated parameters (and gives the range of the link; for instance of the regression line in a linear regressions). Prediction intervals account for both the uncertainty in the model's parameters, plus the random variation of the individual values. Thus, prediction intervals are always wider than confidence intervals. Moreover, prediction intervals will not necessarily become narrower as the sample size increases (as they do not reflect only the quality of the fit). This applies mostly for "simple" linear models (like lm), as for other models (e.g., glm), prediction intervals are somewhat useless (for instance, for a binomial model for which the dependent variable is a vector of 1s and 0s, the prediction interval is... [0, 1]).

ci_method

The method for computing p values and confidence intervals. Possible values depend on model type.

  • NULL uses the default method, which varies based on the model type.

  • Most frequentist models: "wald" (default), "residual" or "normal".

  • Bayesian models: "quantile" (default), "hdi", "eti", and "spi".

  • Mixed effects lme4 models: "wald" (default), "residual", "normal", "satterthwaite", and "kenward-roger".

See get_df() for details.

dispersion_method

Bootstrap dispersion and Bayesian posterior summary: "sd" or "mad".

vcov

Variance-covariance matrix used to compute uncertainty estimates (e.g., for robust standard errors). This argument accepts a covariance matrix, a function which returns a covariance matrix, or a string which identifies the function to be used to compute the covariance matrix.

  • A covariance matrix

  • A function which returns a covariance matrix (e.g., stats::vcov())

  • A string which indicates the kind of uncertainty estimates to return.

    • Heteroskedasticity-consistent: "HC", "HC0", "HC1", "HC2", "HC3", "HC4", "HC4m", "HC5". See ?sandwich::vcovHC

    • Cluster-robust: "CR", "CR0", "CR1", "CR1p", "CR1S", "CR2", "CR3". See ?clubSandwich::vcovCR

    • Bootstrap: "BS", "xy", "residual", "wild", "mammen", "fractional", "jackknife", "norm", "webb". See ?sandwich::vcovBS

    • Other sandwich package functions: "HAC", "PC", "CL", "OPG", "PL".

    • Kenward-Roger approximation: kenward-roger. See ?pbkrtest::vcovAdj.

Exceptions are following models:

  • Model of class glmgee, which have pre-defined options for the variance-covariance matrix calculation. These are "robust", "df-adjusted", "model", "bias-corrected", and "jackknife". See ?glmtoolbox::vcov.glmgee for details.

  • Model of class glmmTMB currently only support the "HC0" option.

vcov_args

List of arguments to be passed to the function identified by the vcov argument. This function is typically supplied by the sandwich or clubSandwich packages. Please refer to their documentation (e.g., ?sandwich::vcovHAC) to see the list of available arguments. If no estimation type (argument type) is given, the default type for "HC" equals the default from the sandwich package; for type "CR", the default is set to "CR3".

verbose

Toggle warnings.

Details

Typically, get_predicted() returns confidence intervals based on the standard errors as returned by the predict()-function, assuming normal distribution (+/- 1.96 * SE) resp. a Student's t-distribution (if degrees of freedom are available). If predict() for a certain class does not return standard errors (for example, merMod-objects), these are calculated manually, based on following steps: matrix-multiply X by the parameter vector B to get the predictions, then extract the variance-covariance matrix V of the parameters and compute XVX' to get the variance-covariance matrix of the predictions. The square-root of the diagonal of this matrix represent the standard errors of the predictions, which are then multiplied by the critical test-statistic value (e.g., ~1.96 for normal distribution) for the confidence intervals.

If ci_type = "prediction", prediction intervals are calculated. These are wider than confidence intervals, because they also take into account the uncertainty of the model itself. Before taking the square-root of the diagonal of the variance-covariance matrix, get_predicted_ci() adds the residual variance to these values. For mixed models, get_variance_residual() is used, while get_sigma()^2 is used for non-mixed models.

It is preferred to rely on standard errors returned by get_predicted() (i.e. returned by the predict()-function), because these are more accurate than manually calculated standard errors. Use get_predicted_ci() only if standard errors are not available otherwise. An exception are Bayesian models or bootstrapped predictions, where get_predicted_ci() returns quantiles of the posterior distribution or bootstrapped samples of the predictions. These are actually accurate standard errors resp. confidence (or uncertainty) intervals.

Examples

# Confidence Intervals for Model Predictions
# ------------------------------------------

data(mtcars)

# Linear model
# ------------
x <- lm(mpg ~ cyl + hp, data = mtcars)
predictions <- predict(x)
ci_vals <- get_predicted_ci(x, predictions, ci_type = "prediction")
head(ci_vals)
#>         SE    CI_low  CI_high
#> 1 3.255505 14.558527 27.87504
#> 2 3.255505 14.558527 27.87504
#> 3 3.305931 19.309850 32.83263
#> 4 3.255505 14.558527 27.87504
#> 5 3.303717  8.687625 22.20134
#> 6 3.266957 14.630713 27.99407
ci_vals <- get_predicted_ci(x, predictions, ci_type = "confidence")
head(ci_vals)
#>          SE   CI_low  CI_high
#> 1 0.7281647 19.72752 22.70605
#> 2 0.7281647 19.72752 22.70605
#> 3 0.9279509 24.17337 27.96911
#> 4 0.7281647 19.72752 22.70605
#> 5 0.9200310 13.56281 17.32616
#> 6 0.7777664 19.72168 22.90310
ci_vals <- get_predicted_ci(x, predictions, ci = c(0.8, 0.9, 0.95))
head(ci_vals)
#>                          SE CI_low_0.8 CI_high_0.8 CI_low_0.9 CI_high_0.9
#> Mazda RX4         0.7281647   20.26184    22.17172   19.97954    22.45403
#> Mazda RX4 Wag     0.7281647   20.26184    22.17172   19.97954    22.45403
#> Datsun 710        0.9279509   24.85429    27.28818   24.49453    27.64794
#> Hornet 4 Drive    0.7281647   20.26184    22.17172   19.97954    22.45403
#> Hornet Sportabout 0.9200310   14.23793    16.65104   13.88124    17.00773
#> Valiant           0.7777664   20.29240    22.33238   19.99087    22.63391
#>                   CI_low_0.95 CI_high_0.95
#> Mazda RX4            19.72752     22.70605
#> Mazda RX4 Wag        19.72752     22.70605
#> Datsun 710           24.17337     27.96911
#> Hornet 4 Drive       19.72752     22.70605
#> Hornet Sportabout    13.56281     17.32616
#> Valiant              19.72168     22.90310

# Bootstrapped
# ------------
predictions <- get_predicted(x, iterations = 500)
get_predicted_ci(x, predictions)
#>           SE    CI_low  CI_high
#> 1  0.6939386 19.982569 22.79179
#> 2  0.6939386 19.982569 22.79179
#> 3  1.1262885 24.082900 28.18738
#> 4  0.6939386 19.982569 22.79179
#> 5  0.7230536 14.003272 16.84375
#> 6  0.7375375 20.057173 22.98248
#> 7  0.9579848 11.579126 15.31115
#> 8  1.1176291 24.636269 29.12976
#> 9  1.1331354 24.008074 28.12909
#> 10 0.6080184 19.821321 22.24563
#> 11 0.6080184 19.821321 22.24563
#> 12 0.6935495 13.956012 16.63540
#> 13 0.6935495 13.956012 16.63540
#> 14 0.6935495 13.956012 16.63540
#> 15 0.6584695 13.422141 15.95798
#> 16 0.7000002 13.062597 15.74784
#> 17 0.8095694 12.418839 15.53118
#> 18 1.1083045 24.596955 29.04103
#> 19 1.1537507 24.744974 29.23823
#> 20 1.1103521 24.610569 29.06772
#> 21 1.1406967 23.908345 28.09379
#> 22 0.9451639 14.237691 17.81125
#> 23 0.9451639 14.237691 17.81125
#> 24 0.9579848 11.579126 15.31115
#> 25 0.7230536 14.003272 16.84375
#> 26 1.1083045 24.596955 29.04103
#> 27 1.1201692 24.110721 28.24808
#> 28 1.2250590 23.373283 27.89137
#> 29 1.1786307 10.383647 15.07314
#> 30 0.8063989 18.089403 21.27880
#> 31 2.1277344  6.331641 14.63703
#> 32 1.2002230 23.559161 27.93791

ci_vals <- get_predicted_ci(x, predictions, ci = c(0.80, 0.95))
head(ci_vals)
#>          SE CI_low_0.8 CI_high_0.8 CI_low_0.95 CI_high_0.95
#> 1 0.6939386   20.46290    22.20695    19.98257     22.79179
#> 2 0.6939386   20.46290    22.20695    19.98257     22.79179
#> 3 1.1262885   24.66872    27.54226    24.08290     28.18738
#> 4 0.6939386   20.46290    22.20695    19.98257     22.79179
#> 5 0.7230536   14.50305    16.32921    14.00327     16.84375
#> 6 0.7375375   20.53431    22.37329    20.05717     22.98248
datawizard::reshape_ci(ci_vals)
#>           SE   CI    CI_low  CI_high
#> 1  0.6939386 0.80 20.462898 22.20695
#> 2  0.6939386 0.95 19.982569 22.79179
#> 3  0.6939386 0.80 20.462898 22.20695
#> 4  0.6939386 0.95 19.982569 22.79179
#> 5  1.1262885 0.80 24.668718 27.54226
#> 6  1.1262885 0.95 24.082900 28.18738
#> 7  0.6939386 0.80 20.462898 22.20695
#> 8  0.6939386 0.95 19.982569 22.79179
#> 9  0.7230536 0.80 14.503054 16.32921
#> 10 0.7230536 0.95 14.003272 16.84375
#> 11 0.7375375 0.80 20.534308 22.37329
#> 12 0.7375375 0.95 20.057173 22.98248
#> 13 0.9579848 0.80 12.599139 14.92320
#> 14 0.9579848 0.95 11.579126 15.31115
#> 15 1.1176291 0.80 25.387019 28.13629
#> 16 1.1176291 0.95 24.636269 29.12976
#> 17 1.1331354 0.80 24.625141 27.51881
#> 18 1.1331354 0.95 24.008074 28.12909
#> 19 0.6080184 0.80 20.274165 21.81579
#> 20 0.6080184 0.95 19.821321 22.24563
#> 21 0.6080184 0.80 20.274165 21.81579
#> 22 0.6080184 0.95 19.821321 22.24563
#> 23 0.6935495 0.80 14.450320 16.20939
#> 24 0.6935495 0.95 13.956012 16.63540
#> 25 0.6935495 0.80 14.450320 16.20939
#> 26 0.6935495 0.95 13.956012 16.63540
#> 27 0.6935495 0.80 14.450320 16.20939
#> 28 0.6935495 0.95 13.956012 16.63540
#> 29 0.6584695 0.80 13.908637 15.58324
#> 30 0.6584695 0.95 13.422141 15.95798
#> 31 0.7000002 0.80 13.603641 15.41224
#> 32 0.7000002 0.95 13.062597 15.74784
#> 33 0.8095694 0.80 13.092276 15.15326
#> 34 0.8095694 0.95 12.418839 15.53118
#> 35 1.1083045 0.80 25.308208 28.03605
#> 36 1.1083045 0.95 24.596955 29.04103
#> 37 1.1537507 0.80 25.574590 28.45208
#> 38 1.1537507 0.95 24.744974 29.23823
#> 39 1.1103521 0.80 25.323922 28.05982
#> 40 1.1103521 0.95 24.610569 29.06772
#> 41 1.1406967 0.80 24.552460 27.51421
#> 42 1.1406967 0.95 23.908345 28.09379
#> 43 0.9451639 0.80 14.793981 17.21559
#> 44 0.9451639 0.95 14.237691 17.81125
#> 45 0.9451639 0.80 14.793981 17.21559
#> 46 0.9451639 0.95 14.237691 17.81125
#> 47 0.9579848 0.80 12.599139 14.92320
#> 48 0.9579848 0.95 11.579126 15.31115
#> 49 0.7230536 0.80 14.503054 16.32921
#> 50 0.7230536 0.95 14.003272 16.84375
#> 51 1.1083045 0.80 25.308208 28.03605
#> 52 1.1083045 0.95 24.596955 29.04103
#> 53 1.1201692 0.80 24.712984 27.58400
#> 54 1.1201692 0.95 24.110721 28.24808
#> 55 1.2250590 0.80 24.133303 27.27370
#> 56 1.2250590 0.95 23.373283 27.89137
#> 57 1.1786307 0.80 11.963180 14.67643
#> 58 1.1786307 0.95 10.383647 15.07314
#> 59 0.8063989 0.80 18.859425 20.83788
#> 60 0.8063989 0.95 18.089403 21.27880
#> 61 2.1277344 0.80  9.183743 13.92748
#> 62 2.1277344 0.95  6.331641 14.63703
#> 63 1.2002230 0.80 24.239608 27.31460
#> 64 1.2002230 0.95 23.559161 27.93791

ci_vals <- get_predicted_ci(x,
  predictions,
  dispersion_method = "MAD",
  ci_method = "HDI"
)
head(ci_vals)
#>          SE   CI_low  CI_high
#> 1 0.6797607 19.80472 22.54090
#> 2 0.6797607 19.80472 22.54090
#> 3 1.1797681 24.07770 28.21161
#> 4 0.6797607 19.80472 22.54090
#> 5 0.7142531 13.98076 16.83358
#> 6 0.7389642 19.90145 22.81315


# Logistic model
# --------------
x <- glm(vs ~ wt, data = mtcars, family = "binomial")
predictions <- predict(x, type = "link")
ci_vals <- get_predicted_ci(x, predictions, ci_type = "prediction")
head(ci_vals)
#>                   CI_low CI_high
#> Mazda RX4           -Inf     Inf
#> Mazda RX4 Wag       -Inf     Inf
#> Datsun 710          -Inf     Inf
#> Hornet 4 Drive      -Inf     Inf
#> Hornet Sportabout   -Inf     Inf
#> Valiant             -Inf     Inf
ci_vals <- get_predicted_ci(x, predictions, ci_type = "confidence")
head(ci_vals)
#>          SE     CI_low   CI_high
#> 1 0.5623444 -0.3931282 1.8112213
#> 2 0.4690190 -0.6974034 1.1411172
#> 3 0.7195076 -0.1279982 2.6924199
#> 4 0.4459072 -1.3016913 0.4462326
#> 5 0.5021936 -1.8418839 0.1266787
#> 6 0.5094490 -1.8943152 0.1026881