Visual check of various model assumptions (normality of residuals, normality of random effects, linear relationship, homogeneity of variance, multicollinearity).
Usage
check_model(x, ...)
# S3 method for default
check_model(
x,
dot_size = 2,
line_size = 0.8,
panel = TRUE,
check = "all",
alpha = 0.2,
dot_alpha = 0.8,
colors = c("#3aaf85", "#1b6ca8", "#cd201f"),
theme = "see::theme_lucid",
detrend = FALSE,
show_dots = NULL,
verbose = TRUE,
...
)
Arguments
- x
A model object.
- ...
Currently not used.
- dot_size, line_size
Size of the dot- and line-geoms, respectively.
- panel
Logical, if TRUE, plots are arranged as panels; else, single plots for each diagnostic are returned.
- check
Character vector, indicating which checks should be performed and plotted. May be one or more of "all", "vif", "qq", "normality", "linearity", "ncv", "homogeneity", "outliers", "reqq", "pp_check", "binned_residuals" or "overdispersion". Note that not all checks apply to all types of models (see 'Details'). "reqq" is a QQ-plot for random effects and is only available for mixed models. "ncv" is an alias for "linearity", and checks for non-constant variance, i.e. for heteroscedasticity, as well as the linear relationship. By default, all possible checks are performed and plotted.
- alpha, dot_alpha
The alpha level of the confidence bands and dot-geoms. Scalar from 0 to 1.
- colors
Character vector with color codes (hex-format). Must be of length 3. First color is usually used for reference lines, second color for dots, and third color for outliers or extreme values.
- theme
String, indicating the name of the plot-theme. Must be in the format "package::theme_name" (e.g. "ggplot2::theme_minimal").
- detrend
Should QQ/PP plots be detrended?
- show_dots
Logical, if TRUE, will show data points in the plot. Set to FALSE for models with many observations, if generating the plot is too time-consuming. By default, show_dots = NULL. In this case, check_model() tries to guess whether performance will be poor due to a very large model and thus automatically shows or hides dots.
- verbose
Toggle warnings on (TRUE) or off (FALSE).
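As a hedged sketch of how several of these arguments combine, the call below reuses the lm model from the Examples section, restricts the diagnostics via check, requests single plots via panel = FALSE, asks for a detrended QQ plot, and picks a theme in the documented "package::theme_name" format. It assumes the performance package is installed. The result is assigned rather than printed, since printing the object is what typically draws the plots, which according to the Note requires the see package.

```r
library(performance)

# Same linear model as in the Examples section
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

# Assigning the result only prepares the plot data; nothing is drawn yet
diag_data <- check_model(
  m,
  check = c("qq", "normality", "linearity", "vif"), # subset of diagnostics
  panel = FALSE,                                    # one plot per diagnostic
  detrend = TRUE,                                   # detrended QQ plot
  theme = "ggplot2::theme_minimal"                  # "package::theme_name"
)

# Drawing the plots needs the see package:
# plot(diag_data)
```

Printing diag_data, or calling plot() on it, would then render only the requested checks.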
Details
For Bayesian models from packages rstanarm or brms, models will be "converted"
to their frequentist counterpart, using bayestestR::bayesian_as_frequentist.
A more advanced model-check for Bayesian models will be implemented at a
later stage.
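The conversion described above can also be run by hand to see which frequentist model ends up being diagnosed. This sketch assumes the rstanarm and bayestestR packages are installed and reuses the stan_glm() call from the Examples section; refresh = 0 only silences the sampler's progress output. The very small number of iterations is for illustration and may trigger effective sample size warnings.

```r
library(rstanarm)
library(bayestestR)

# Small Bayesian model, few iterations (illustration only)
m_bayes <- stan_glm(mpg ~ wt + gear, data = mtcars,
                    chains = 2, iter = 200, refresh = 0)

# Refit the same formula and family with a frequentist estimator
m_freq <- bayesian_as_frequentist(m_bayes)

# Inspect which kind of model the conversion produced
class(m_freq)
summary(m_freq)
```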
Note
This function just prepares the data for plotting. To create the plots,
the see package needs to be installed. Furthermore, this function suppresses
all possible warnings. In case you observe suspicious plots, please refer
to the dedicated functions (like check_collinearity(), check_normality()
etc.) to get informative messages and warnings.
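Following the Note, the dedicated functions can be called directly on the same model to get the messages that check_model() suppresses. A minimal sketch, assuming the performance package is installed; check_heteroscedasticity() is not named in the Note but is another of the package's dedicated check functions, included here for the variance assumption.

```r
library(performance)

m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

# Variance inflation factors, printed with an interpretation
col <- check_collinearity(m)
print(col)

# Formal tests that complement the visual checks
check_normality(m)          # normality of residuals
check_heteroscedasticity(m) # non-constant error variance
```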
Linearity Assumption
The plot Linearity checks the assumption of a linear relationship.
However, the spread of dots also indicates possible heteroscedasticity (i.e.
non-constant variance); hence the alias "ncv" for this plot.
Some caution is needed when interpreting these plots. Although these
plots are helpful to check model assumptions, they do not necessarily
indicate so-called "lack of fit", e.g. missed non-linear relationships or
interactions. Thus, it is always recommended to also look at
effect plots, including partial residuals.
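To make this concrete, the base-R sketch below draws a residuals-versus-fitted plot with a lowess smoother and then effect plots with partial residuals via stats::termplot(). It is not how check_model() builds its panel, only an illustration using the stats and graphics packages that ship with R: a curved smoother hints at a missed non-linear relationship, while a funnel-shaped spread hints at heteroscedasticity.

```r
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

# Residuals vs. fitted values, with a horizontal reference line at zero
lin <- data.frame(fitted = fitted(m), residual = residuals(m))
plot(lin$fitted, lin$residual, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(lowess(lin$fitted, lin$residual), col = "#1b6ca8")

# Effect plots with partial residuals, one panel per predictor
op <- par(mfrow = c(2, 2))
termplot(m, partial.resid = TRUE, smooth = panel.smooth)
par(op)
```

Unlike the residuals-versus-fitted plot, the partial-residual panels show each predictor separately, which makes a missed non-linear term easier to attribute.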
Residuals for (Generalized) Linear Models
Plots that check the normality of residuals (QQ-plot) or the homogeneity of
variance use standardized Pearson's residuals for generalized linear models,
and standardized residuals for linear models. The plots for the normality of
residuals (with overlaid normal curve) and for the linearity assumption use
the default residuals for lm and glm (which are deviance residuals for glm).
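The residual types named above can be obtained with base R. This sketch assumes only the stats package; the binomial model for am is an arbitrary illustrative example, not from the original. It does not reproduce check_model() internals; it only shows how each residual type is computed.

```r
m_lm  <- lm(mpg ~ wt + cyl, data = mtcars)
m_glm <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Standardized residuals for the linear model
r_lm_std <- rstandard(m_lm)

# Standardized Pearson residuals for the generalized linear model
r_glm_std <- rstandard(m_glm, type = "pearson")

# Default residuals: raw residuals for lm, deviance residuals for glm
r_lm_raw  <- residuals(m_lm)
r_glm_dev <- residuals(m_glm)
```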
Troubleshooting
For models with many observations, or for more complex models in general,
generating the plot might become very slow. One reason might be that the
underlying graphic engine becomes slow for plotting many data points. In
such cases, setting the argument show_dots = FALSE might help. Furthermore,
look at the check argument and see if some of the model checks could be
skipped, which also increases performance.
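A sketch of this advice on a large model. The simulated data set, its size and its variable names are hypothetical, chosen only to produce many observations. The call assumes the performance package is installed and assigns the result, so no plot is drawn until it is printed.

```r
library(performance)

# Hypothetical simulated data with many observations
set.seed(123)
n <- 50000
big <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
big$y <- 1 + 0.5 * big$x1 - 0.3 * big$x2 + rnorm(n)

m_big <- lm(y ~ x1 + x2, data = big)

# Hide data points and compute only the diagnostics of interest
big_checks <- check_model(
  m_big,
  show_dots = FALSE,
  check = c("qq", "linearity", "homogeneity")
)
```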
Examples
# \dontrun{
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
check_model(m)
#> Variable `Component` is not in your data frame :/
if (require("lme4")) {
m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
check_model(m, panel = FALSE)
}
if (require("rstanarm")) {
m <- stan_glm(mpg ~ wt + gear, data = mtcars, chains = 2, iter = 200)
check_model(m)
}
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
#> Variable `Component` is not in your data frame :/
# }