compare_performance() computes indices of model performance for several models at once, which allows these indices to be compared across models.

compare_performance(..., metrics = "all", rank = FALSE, verbose = TRUE)

Arguments

...

Multiple model objects, which may be of different classes.

metrics

Can be "all", "common" or a character vector of metrics to be computed. See the related documentation for the object's class for details.

rank

Logical, if TRUE, models are ranked according to 'best' overall model performance. See 'Details'.

verbose

Logical, if FALSE, warnings are suppressed.

Value

A data frame with one row per model and one column per "index" (see metrics).

Details

Model Weights

When information criteria (IC) are requested in metrics (i.e., any of "all", "common", "AIC", "AICc", "BIC", "WAIC", or "LOOIC"), model weights based on these criteria are also computed. For all IC except LOOIC, weights are computed as w = exp(-0.5 * delta_ic) / sum(exp(-0.5 * delta_ic)), where delta_ic is the difference between the model's IC value and the smallest IC value in the model set (Burnham & Anderson, 2002). For LOOIC, weights are computed as "stacking weights" using loo::stacking_weights().
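The weight formula above can be reproduced by hand with base R. The following is a minimal sketch, not the package's internal code: it refits the three iris models from the Examples section and uses stats::AIC(); the variable names ic, delta_ic and w are chosen here for illustration only.

```r
# Refit the three models from the Examples section (base R only)
lm1 <- lm(Sepal.Length ~ Species, data = iris)
lm2 <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)
lm3 <- lm(Sepal.Length ~ Species * Petal.Length, data = iris)

# Information criterion (here: AIC) for each model in the set
ic <- c(lm1 = AIC(lm1), lm2 = AIC(lm2), lm3 = AIC(lm3))

# Difference between each model's IC and the smallest IC in the set
delta_ic <- ic - min(ic)

# Model weights as given in the formula above
w <- exp(-0.5 * delta_ic) / sum(exp(-0.5 * delta_ic))
round(w, 3)
```

The weights sum to 1, and for these models they should agree with the AIC_wt column shown in the Examples output (about 0.566 for lm2 and 0.434 for lm3). Replacing AIC() with BIC() gives the corresponding BIC-based weights.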

Ranking Models

When rank = TRUE, a new column Performance_Score is returned. This score ranges from 0% to 100%, with higher values indicating better model performance. Note that the score values do not necessarily sum to 100%. Rather, the score is calculated by normalizing all indices (i.e. rescaling them to a range from 0 to 1) and taking the mean of all normalized indices for each model. This is a rather quick heuristic, but it may be helpful as an exploratory index.

In particular, when models are of different types (e.g. mixed models, classical linear models, logistic regression, ...), not all indices will be computed for each model. When an index cannot be calculated for a specific model type, that model gets an NA value for it. All indices that contain any NA are excluded from the calculation of the performance score.
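The ranking heuristic described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: the index values are the rounded numbers from the Examples output, the helper rescale01() is hypothetical, and it is assumed that indices where lower values mean better fit (here RMSE and Sigma) are reversed after rescaling so that larger always means better.

```r
# Index values copied (rounded) from the Examples output on this page
indices <- data.frame(
  R2     = c(0.619, 0.837, 0.840),
  R2_adj = c(0.614, 0.833, 0.835),
  RMSE   = c(0.510, 0.333, 0.330),
  Sigma  = c(0.515, 0.338, 0.336),
  AIC_wt = c(0.000, 0.566, 0.434),
  BIC_wt = c(0.000, 0.964, 0.036),
  row.names = c("lm1", "lm2", "lm3")
)

# Assumption: for these indices, lower values indicate better fit
lower_is_better <- c("RMSE", "Sigma")

# Exclude every index that is NA for at least one model
indices <- indices[, colSums(is.na(indices)) == 0, drop = FALSE]

# Hypothetical helper: rescale a numeric vector to the range 0 to 1
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

normalized <- indices
normalized[] <- lapply(indices, rescale01)

# Reverse the lower-is-better indices so that 1 is always "best"
normalized[lower_is_better] <- 1 - normalized[lower_is_better]

# Performance score: mean of the normalized indices per model
score <- rowMeans(normalized)
sort(score, decreasing = TRUE)
```

Because the inputs are rounded, the scores only approximate the Performance-Score column in the Examples output, but the ordering (lm2, lm3, lm1) is the same.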

There is a plot()-method for compare_performance(), which creates a "spiderweb" plot in which the different indices are normalized so that larger values indicate better model performance. Hence, points closer to the center indicate worse fit indices (see the online documentation for more details).

Note

There is also a plot()-method implemented in the see-package.

References

Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). Springer-Verlag. doi: 10.1007/b97636

Examples

library(performance)
data(iris)
lm1 <- lm(Sepal.Length ~ Species, data = iris)
lm2 <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)
lm3 <- lm(Sepal.Length ~ Species * Petal.Length, data = iris)
compare_performance(lm1, lm2, lm3)
#> # Comparison of Model Performance Indices
#> 
#> Name | Model |     AIC |  AIC_wt |     BIC |  BIC_wt |    R2 | R2 (adj.) |  RMSE | Sigma
#> ----------------------------------------------------------------------------------------
#> lm1  |    lm | 231.452 | < 0.001 | 243.494 | < 0.001 | 0.619 |     0.614 | 0.510 | 0.515
#> lm2  |    lm | 106.233 |   0.566 | 121.286 |   0.964 | 0.837 |     0.833 | 0.333 | 0.338
#> lm3  |    lm | 106.767 |   0.434 | 127.842 |   0.036 | 0.840 |     0.835 | 0.330 | 0.336
compare_performance(lm1, lm2, lm3, rank = TRUE)
#> # Comparison of Model Performance Indices
#> 
#> Name | Model |    R2 | R2 (adj.) |  RMSE | Sigma |  AIC_wt |  BIC_wt | Performance-Score
#> ----------------------------------------------------------------------------------------
#> lm2  |    lm | 0.837 |     0.833 | 0.333 | 0.338 |   0.566 |   0.964 |            99.10%
#> lm3  |    lm | 0.840 |     0.835 | 0.330 | 0.336 |   0.434 |   0.036 |            80.05%
#> lm1  |    lm | 0.619 |     0.614 | 0.510 | 0.515 | < 0.001 | < 0.001 |             0.00%

if (require("lme4")) {
  m1 <- lm(mpg ~ wt + cyl, data = mtcars)
  m2 <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial")
  m3 <- lmer(Petal.Length ~ Sepal.Length + (1 | Species), data = iris)
  compare_performance(m1, m2, m3)
}
#> Warning: When comparing models, please note that probably not all models were fit from
#>   same data.
#> # Comparison of Model Performance Indices
#> 
#> Name |   Model |     AIC |  AIC_wt |     BIC |  BIC_wt |  RMSE | Sigma |    R2 | R2 (adj.) | Tjur's R2 | Log_loss | Score_log | Score_spherical |   PCP | R2 (cond.) | R2 (marg.) |   ICC
#> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#> m1   |      lm | 156.010 | < 0.001 | 161.873 | < 0.001 | 2.444 | 2.568 | 0.830 |     0.819 |           |          |           |                 |       |            |            |      
#> m2   |     glm |  31.298 |   1.000 |  35.695 |   1.000 | 0.359 | 0.934 |       |           |     0.478 |    0.395 |   -14.903 |           0.095 | 0.743 |            |            |      
#> m3   | lmerMod |  77.320 | < 0.001 |  89.362 | < 0.001 | 0.279 | 0.283 |       |           |           |          |           |                 |       |      0.972 |      0.096 | 0.969