R/test_bf.R
, R/test_likelihoodratio.R
, R/test_performance.R
, and 2 more
test_performance.Rd
Testing whether models are "different" in terms of accuracy or explanatory power is a delicate and often complex
procedure, with many limitations and prerequisites. Moreover, many tests exist, each coming with its own interpretation, and set of strengths and weaknesses.
The test_performance()
function runs the most relevant and appropriate tests based on the type of input (for instance, whether the models are nested or not). However, it still requires the user to understand what the tests are and what they do in order to prevent their misinterpretation. See the details section for more information regarding the different tests and their interpretation.
test_bf(...)
# S3 method for default
test_bf(..., text_length = NULL)
test_likelihoodratio(..., estimator = "ML")
performance_lrt(..., estimator = "ML")
test_lrt(..., estimator = "ML")
test_performance(..., reference = 1)
test_vuong(...)
test_wald(...)
...  Multiple model objects. 

text_length  Numeric, length (number of chars) of output lines.

estimator  Applied when comparing regression models using

reference  This only applies when models are nonnested, and determines which model should be taken as a reference, against which all the other models are tested. 
A data frame containing the relevant indices.
Model's "nesting" is an important concept of models comparison. Indeed, many
tests only make sense when the models are "nested", i.e., when their
predictors are nested. This means that all the predictors of a model are
contained within the predictors of a larger model (sometimes referred to as
the encompassing model). For instance, model1 (y ~ x1 + x2)
is
"nested" within model2 (y ~ x1 + x2 + x3)
. Usually, people have a list
of nested models, for instance m1 (y ~ 1)
, m2 (y ~ x1)
,
m3 (y ~ x1 + x2)
, m4 (y ~ x1 + x2 + x3)
, and it is conventional
that they are "ordered" from the smallest to largest, but it is up to the
user to reverse the order from largest to smallest. The test then shows
whether a more parsimonious model, or whether adding a predictor, results in
a significant difference in the model's performance. In this case, models are
usually compared sequentially: m2 is tested against m1, m3 against m2,
m4 against m3, etc.
Two models are considered as "nonnested" if their predictors are
different. For instance, model1 (y ~ x1 + x2)
and `model2 (y ~ x3
x4). In the case of nonnested models, all models are usually compared against the same *reference* model (by default, the first of the list). \cr\cr Nesting is detected via the
insight::is_nested_models()` function.
Note that, apart from the nesting, in order for the tests to be valid,
other requirements have often to be the fulfilled. For instance, outcome
variables (the response) must be the same. You cannot meaningfully test
whether apples are significantly different from oranges!
Bayes factor for Model Comparison  test_bf()
: If all
models were fit from the same data, the returned BF
shows the Bayes
Factor (see bayestestR::bayesfactor_models()
) for each model against
the reference model (which depends on whether the models are nested or
not). Check out
this vignette for more details.
Wald's FTest  test_wald()
: The Wald test is a rough
approximation of the Likelihood Ratio Test. However, it is more applicable
than the LRT: you can often run a Wald test in situations where no other
test can be run. Importantly, this test only makes statistical sense if the
models are nested.
Note: this test is also available in base R through the
anova()
function. It returns an Fvalue
column
as a statistic and its associated pvalue
.
Likelihood Ratio Test (LRT)  test_likelihoodratio()
:
The LRT tests which model is a better (more likely) explanation of the
data. LikelihoodRatioTest (LRT) gives usually somewhat close results (if
not equivalent) to the Wald test and, similarly, only makes sense for
nested models. However, Maximum likelihood tests make stronger assumptions
than method of moments tests like the Ftest, and in turn are more
efficient. Agresti (1990) suggests that you should use the LRT instead of
the Wald test for small sample sizes (under or about 30) or if the
parameters are large.
Note: for regression models, this is similar to
anova(..., test="LRT")
(on models) or lmtest::lrtest(...)
,
depending on the estimator
argument. For lavaan
models (SEM,
CFA), the function calls lavaan::lavTestLRT()
.
Vuong's Test  test_vuong()
: Vuong's (1989) test can
be used both for nested and nonnested models, and actually consists of two
tests.
The Test of Distinguishability (the Omega2
column and
its associated pvalue) indicates whether or not the models can possibly be
distinguished on the basis of the observed data. If its pvalue is
significant, it means the models are distinguishable.
The Robust Likelihood Test (the LR
column and its
associated pvalue) indicates whether each model fits better than the
reference model. If the models are nested, then the test works as a robust
LRT. The code for this function is adapted from the nonnest2
package, and all credit go to their authors.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and nonnested hypotheses. Econometrica, 57, 307333.
Merkle, E. C., You, D., & Preacher, K. (2016). Testing nonnested structural equation models. Psychological Methods, 21, 151163.
compare_performance()
to compare
the performance indices of many different models.
# Nested Models
# 
m1 < lm(Sepal.Length ~ Petal.Width, data = iris)
m2 < lm(Sepal.Length ~ Petal.Width + Species, data = iris)
m3 < lm(Sepal.Length ~ Petal.Width * Species, data = iris)
test_performance(m1, m2, m3)
#> Name  Model  BF  Omega2  p (Omega2)  LR  p (LR)
#> 
#> m1  lm     
#> m2  lm  0.007  9.54e04  0.935  0.15  0.919
#> m3  lm  0.037  0.02  0.081  3.41  0.099
#> Models were detected as nested and are compared in sequential order.
test_bf(m1, m2, m3)
#> Bayes Factors for Model Comparison
#>
#> Model BF
#> [m2] Petal.Width + Species 0.007
#> [m3] Petal.Width * Species 2.64e04
#>
#> * Against Denominator: [m1] Petal.Width
#> * Bayes Factor Type: BIC approximation
test_wald(m1, m2, m3) # Equivalent to anova(m1, m2, m3)
#> Name  Model  df  df_diff  F  p
#> 
#> m1  lm  148   
#> m2  lm  146  2.00  0.08  0.927
#> m3  lm  144  2.00  1.66  0.195
#> Models were detected as nested and are compared in sequential order.
# Equivalent to lmtest::lrtest(m1, m2, m3)
test_likelihoodratio(m1, m2, m3, estimator = "ML")
#> # LikelihoodRatioTest (LRT) for Model Comparison
#>
#> Name  Model  df  df_diff  Chi2  p
#> 
#> m1  lm  3   
#> m2  lm  5  2  0.15  0.926
#> m3  lm  7  2  3.41  0.182
# Equivalent to anova(m1, m2, m3, test='LRT')
test_likelihoodratio(m1, m2, m3, estimator = "OLS")
#> # LikelihoodRatioTest (LRT) for Model Comparison
#>
#> Name  Model  df  df_diff  Chi2  p
#> 
#> m1  lm  3   
#> m2  lm  5  2  0.15  0.927
#> m3  lm  7  2  3.31  0.191
test_vuong(m1, m2, m3) # nonnest2::vuongtest(m1, m2, nested=TRUE)
#> Name  Model  Omega2  p (Omega2)  LR  p (LR)
#> 
#> m1  lm    
#> m2  lm  9.54e04  0.935  0.15  0.919
#> m3  lm  0.02  0.081  3.41  0.099
#> Models were detected as nested and are compared in sequential order.
# Nonnested Models
# 
m1 < lm(Sepal.Length ~ Petal.Width, data = iris)
m2 < lm(Sepal.Length ~ Petal.Length, data = iris)
m3 < lm(Sepal.Length ~ Species, data = iris)
test_performance(m1, m2, m3)
#> Name  Model  BF  Omega2  p (Omega2)  LR  p (LR)
#> 
#> m1  lm     
#> m2  lm  > 1000  0.19  < .001  4.57  < .001
#> m3  lm  < 0.001  0.12  < .001  2.51  0.006
#> Each model is compared to m1.
test_bf(m1, m2, m3)
#> Bayes Factors for Model Comparison
#>
#> Model BF
#> [m2] Petal.Length 2.90e+10
#> [m3] Species 2.00e06
#>
#> * Against Denominator: [m1] Petal.Width
#> * Bayes Factor Type: BIC approximation
test_vuong(m1, m2, m3) # nonnest2::vuongtest(m1, m2)
#> Name  Model  Omega2  p (Omega2)  LR  p (LR)
#> 
#> m1  lm    
#> m2  lm  0.19  < .001  4.57  < .001
#> m3  lm  0.12  < .001  2.51  0.006
#> Each model is compared to m1.
# Tweak the output
# 
test_performance(m1, m2, m3, include_formula = TRUE)
#> Name  Model  BF  Omega2  p (Omega2)  LR  p (LR)
#> 
#> m1  lm(Sepal.Length ~ Petal.Width)     
#> m2  lm(Sepal.Length ~ Petal.Length)  > 1000  0.19  < .001  4.57  < .001
#> m3  lm(Sepal.Length ~ Species)  < 0.001  0.12  < .001  2.51  0.006
#> Each model is compared to m1.
# SEM / CFA (lavaan objects)
# 
# Lavaan Models
if (require("lavaan")) {
structure < " visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual ~~ textual + speed "
m1 < lavaan::cfa(structure, data = HolzingerSwineford1939)
structure < " visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual ~~ 0 * textual + speed "
m2 < lavaan::cfa(structure, data = HolzingerSwineford1939)
structure < " visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual ~~ 0 * textual + 0 * speed "
m3 < lavaan::cfa(structure, data = HolzingerSwineford1939)
test_likelihoodratio(m1, m2, m3)
# Different Model Types
# 
if (require("lme4") && require("mgcv")) {
m1 < lm(Sepal.Length ~ Petal.Length + Species, data = iris)
m2 < lmer(Sepal.Length ~ Petal.Length + (1  Species), data = iris)
m3 < gam(Sepal.Length ~ s(Petal.Length, by = Species) + Species, data = iris)
test_performance(m1, m2, m3)
}
}
#> Loading required package: mgcv
#> This is mgcv 1.836. For overview type 'help("mgcvpackage")'.
#> Name  Model  BF
#> 
#> m1  lm 
#> m2  lmerMod  < 0.001
#> m3  gam  0.038
#> Each model is compared to m1.