# Test if Models are Different

Source:`R/test_bf.R`

, `R/test_likelihoodratio.R`

, `R/test_performance.R`

, and 2 more
`test_performance.Rd`

Testing whether models are "different" in terms of accuracy or explanatory
power is a delicate and often complex procedure, with many limitations and
prerequisites. Moreover, many tests exist, each coming with its own
interpretation, and set of strengths and weaknesses.

The `test_performance()`

function runs the most relevant and appropriate
tests based on the type of input (for instance, whether the models are
*nested* or not). However, it still requires the user to understand what the
tests are and what they do in order to prevent their misinterpretation. See
the **details** section for more information regarding the different tests
and their interpretation.

## Usage

```
test_bf(...)
# S3 method for default
test_bf(..., text_length = NULL)
test_likelihoodratio(..., estimator = "ML")
performance_lrt(..., estimator = "ML")
test_lrt(..., estimator = "ML")
test_performance(..., reference = 1)
test_vuong(...)
test_wald(...)
```

## Arguments

- ...
Multiple model objects.

- text_length
Numeric, length (number of chars) of output lines.

`test_bf()`

describes models by their formulas, which can lead to overly long lines in the output.`text_length`

fixes the length of lines to a specified limit.- estimator
Applied when comparing regression models using

`test_likelihoodratio()`

. Corresponds to the different estimators for the standard deviation of the errors. If`estimator="OLS"`

(default), then it uses the same method as`anova(..., test="LRT")`

implemented in base R, i.e., scaling by n-k (the unbiased OLS estimator) and using this estimator under the alternative hypothesis. If`estimator="ML"`

, which is for instance used by`lrtest(...)`

in package lmtest, the scaling is done by n (the biased ML estimator) and the estimator under the null hypothesis. In moderately large samples, the differences should be negligible, but it is possible that OLS would perform slightly better in small samples with Gaussian errors.- reference
This only applies when models are non-nested, and determines which model should be taken as a reference, against which all the other models are tested.

## Details

### Nested vs. Non-nested Models

Model's "nesting" is an important concept of models comparison. Indeed, many
tests only make sense when the models are *"nested",* i.e., when their
predictors are nested. This means that all the predictors of a model are
contained within the predictors of a larger model (sometimes referred to as
the encompassing model). For instance, `model1 (y ~ x1 + x2)`

is
"nested" within `model2 (y ~ x1 + x2 + x3)`

. Usually, people have a list
of nested models, for instance `m1 (y ~ 1)`

, `m2 (y ~ x1)`

,
`m3 (y ~ x1 + x2)`

, `m4 (y ~ x1 + x2 + x3)`

, and it is conventional
that they are "ordered" from the smallest to largest, but it is up to the
user to reverse the order from largest to smallest. The test then shows
whether a more parsimonious model, or whether adding a predictor, results in
a significant difference in the model's performance. In this case, models are
usually compared *sequentially*: m2 is tested against m1, m3 against m2,
m4 against m3, etc.

Two models are considered as *"non-nested"* if their predictors are
different. For instance, `model1 (y ~ x1 + x2)`

and `model2 (y ~ x3

x4)

`. In the case of non-nested models, all models are usually compared against the same *reference* model (by default, the first of the list). \cr\cr Nesting is detected via the`

insight::is_nested_models()` function. Note that, apart from the nesting, in order for the tests to be valid, other requirements have often to be the fulfilled. For instance, outcome variables (the response) must be the same. You cannot meaningfully test whether apples are significantly different from oranges!

### Tests Description

**Bayes factor for Model Comparison**-`test_bf()`

: If all models were fit from the same data, the returned`BF`

shows the Bayes Factor (see`bayestestR::bayesfactor_models()`

) for each model against the reference model (which depends on whether the models are nested or not). Check out this vignette for more details.**Wald's F-Test**-`test_wald()`

: The Wald test is a rough approximation of the Likelihood Ratio Test. However, it is more applicable than the LRT: you can often run a Wald test in situations where no other test can be run. Importantly, this test only makes statistical sense if the models are nested.

Note: this test is also available in base R through the`anova()`

function. It returns an`F-value`

column as a statistic and its associated`p-value`

.**Likelihood Ratio Test (LRT)**-`test_likelihoodratio()`

: The LRT tests which model is a better (more likely) explanation of the data. Likelihood-Ratio-Test (LRT) gives usually somewhat close results (if not equivalent) to the Wald test and, similarly, only makes sense for nested models. However, Maximum likelihood tests make stronger assumptions than method of moments tests like the F-test, and in turn are more efficient. Agresti (1990) suggests that you should use the LRT instead of the Wald test for small sample sizes (under or about 30) or if the parameters are large.

Note: for regression models, this is similar to`anova(..., test="LRT")`

(on models) or`lmtest::lrtest(...)`

, depending on the`estimator`

argument. For`lavaan`

models (SEM, CFA), the function calls`lavaan::lavTestLRT()`

.**Vuong's Test**-`test_vuong()`

: Vuong's (1989) test can be used both for nested and non-nested models, and actually consists of two tests.The

**Test of Distinguishability**(the`Omega2`

column and its associated p-value) indicates whether or not the models can possibly be distinguished on the basis of the observed data. If its p-value is significant, it means the models are distinguishable.The

**Robust Likelihood Test**(the`LR`

column and its associated p-value) indicates whether each model fits better than the reference model. If the models are nested, then the test works as a robust LRT. The code for this function is adapted from the`nonnest2`

package, and all credit go to their authors.

## References

Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307-333.

Merkle, E. C., You, D., & Preacher, K. (2016). Testing non-nested structural equation models. Psychological Methods, 21, 151-163.

## See also

`compare_performance()`

to compare
the performance indices of many different models.

## Examples

```
# Nested Models
# -------------
m1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Petal.Width + Species, data = iris)
m3 <- lm(Sepal.Length ~ Petal.Width * Species, data = iris)
test_performance(m1, m2, m3)
#> Name | Model | BF | Omega2 | p (Omega2) | LR | p (LR)
#> ------------------------------------------------------------
#> m1 | lm | | | | |
#> m2 | lm | 0.007 | 9.54e-04 | 0.935 | 0.15 | 0.919
#> m3 | lm | 0.037 | 0.02 | 0.081 | 3.41 | 0.099
#> Models were detected as nested and are compared in sequential order.
test_bf(m1, m2, m3)
#> Bayes Factors for Model Comparison
#>
#> Model BF
#> [m2] Petal.Width + Species 0.007
#> [m3] Petal.Width * Species 2.64e-04
#>
#> * Against Denominator: [m1] Petal.Width
#> * Bayes Factor Type: BIC approximation
test_wald(m1, m2, m3) # Equivalent to anova(m1, m2, m3)
#> Name | Model | df | df_diff | F | p
#> -------------------------------------------
#> m1 | lm | 148 | | |
#> m2 | lm | 146 | 2.00 | 0.08 | 0.927
#> m3 | lm | 144 | 2.00 | 1.66 | 0.195
#> Models were detected as nested and are compared in sequential order.
# Equivalent to lmtest::lrtest(m1, m2, m3)
test_likelihoodratio(m1, m2, m3, estimator = "ML")
#> # Likelihood-Ratio-Test (LRT) for Model Comparison
#>
#> Name | Model | df | df_diff | Chi2 | p
#> ------------------------------------------
#> m1 | lm | 3 | | |
#> m2 | lm | 5 | 2 | 0.15 | 0.926
#> m3 | lm | 7 | 2 | 3.41 | 0.182
# Equivalent to anova(m1, m2, m3, test='LRT')
test_likelihoodratio(m1, m2, m3, estimator = "OLS")
#> # Likelihood-Ratio-Test (LRT) for Model Comparison
#>
#> Name | Model | df | df_diff | Chi2 | p
#> ------------------------------------------
#> m1 | lm | 3 | | |
#> m2 | lm | 5 | 2 | 0.15 | 0.927
#> m3 | lm | 7 | 2 | 3.31 | 0.191
test_vuong(m1, m2, m3) # nonnest2::vuongtest(m1, m2, nested=TRUE)
#> Name | Model | Omega2 | p (Omega2) | LR | p (LR)
#> ----------------------------------------------------
#> m1 | lm | | | |
#> m2 | lm | 9.54e-04 | 0.935 | 0.15 | 0.919
#> m3 | lm | 0.02 | 0.081 | 3.41 | 0.099
#> Models were detected as nested and are compared in sequential order.
# Non-nested Models
# -----------------
m1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Petal.Length, data = iris)
m3 <- lm(Sepal.Length ~ Species, data = iris)
test_performance(m1, m2, m3)
#> Name | Model | BF | Omega2 | p (Omega2) | LR | p (LR)
#> -------------------------------------------------------------
#> m1 | lm | | | | |
#> m2 | lm | > 1000 | 0.19 | < .001 | -4.57 | < .001
#> m3 | lm | < 0.001 | 0.12 | < .001 | 2.51 | 0.006
#> Each model is compared to m1.
test_bf(m1, m2, m3)
#> Bayes Factors for Model Comparison
#>
#> Model BF
#> [m2] Petal.Length 2.90e+10
#> [m3] Species 2.00e-06
#>
#> * Against Denominator: [m1] Petal.Width
#> * Bayes Factor Type: BIC approximation
test_vuong(m1, m2, m3) # nonnest2::vuongtest(m1, m2)
#> Name | Model | Omega2 | p (Omega2) | LR | p (LR)
#> ---------------------------------------------------
#> m1 | lm | | | |
#> m2 | lm | 0.19 | < .001 | -4.57 | < .001
#> m3 | lm | 0.12 | < .001 | 2.51 | 0.006
#> Each model is compared to m1.
# Tweak the output
# ----------------
test_performance(m1, m2, m3, include_formula = TRUE)
#> Name | Model | BF | Omega2 | p (Omega2) | LR | p (LR)
#> ---------------------------------------------------------------------------------------
#> m1 | lm(Sepal.Length ~ Petal.Width) | | | | |
#> m2 | lm(Sepal.Length ~ Petal.Length) | > 1000 | 0.19 | < .001 | -4.57 | < .001
#> m3 | lm(Sepal.Length ~ Species) | < 0.001 | 0.12 | < .001 | 2.51 | 0.006
#> Each model is compared to m1.
# SEM / CFA (lavaan objects)
# --------------------------
# Lavaan Models
if (require("lavaan")) {
structure <- " visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual ~~ textual + speed "
m1 <- lavaan::cfa(structure, data = HolzingerSwineford1939)
structure <- " visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual ~~ 0 * textual + speed "
m2 <- lavaan::cfa(structure, data = HolzingerSwineford1939)
structure <- " visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual ~~ 0 * textual + 0 * speed "
m3 <- lavaan::cfa(structure, data = HolzingerSwineford1939)
test_likelihoodratio(m1, m2, m3)
# Different Model Types
# ---------------------
if (require("lme4") && require("mgcv")) {
m1 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
m2 <- lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)
m3 <- gam(Sepal.Length ~ s(Petal.Length, by = Species) + Species, data = iris)
test_performance(m1, m2, m3)
}
}
#> Loading required package: mgcv
#> This is mgcv 1.8-38. For overview type 'help("mgcv-package")'.
#> Name | Model | BF
#> ------------------------
#> m1 | lm |
#> m2 | lmerMod | < 0.001
#> m3 | gam | 0.038
#> Each model is compared to m1.
```