What is the R2?
The coefficient of determination, denoted R^2 and pronounced “R squared”, typically corresponds to the proportion of the variance in the dependent variable (the response) that is explained (i.e., predicted) by the independent variables (the predictors).
It is an “absolute” index of goodness-of-fit, ranging from 0 to 1 (often expressed as a percentage), and can be used to assess model performance or to compare models.
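To make the "proportion of variance explained" reading concrete, here is a minimal sketch for an ordinary linear model using only base R. The dataset (mtcars) and the variable names are illustrative choices, not taken from the package documentation.

```r
# Fit an ordinary linear model on a built-in dataset (illustrative choice)
model <- lm(mpg ~ wt + cyl, data = mtcars)

# Residual sum of squares: variation left unexplained by the predictors
ss_res <- sum(residuals(model)^2)

# Total sum of squares: variation of the response around its mean
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R^2 as the proportion of variance explained
r2_manual <- 1 - ss_res / ss_tot

# Compare with the value reported by summary()
all.equal(r2_manual, summary(model)$r.squared)
```

For a linear model with an intercept, the manual value and the one reported by summary() should coincide.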
Different types of R2
As models become more complex, the computation of an R^2 becomes increasingly less straightforward.
Currently, depending on the context of the regression model object, one can choose from the following measures supported in performance:
- Bayesian R^2
- Cox & Snell’s R^2
- Efron’s R^2
- Kullback-Leibler R^2
- LOO-adjusted R^2
- McFadden’s R^2
- McKelvey & Zavoina’s R^2
- Nagelkerke’s R^2
- Nakagawa’s R^2 for mixed models
- Somers’ D_{xy} rank correlation for binary outcomes
- Tjur’s R^2 - coefficient of discrimination (D)
- Xu’s R^2 (Omega-squared)
- R^2 for models with zero-inflation
Before we begin, let’s first load the package.
library(performance)
R2 for glm
In the context of a generalized linear model (e.g., a logistic model whose outcome is binary), R^2 doesn’t measure the percentage of “explained variance”, as this concept doesn’t apply. However, the R^2s that have been adapted for GLMs have retained the name “R2”, mostly because of their similar properties (the range, the sensitivity, and the interpretation as the amount of explanatory power).
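As an illustration of how such adapted measures are built, the sketch below computes two of the indices listed earlier by hand for a logistic model, using base R only. The model and data are illustrative assumptions. In practice the package's own functions (such as r2_mcfadden() and r2_tjur()) are the supported route, and their results may differ in detail from this hand computation.

```r
# Logistic regression on a binary outcome (vs: engine shape, 0 or 1)
model <- glm(vs ~ wt + mpg, data = mtcars, family = binomial())

# Intercept-only model used as the baseline for likelihood-based indices
null_model <- glm(vs ~ 1, data = mtcars, family = binomial())

# McFadden's R^2: one minus the ratio of model to null log-likelihoods
r2_mcfadden_manual <- 1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))

# Tjur's D: difference between the mean predicted probability
# among observed 1s and among observed 0s
p <- fitted(model)
y <- mtcars$vs
tjur_d_manual <- mean(p[y == 1]) - mean(p[y == 0])

r2_mcfadden_manual
tjur_d_manual
```

Neither quantity is a ratio of variances, yet both fall between 0 and 1 here and grow as the model separates the two outcome classes better, which is why they keep the "R2" label.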
R2 for Mixed Models
Marginal vs. Conditional R2
For mixed models, performance will return two different R^2s:
- The conditional R^2
- The marginal R^2
The marginal R^2 considers only the variance of the fixed effects (without the random effects), while the conditional R^2 takes both the fixed and random effects into account (i.e., the total model).
library(lme4)
# defining a linear mixed-effects model
model <- lmer(Petal.Length ~ Petal.Width + (1 | Species), data = iris)
r2(model)
> # R2 for Mixed Models
>
> Conditional R2: 0.933
> Marginal R2: 0.303
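The two values can be read as different ratios of the same variance components. The following is a sketch for the simplest case, a Gaussian model with a single random intercept. The symbols are notation introduced here for illustration: \sigma_f^2 is the variance of the fixed-effects predictions, \sigma_\alpha^2 the random-intercept variance, and \sigma_\varepsilon^2 the residual variance.

```latex
R^2_{\text{marginal}} = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}
\qquad
R^2_{\text{conditional}} = \frac{\sigma_f^2 + \sigma_\alpha^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}
```

Both share the same denominator, so the conditional R^2 can never be smaller than the marginal R^2. The gap between them reflects how much variance is attributed to the grouping factor (here, Species).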
Note that the r2 functions return only the R^2 values. We encourage users to use the model_performance function instead to get a more comprehensive set of indices of model fit.
model_performance(model)
> # Indices of model performance
>
> AIC | AICc | BIC | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma
> -----------------------------------------------------------------------------
> 159.036 | 159.312 | 171.079 | 0.933 | 0.303 | 0.904 | 0.373 | 0.378
In this vignette, however, we focus exclusively on the r2 family of functions and discuss only this measure.
R2 for Bayesian Models
library(rstanarm)
model <- stan_glm(mpg ~ wt + cyl, data = mtcars, refresh = 0)
r2(model)
> # Bayesian R2 with Compatibility Interval
>
> Conditional R2: 0.816 (95% CI [0.704, 0.897])
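The interval in this output arises because a Bayesian R^2 is computed once per posterior draw rather than once per model. A commonly used draw-wise definition (following Gelman and colleagues) is sketched below. Treat it as an assumption about the general approach rather than a statement of exactly what the package computes. Here s indexes posterior draws, \hat{y}_n^{s} is the prediction for observation n in draw s, and e_n^{s} is the corresponding residual.

```latex
R^2_s = \frac{\mathrm{Var}_n\left(\hat{y}_n^{s}\right)}{\mathrm{Var}_n\left(\hat{y}_n^{s}\right) + \mathrm{Var}_n\left(e_n^{s}\right)}
```

Summarizing the resulting distribution of R^2_s values gives a point estimate together with an uncertainty interval such as the one shown above.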
As discussed above, for mixed-effects models the R^2 has two components: the conditional R^2 and the marginal R^2.
Comparing change in R2 using Cohen’s f
Cohen’s f (of ANOVA fame) can be used as a measure of effect size in the context of sequential multiple regression (i.e., nested models). That is, when comparing two models, we can examine the ratio between the increase in R^2 and the unexplained variance, where R_{A}^{2} is the R^2 of the smaller model and R_{AB}^{2} is the R^2 of the larger model that contains it:
f^{2}={R_{AB}^{2}-R_{A}^{2} \over 1-R_{AB}^{2}}
library(effectsize)
data(hardlyworking)
m1 <- lm(salary ~ xtra_hours, data = hardlyworking)
m2 <- lm(salary ~ xtra_hours + n_comps + seniority, data = hardlyworking)
cohens_f_squared(m1, model2 = m2)
> Cohen's f2 (partial) | 95% CI | R2_delta
> ---------------------------------------------
> 1.19 | [0.99, Inf] | 0.17
>
> - One-sided CIs: upper bound fixed at [Inf].
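The formula for f^2 can also be evaluated by hand from the two R^2 values. The sketch below does this with base R on a built-in dataset. The models and names (m_a, m_ab) are illustrative and are not the ones used in the example above.

```r
# Smaller model (A) and larger nested model (AB); illustrative choice of data
m_a  <- lm(mpg ~ wt, data = mtcars)
m_ab <- lm(mpg ~ wt + cyl + hp, data = mtcars)

# R^2 of each model
r2_a  <- summary(m_a)$r.squared
r2_ab <- summary(m_ab)$r.squared

# Increase in R^2 relative to the variance left unexplained by the larger model
f2 <- (r2_ab - r2_a) / (1 - r2_ab)
f2
```

Because the denominator is the unexplained variance of the larger model, the same increase in R^2 yields a larger f^2 when the larger model already fits well.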
If you want to know more about these indices, you can check out details and references in the functions that compute them here.
Interpretation
If you want to know about how to interpret these R^2 values, see these interpretation guidelines.