
What is the R2?

The coefficient of determination, denoted \(R^2\) and pronounced “R squared”, typically corresponds to the proportion of the variance in the dependent variable (the response) that is explained (i.e., predicted) by the independent variables (the predictors).
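For an ordinary linear model, this definition can be checked by hand. The minimal sketch below uses mtcars and a simple mpg ~ wt model, chosen purely for illustration. It computes \(R^2 = 1 - SS_{res}/SS_{tot}\) and compares the result with the value stored by base R's summary().

```r
# Simple linear model, chosen only for illustration
m <- lm(mpg ~ wt, data = mtcars)

y <- mtcars$mpg
ss_res <- sum(residuals(m)^2)      # unexplained (residual) variation
ss_tot <- sum((y - mean(y))^2)     # total variation around the mean
r2_manual <- 1 - ss_res / ss_tot   # proportion of variation explained

# should agree with the value reported by summary.lm()
all.equal(r2_manual, summary(m)$r.squared)
```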

It is an “absolute” index of goodness-of-fit, typically ranging from 0 to 1 (often expressed as a percentage), and can be used for model performance assessment or model comparison.

Different types of R2

As models become more complex, computing an \(R^2\) becomes less and less straightforward.

Currently, depending on the type of regression model object, one can choose from the following measures supported in performance:

  • Bayesian \(R^2\)
  • Cox & Snell’s \(R^2\)
  • Efron’s \(R^2\)
  • Kullback-Leibler \(R^2\)
  • LOO-adjusted \(R^2\)
  • McFadden’s \(R^2\)
  • McKelvey & Zavoina’s \(R^2\)
  • Nagelkerke’s \(R^2\)
  • Nakagawa’s \(R^2\) for mixed models
  • Somers’ \(D_{xy}\) rank correlation for binary outcomes
  • Tjur’s \(R^2\) - coefficient of determination (D)
  • Xu’s \(R^2\) (Omega-squared)
  • \(R^2\) for models with zero-inflation


Before we begin, let’s first load the package.

library(performance)

R2 for lm

m_lm <- lm(wt ~ am * cyl, data = mtcars)

r2(m_lm)
> # R2 for Linear Regression
>        R2: 0.724
>   adj. R2: 0.694
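The adjusted \(R^2\) penalizes the plain \(R^2\) for the number of predictors \(p\) relative to the number of observations \(n\): \(R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - p - 1)\). A minimal base-R sketch that applies this formula to m_lm and checks it against summary():

```r
m_lm <- lm(wt ~ am * cyl, data = mtcars)

n <- nobs(m_lm)                 # number of observations
p <- length(coef(m_lm)) - 1     # predictors excluding the intercept: am, cyl, am:cyl
r2 <- summary(m_lm)$r.squared

# adjusted R2: penalize R2 for the number of estimated slopes
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
all.equal(adj_r2, summary(m_lm)$adj.r.squared)
```

Unlike the plain \(R^2\), the adjusted value can decrease when a predictor that adds little explanatory power is included, which makes it more suitable for comparing models with different numbers of predictors.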

R2 for glm

In the context of a generalized linear model (e.g., a logistic model whose outcome is binary), \(R^2\) doesn’t measure the percentage of “explained variance”, as this concept doesn’t apply. However, the \(R^2\)s that have been adapted for GLMs have retained the name “R2”, mostly because they share similar properties (the range, the sensitivity, and the interpretation as the amount of explanatory power).
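To illustrate what two of the GLM variants listed above measure, the sketch below computes them by hand for a logistic regression. The model (am ~ wt on mtcars) is chosen purely as an example. McFadden’s \(R^2\) compares the log-likelihood of the model with that of an intercept-only model. Tjur’s \(R^2\) is the difference in mean predicted probability between the two outcome groups. In practice, the dedicated functions in performance (such as r2_mcfadden() and r2_tjur()) are the intended route. This sketch only makes the definitions concrete.

```r
# Logistic regression chosen for illustration: transmission type (am) by weight
m_glm  <- glm(am ~ wt, data = mtcars, family = binomial)
m_null <- glm(am ~ 1,  data = mtcars, family = binomial)

# McFadden: 1 - logLik(model) / logLik(intercept-only model)
r2_mcfadden <- 1 - as.numeric(logLik(m_glm)) / as.numeric(logLik(m_null))

# Tjur: mean predicted probability for am == 1 minus that for am == 0
p_hat <- fitted(m_glm)
r2_tjur <- mean(p_hat[mtcars$am == 1]) - mean(p_hat[mtcars$am == 0])

c(McFadden = r2_mcfadden, Tjur = r2_tjur)
```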

R2 for Mixed Models

Marginal vs. Conditional R2

For mixed models, performance will return two different \(R^2\)s:

  • The conditional \(R^2\)
  • The marginal \(R^2\)

The marginal \(R^2\) considers only the variance of the fixed effects (without the random effects), while the conditional \(R^2\) takes both the fixed and random effects into account (i.e., the total model).

library(lme4)

# defining a linear mixed-effects model
model <- lmer(Petal.Length ~ Petal.Width + (1 | Species), data = iris)

r2(model)
> # R2 for Mixed Models
> 
>   Conditional R2: 0.933
>      Marginal R2: 0.303
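For a linear mixed model with a single random intercept, Nakagawa and Schielzeth’s marginal and conditional \(R^2\) can be written as \(R^2_m = \sigma^2_f / (\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon)\) and \(R^2_c = (\sigma^2_f + \sigma^2_\alpha) / (\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon)\). Here \(\sigma^2_f\) is the variance of the fixed-effect predictions, \(\sigma^2_\alpha\) the random-intercept variance and \(\sigma^2_\varepsilon\) the residual variance. The sketch below extracts these components from the lmer fit with standard lme4 accessors. We would expect the results to be close to the values reported by r2(), but performance also handles cases this sketch ignores (random slopes, non-Gaussian families).

```r
library(lme4)

model <- lmer(Petal.Length ~ Petal.Width + (1 | Species), data = iris)

# variance of the fixed-effects part of the linear predictor (X %*% beta)
var_fixed <- var(as.vector(model.matrix(model) %*% fixef(model)))
# variance of the Species random intercept
var_random <- as.numeric(VarCorr(model)$Species[1, 1])
# residual variance
var_resid <- sigma(model)^2

total <- var_fixed + var_random + var_resid
r2_marginal <- var_fixed / total                    # fixed effects only
r2_conditional <- (var_fixed + var_random) / total  # fixed + random effects
c(marginal = r2_marginal, conditional = r2_conditional)
```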

Note that the r2 functions return only the \(R^2\) values. We encourage users to always use the model_performance function instead, to get a more comprehensive set of indices of model fit.

model_performance(model)
> # Indices of model performance
> 
> AIC     |    AICc |     BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
> -----------------------------------------------------------------------------
> 159.036 | 159.312 | 171.079 |      0.933 |      0.303 | 0.904 | 0.373 | 0.378

In this vignette, however, we focus exclusively on the r2 family of functions and will discuss only this measure.

R2 for Bayesian Models

library(rstanarm)

model <- stan_glm(mpg ~ wt + cyl, data = mtcars, refresh = 0)
r2(model)
> # Bayesian R2 with Compatibility Interval
> 
>   Conditional R2: 0.816 (95% CI [0.704, 0.897])

As discussed above, for mixed-effects models, there will be two components associated with \(R^2\).

# defining a Bayesian mixed-effects model
model <- stan_lmer(Petal.Length ~ Petal.Width + (1 | Species), data = iris, refresh = 0)

r2(model)
> # Bayesian R2 with Compatibility Interval
> 
>   Conditional R2: 0.953 (95% CI [0.941, 0.963])
>      Marginal R2: 0.824 (95% CI [0.717, 0.894])

Comparing change in R2 using Cohen’s f

Cohen’s \(f\) (of ANOVA fame) can be used as a measure of effect size in the context of sequential multiple regression (i.e., nested models). That is, when comparing two models, we can examine the ratio between the increase in \(R^2\) and the unexplained variance:

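\[ f^{2}={R_{AB}^{2}-R_{A}^{2} \over 1-R_{AB}^{2}} \]

where \(R_{A}^{2}\) is the \(R^2\) of the model with the first set of predictors (A) and \(R_{AB}^{2}\) is the \(R^2\) of the model that adds the second set of predictors (B).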

library(effectsize)
data(hardlyworking)
m1 <- lm(salary ~ xtra_hours, data = hardlyworking)
m2 <- lm(salary ~ xtra_hours + n_comps + seniority, data = hardlyworking)

cohens_f_squared(m1, model2 = m2)
> Cohen's f2 (partial) |      95% CI | R2_delta
> ---------------------------------------------
> 1.19                 | [0.99, Inf] |     0.17
> 
> - One-sided CIs: upper bound fixed at [Inf].
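The \(f^2\) formula can also be applied directly to two nested lm() fits. The sketch below uses mtcars instead of hardlyworking so that it needs only base R. It also checks the link between \(f^2\) and the F-test for the added predictors, \(F = f^2 \cdot df_{res,AB} / q\), where \(q\) is the number of added predictors and \(df_{res,AB}\) the residual degrees of freedom of the larger model. Unlike cohens_f_squared(), this sketch reports no confidence interval.

```r
# Nested models: model A, and model AB adding two predictors
m_a  <- lm(mpg ~ wt, data = mtcars)
m_ab <- lm(mpg ~ wt + hp + qsec, data = mtcars)

r2_a  <- summary(m_a)$r.squared
r2_ab <- summary(m_ab)$r.squared

# Cohen's f^2 for the change in R2
f2 <- (r2_ab - r2_a) / (1 - r2_ab)

# Consistency check against the F-test comparing the nested models
q <- length(coef(m_ab)) - length(coef(m_a))   # number of added predictors
f_anova <- anova(m_a, m_ab)[["F"]][2]
all.equal(f_anova, f2 * df.residual(m_ab) / q)
```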

If you want to know more about these indices, you can find details and references in the documentation of the functions that compute them.

Interpretation

If you want to know how to interpret these \(R^2\) values, see these interpretation guidelines.