Odds Ratio

This vignette needs some love.

Validate a t-test with Logistic Regression

Let’s start off with a simple example. We will simulate 100 observations of a normally distributed outcome variable (mean = 0 and SD = 1) and a grouping variable made of zeros and ones. Importantly, the mean difference between these two groups (zeros vs. ones) is of 1.

data <- bayestestR::simulate_difference(n = 100,
d = 1,
names = c("Group", "Outcome"))

summary(data)

>  Group     Outcome
>  0:50   Min.   :-2.55
>  1:50   1st Qu.:-0.74
>         Median : 0.00
>         Mean   : 0.00
>         3rd Qu.: 0.74
>         Max.   : 2.55

Now, as we are interested in the difference between these two groups, we can first investigate it using a t-test.

library(ggplot2)

ggplot(data, aes(x=Group, y=Outcome, fill=Group)) +
geom_boxplot() +
see::theme_modern()


ttest <- t.test(Outcome ~ Group, data=data, var.equal=TRUE)
ttest_pars <- parameters::parameters(ttest)
ttest_pars

> Parameter | Group | Mean_Group1 | Mean_Group2 | Difference |     t | df |      p |         95% CI |             Method
> ----------------------------------------------------------------------------------------------------------------------
> Outcome   | Group |       -0.50 |        0.50 |       1.00 | -5.32 | 98 | < .001 | [-1.37, -0.63] |  Two Sample t-test

As we can see, this confirms our simulation specifications, the difference is indeed of 1.

Cohen’s d

Let’s compute now, a traditional Cohen’s d using the effectsize package. While this d should be close to 1, it should theoretically be a tiny bit larger, because it takes into account the (pooled) SD of the whole variable x (across the groups), which because of the difference is a bit larger than 1.

sd(data$Outcome)  > [1] 1.1 We can compute the Cohen’s d as follows: effectsize::cohens_d(data$Outcome, data$Group)  > Cohen's d | 1e+02% CI > -------------------------- > -1.06 | [-1.48, -0.64] As expected, it’s pretty close to 1 times the SD of the sample. Interestingly, one can estimate the Cohen’s d directly from the result of the t-test, using the t statistic. We can convert it to a d using the effectsize package: effectsize::t_to_d(ttest_pars$t, df_error = ttest_pars\$df)

>     d |      1e+02% CI
> ----------------------
> -1.07 | [-1.50, -0.65]

Fortunately, they are quite close.

The Complimentry Logistic Model

Another way of investigating these differences is through the lens of a logistic regression. The main difference is that here, the group variable y becomes the outcome and x the predictor. Let’s fit such model and investigate the parameters:

model <- glm(Group ~ Outcome, data = data,
family = "binomial")

parameters::parameters(model)

> Parameter   | Coefficient |   SE |        95% CI |        z |      p
> --------------------------------------------------------------------
> (Intercept) |    1.23e-16 | 0.23 | [-0.45, 0.45] | 5.41e-16 | > .999
> Outcome     |        1.13 | 0.27 | [ 0.65, 1.70] |     4.25 | < .001

How to interpret this output? The coefficients of a logistic model are expressed in log-odds, which is a metric of probability. Using the modelbased package, one can easily visualize this model:

data_grid <- modelbased::estimate_link(model)

ggplot(data_grid, aes(x = Outcome, y = Predicted)) +
geom_ribbon(aes(ymin = CI_low, ymax = CI_high), alpha = 0.2) +
geom_line(color = "red", size = 1) +
see::theme_modern()


We can see that the probability of y being 1 (vs. 0) increases as x increases. This is another way of saying that there is a difference of x between the two groups of y. We can visualize all of our this together as follows:

ggplot(data, aes(x=Group, y=Outcome, fill=Group)) +
geom_boxplot() +
geom_jitter(width = 0.1) +

You can notice that the red predicted probability line passes through x=0 when y=0.5. This means that when x=0, the probability of the two groups is equal: it is the “middle” of the difference between them.