The basic idea of bootstrapping is that inference about a parent population from sample data can be modelled by resampling the sample data. It is often used as an alternative to statistical inference based on the assumption of a parametric model when that assumption is in doubt, or where parametric inference is complicated.

Comparison between regular and bootstrapped estimates

Data

To compare point-estimates with bootstrapped parameters for frequentist models, we generated one large sample (the parent population, size 1,000,000) of two continuous variables producing a regression coefficient of 0.5. We then iteratively extracted subsamples of size 30 and, for each, computed three types of coefficient (the regular point-estimate, and the bootstrapped median with 1000 and with 4000 iterations), each of which was subtracted from the “parent” coefficient. The closer the resulting value is to 0, the closer the estimate is to the “true” effect.
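
The procedure described above can be sketched for a single subsample as follows. This is a minimal illustration in base R, not the authors' generation code: the data-generating process (`y = 0.5 * x + noise`), the smaller parent population, and the helper names `slope` and `boot_median` are all assumptions made for the example.

```r
set.seed(42)

# Assumed data-generating process: y = 0.5 * x + noise
# (one way to obtain a coefficient of 0.5, not necessarily the authors' recipe)
n_parent <- 100000  # smaller than the 1,000,000 in the text, to keep the sketch fast
x <- rnorm(n_parent)
parent <- data.frame(x = x, y = 0.5 * x + rnorm(n_parent))

# Hypothetical helper: slope of the regression of y on x
slope <- function(d) unname(coef(lm(y ~ x, data = d))[2])

parent_coef <- slope(parent)

# One subsample of size 30 drawn from the parent population
sub <- parent[sample(nrow(parent), 30), ]

# Regular point-estimate on the subsample
regular <- slope(sub)

# Hypothetical helper: resample rows with replacement, refit each time,
# and take the median of the resulting coefficients
boot_median <- function(d, iterations) {
  coefs <- replicate(iterations, slope(d[sample(nrow(d), replace = TRUE), ]))
  median(coefs)
}
boot_1000 <- boot_median(sub, 1000)

# Each estimate is subtracted from the parent coefficient
distances <- c(
  Coefficient       = parent_coef - regular,
  Bootstrapped_1000 = parent_coef - boot_1000
)
print(distances)
```

Repeating the subsampling step many times, and adding a 4000-iteration bootstrap, would give one row of distances per subsample, which is the shape of the dataset analysed below.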

The data is available on GitHub and the code to generate it is available here.

library(ggplot2)
library(dplyr)
library(tidyr)

df <- read.csv("https://raw.github.com/easystats/circus/master/data/bootstrapped.csv")
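
The contrast analysis further down relies on a long-format data frame, `df_long`, with a `Type` and a `Distance` column, whose construction is not shown in this section. Below is a minimal sketch of one way to build it in base R, assuming the three estimate columns are named as in the Bayes factor calls. `to_long` is a hypothetical helper, and the toy values only stand in for the real data.

```r
# Hypothetical helper: stack the three estimate columns into long format
to_long <- function(data) {
  cols <- c("Coefficient", "Bootstrapped_1000", "Bootstrapped_4000")
  long <- stack(data[cols])             # returns columns `values` and `ind`
  names(long) <- c("Distance", "Type")  # Type is a factor, levels in column order
  long
}

# Toy stand-in for the real data (made-up values, for illustration only)
toy <- data.frame(
  Coefficient       = c(0.10, -0.20),
  Bootstrapped_1000 = c(0.05, -0.10),
  Bootstrapped_4000 = c(0.00,  0.30)
)
toy_long <- to_long(toy)
print(toy_long)

# With the real data, this would be: df_long <- to_long(df)
```

Keeping `Type` as a factor with levels in column order means the regular coefficient is listed first in any model output.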

Testing

Bayes factor analysis

library(BayesFactor)
library(bayestestR)

bayestestR::bayesfactor(BayesFactor::ttestBF(df$Coefficient))
> # Bayes Factors for Model Comparison
> 
>   Model               BF
>   [2] Alt., r=0.707 0.02
> 
> * Against Denominator: [1] Null, mu=0
> *   Bayes Factor Type: JZS (BayesFactor)
bayestestR::bayesfactor(BayesFactor::ttestBF(df$Bootstrapped_1000))
> # Bayes Factors for Model Comparison
> 
>   Model               BF
>   [2] Alt., r=0.707 0.02
> 
> * Against Denominator: [1] Null, mu=0
> *   Bayes Factor Type: JZS (BayesFactor)
bayestestR::bayesfactor(BayesFactor::ttestBF(df$Bootstrapped_4000))
> # Bayes Factors for Model Comparison
> 
>   Model               BF
>   [2] Alt., r=0.707 0.02
> 
> * Against Denominator: [1] Null, mu=0
> *   Bayes Factor Type: JZS (BayesFactor)
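
To read these outputs: a Bayes factor below 1 favours the denominator model, here the null (mu = 0). A minimal sketch of the arithmetic, using the rounded value printed above, so the result is only approximate:

```r
# Rounded Bayes factor (alternative vs. null) copied from the output above
bf_alt_vs_null <- 0.02

# Inverting it expresses the evidence in favour of the null
bf_null_vs_alt <- 1 / bf_alt_vs_null
print(bf_null_vs_alt)
```

In other words, for each of the three estimators the data are roughly 50 times more likely under a mean distance of 0 than under the alternative.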

Contrast analysis

library(emmeans)

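# Estimated marginal mean of the distance for each type of estimate
# (df_long is the long-format version of df, with one row per estimate)
lm(Distance ~ Type, data = df_long) %>%
  emmeans::emmeans(~Type)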
>  Type               emmean      SE    df lower.CL upper.CL
>  Coefficient       0.00090 0.00097 29997 -0.00100  0.00279
>  Bootstrapped_1000 0.00089 0.00097 29997 -0.00101  0.00278
>  Bootstrapped_4000 0.00091 0.00097 29997 -0.00098  0.00281
> 
> Confidence level used: 0.95

Conclusion

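The difference between the three estimates is negligible: the confidence intervals overlap almost entirely. The bootstrapped median with 1000 iterations seems (very) slightly more accurate, with a mean distance of 0.00089 against 0.00090 for the regular coefficient and 0.00091 for 4000 iterations.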