Introduction

To make sense of their data and effects, scientists might want to standardize (Z-score) their variables. This makes the data unitless, expressed only in terms of deviation from an index of centrality (e.g., the mean or the median). However, alongside its benefits, standardization also comes with challenges and issues that the scientist should be aware of.

Methods of Standardization

The effectsize package offers two methods of standardization via the standardize() function:

  • Normal standardization: center around the mean, with SD units (default).

  • Robust standardization: center around the median, with MAD units (robust = TRUE).
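As a rough illustration of what these two options compute, the sketch below reproduces them by hand in base R. The vector x is made up for the example, and the code assumes the robust variant scales by R's default mad() (which includes the 1.4826 consistency constant). It is a sketch of the idea, not a copy of the package's internals.

```r
# Toy data (hypothetical values, right-skewed like working hours)
x <- c(1.2, 1.6, 3.6, 4.2, 7.2, 11.3)

# Normal standardization: center on the mean, scale by the SD
z <- (x - mean(x)) / sd(x)

# Robust standardization: center on the median, scale by the MAD
zr <- (x - median(x)) / mad(x)

# By construction, z has mean 0 and SD 1; zr has median 0 and MAD 1
round(c(mean = mean(z), sd = sd(z)), 2)
round(c(median = median(zr), mad = mad(zr)), 2)
```

Because the median and MAD are less sensitive to extreme values than the mean and SD, the two versions can differ noticeably for skewed variables.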

Let’s look at the following example:

library(effectsize)

data("hardlyworking")

head(hardlyworking)
>   salary xtra_hours n_comps age seniority
> 1  19745        4.2       1  32         3
> 2  11302        1.6       0  34         3
> 3  20636        1.2       3  33         5
> 4  23047        7.2       1  35         3
> 5  27342       11.3       0  33         4
> 6  25657        3.6       2  30         5
hardlyworking$xtra_hours_z <- standardize(hardlyworking$xtra_hours)

hardlyworking$xtra_hours_zr <- standardize(hardlyworking$xtra_hours, robust = TRUE)

We can see that the two methods yield different measures of centrality and spread:

name           mean   sd  median   mad
xtra_hours     3.98  3.9    2.77  2.85
xtra_hours_z   0.00  1.0   -0.31  0.73
xtra_hours_zr  0.42  1.4    0.00  1.00

standardize() can also be used to standardize a full data frame, in which case each numeric variable is standardized separately:

hardlyworking_z <- standardize(hardlyworking)
name        mean  sd  median   mad
age            0   1   0.060  1.25
n_comps        0   1  -0.061  1.80
salary         0   1  -0.162  0.86
seniority      0   1  -0.368  1.29
xtra_hours     0   1  -0.308  0.73

Weighted standardization is also supported via the weights argument, and factors can also be standardized (if you’re into that kind of thing) by setting force = TRUE, which converts factors to treatment-coded dummy variables before standardizing.
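To give a sense of what weighted standardization means, here is a minimal base-R sketch that centers on the weighted mean and scales by a weighted SD obtained from stats::cov.wt(). The data, the weights, and the helper name weighted_z are hypothetical, and the weighted-SD definition used here is one common choice that may differ from what standardize() does internally.

```r
# Hypothetical helper: standardize x using observation weights w
weighted_z <- function(x, w) {
  # cov.wt() normalizes the weights and returns the weighted mean ("center")
  # and an unbiased weighted variance ("cov")
  fit <- stats::cov.wt(cbind(x), wt = w)
  (x - as.numeric(fit$center)) / sqrt(fit$cov[1, 1])
}

x <- c(4.2, 1.6, 1.2, 7.2, 11.3, 3.6)
w <- c(1, 2, 2, 1, 1, 3)

z_w <- weighted_z(x, w)

# The weighted mean of the result is zero, up to rounding error
weighted.mean(z_w, w)

# With equal weights, this reduces to the ordinary z-score
all.equal(weighted_z(x, rep(1, 6)), as.vector(scale(x)))
```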

Variable-wise vs. Participant-wise

Standardization is an important step, and it requires extra caution in repeated-measures designs, where there are three ways of standardizing data:

  • Variable-wise: The most common method. A simple scaling of each column.

  • Participant-wise: Variables are standardized “within” each participant, i.e., for each participant, by the participant’s mean and SD.

  • Full: Participant-wise first and then re-standardizing variable-wise.
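The three approaches listed above can be sketched in base R on a small made-up dataset. The data frame d, its columns, and the helper z are hypothetical and serve only to show where the grouping enters; ave() applies a function within each participant.

```r
# Toy repeated-measures data (hypothetical): 3 participants, 4 trials each
d <- data.frame(
  id = rep(c("A", "B", "C"), each = 4),
  y  = c(10, 12, 14, 16, 50, 51, 52, 53, -5, 0, 5, 30)
)

# Small helper: classic z-score
z <- function(x) (x - mean(x)) / sd(x)

# Variable-wise: one mean and SD for the whole column
d$y_var <- z(d$y)

# Participant-wise: a separate mean and SD for each participant
d$y_part <- ave(d$y, d$id, FUN = z)

# Full: participant-wise first, then variable-wise on the result
d$y_full <- z(d$y_part)

# Per-participant means under each method
aggregate(cbind(y_var, y_part, y_full) ~ id, data = d, FUN = mean)
```

Only the variable-wise version keeps the differences between participants' average levels; the other two remove them by construction.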

Unfortunately, the method used is often not explicitly stated. This is an issue as these methods can generate important discrepancies (that can in turn contribute to the reproducibility crisis). Let’s investigate these 3 methods.

The Data

We will use the emotion dataset, in which participants were exposed to negative pictures and had to rate their emotions (valence) and the number of memories associated with each picture (autobiographical link). One could hypothesize that, for young participants with no experience of war or violence, the most negative pictures (mutilations) are less related to memories than less negative pictures (involving, for example, car crashes or sick people). In other words, we expect a positive relationship between valence (with high values corresponding to less negativity) and autobiographical link.

Let’s have a look at the data, averaged by participants:

library(dplyr)
library(tidyr)

# Download the 'emotion' dataset
load(url("https://raw.github.com/neuropsychology/psycho.R/master/data/emotion.rda"))

# Discard neutral pictures (keep only negative)
emotion <- emotion %>%
  filter(Emotion_Condition == "Negative")

# Summary
emotion %>%
  drop_na(Subjective_Valence, Autobiographical_Link) %>%
  group_by(Participant_ID) %>%
  summarise(
    n_Trials = n(),
    Valence_Mean = mean(Subjective_Valence),
    Valence_SD = sd(Subjective_Valence)
  )
> # A tibble: 19 x 4
>    Participant_ID n_Trials Valence_Mean Valence_SD
>    <fct>             <int>        <dbl>      <dbl>
>  1 10S                  24       -58.1       42.6 
>  2 11S                  24       -73.2       37.0 
>  3 12S                  24       -57.5       26.6 
>  4 13S                  24       -63.2       23.7 
>  5 14S                  24       -56.6       26.5 
>  6 15S                  24       -60.6       33.7 
>  7 16S                  24       -46.1       24.9 
>  8 17S                  24        -1.54       4.98
>  9 18S                  24       -67.2       35.0 
> 10 19S                  24       -59.6       33.2 
> 11 1S                   24       -53.0       42.9 
> 12 2S                   23       -43.0       39.2 
> 13 3S                   24       -64.3       34.4 
> 14 4S                   24       -81.6       27.6 
> 15 5S                   24       -58.1       25.3 
> 16 6S                   24       -74.7       29.2 
> 17 7S                   24       -62.3       39.7 
> 18 8S                   24       -56.9       32.7 
> 19 9S                   24       -31.5       52.7

As we can see from the means and SDs, there is a lot of variability between participants, in both their means and their within-participant SDs.

Effect of Standardization

We will create three data frames standardized with each of the three techniques.

Z_VariableWise <- emotion %>%
  standardize()

Z_ParticipantWise <- emotion %>%
  group_by(Participant_ID) %>%
  standardize()

Z_Full <- emotion %>%
  group_by(Participant_ID) %>%
  standardize() %>%
  ungroup() %>%
  standardize()

Let’s see how these three standardization techniques affected the Valence variable.

Across Participants

We can calculate the mean and SD of Valence across all participants:

DF                 Mean    SD
Z_VariableWise        0  1.00
Z_ParticipantWise     0  0.98
Z_Full                0  1.00

The means and SDs appear fairly similar (0 and 1)…

and so do the marginal distributions…

At the Participant Level

However, we can also look at what happens at the participant level. Let’s look at the first 5 participants:

DF                 Participant_ID   Mean    SD
Z_VariableWise     10S             -0.05  1.15
Z_VariableWise     11S             -0.46  1.00
Z_VariableWise     12S             -0.03  0.72
Z_VariableWise     13S             -0.19  0.64
Z_VariableWise     14S             -0.01  0.71
Z_ParticipantWise  10S              0.00  1.00
Z_ParticipantWise  11S              0.00  1.00
Z_ParticipantWise  12S              0.00  1.00
Z_ParticipantWise  13S              0.00  1.00
Z_ParticipantWise  14S              0.00  1.00
Z_Full             10S              0.00  1.02
Z_Full             11S              0.00  1.02
Z_Full             12S              0.00  1.02
Z_Full             13S              0.00  1.02
Z_Full             14S              0.00  1.02

It seems that full and participant-wise standardization give similar results, which differ from those of variable-wise standardization.

Compare

Let’s look at the correlation between the values produced by the variable-wise and participant-wise methods.

While the three standardization methods roughly present the same characteristics at a general level (mean 0 and SD 1) and a similar distribution, their values are not exactly the same!
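The kind of comparison described here can be sketched on simulated data. Everything below (the number of participants, the means and SDs, the variable names) is hypothetical and only meant to mimic ratings where each participant has their own mean and SD; it does not reproduce the emotion dataset.

```r
set.seed(123)

# Simulated ratings (hypothetical): 10 participants x 20 trials, each
# participant with their own mean and SD
id <- rep(seq_len(10), each = 20)
mu <- rnorm(10, mean = -55, sd = 15)
sigma <- runif(10, min = 5, max = 50)
y <- rnorm(200, mean = mu[id], sd = sigma[id])

z <- function(x) (x - mean(x)) / sd(x)

y_var  <- z(y)                 # variable-wise
y_part <- ave(y, id, FUN = z)  # participant-wise

# Positive but imperfect agreement is expected when participants differ
cor(y_var, y_part)
```

Within a single participant the two versions are perfectly correlated, since both are linear transformations of the same raw values; the disagreement arises only when participants are pooled.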

Let’s now answer the original question by investigating the linear relationship between valence and autobiographical link. We can do this by fitting a mixed model with participants entered as random effects.

library(lme4)

m_raw <- lmer(Subjective_Valence ~ Autobiographical_Link + (1 | Participant_ID),
  data = emotion
)

m_VariableWise <- update(m_raw, data = Z_VariableWise)
m_ParticipantWise <- update(m_raw, data = Z_ParticipantWise)
m_Full <- update(m_raw, data = Z_Full)

We can extract the parameters of interest from each model, and find:

> # Fixed Effects
> 
> Model             | Coefficient |   SE |        95% CI | t(451) |     p
> -----------------------------------------------------------------------
> m_raw             |        0.09 | 0.07 | [-0.04, 0.22] |   1.36 | 0.173
> m_VariableWise    |        0.07 | 0.05 | [-0.03, 0.16] |   1.36 | 0.173
> m_ParticipantWise |        0.08 | 0.05 | [-0.01, 0.17] |   1.75 | 0.080
> m_Full            |        0.08 | 0.05 | [-0.01, 0.17] |   1.75 | 0.080
> 
> # Random Effects: Participant_ID
> 
> Model             | Coefficient |   CI
> --------------------------------------
> m_raw             |       16.49 | 0.95
> m_VariableWise    |        0.45 | 0.95
> m_ParticipantWise |        0.00 | 0.95
> m_Full            |        0.00 | 0.95
> 
> # Random Effects: Residual
> 
> Model             | Coefficient |   CI
> --------------------------------------
> m_raw             |        5.79 | 0.95
> m_VariableWise    |        0.95 | 0.95
> m_ParticipantWise |        0.99 | 0.95
> m_Full            |        1.00 | 0.95

As we can see, variable-wise standardization only rescales the coefficient and its standard error (which is expected, as it changes the unit), but leaves the test statistic and p-value unchanged. Participant-wise standardization, however, changes both the coefficient and the test statistic, and therefore the p-value. No method is inherently better or more justified; the choice depends on the specific case, context, data and goal.
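The invariance of the test statistic under variable-wise standardization can be checked on simulated data. To keep the sketch dependency-free, it swaps in an ordinary linear regression (lm()) for the mixed model used above, so it illustrates the principle rather than reproducing the analysis; the data and the helper names z and slope_t are hypothetical.

```r
set.seed(42)

# Simulated data (hypothetical): 8 participants x 15 trials
id <- rep(seq_len(8), each = 15)
x <- rnorm(120, mean = 20 + 3 * id, sd = 5)
y <- rnorm(120, mean = -60 + 5 * id + 0.3 * x, sd = id * 4)

z <- function(v) (v - mean(v)) / sd(v)

# t statistic of the slope in a simple linear regression
slope_t <- function(y, x) summary(lm(y ~ x))$coefficients["x", "t value"]

t_raw  <- slope_t(y, x)
t_var  <- slope_t(z(y), z(x))                                # variable-wise
t_part <- slope_t(ave(y, id, FUN = z), ave(x, id, FUN = z))  # participant-wise

# A linear rescaling of the whole column leaves the t statistic unchanged
all.equal(t_raw, t_var)

c(raw = t_raw, variable_wise = t_var, participant_wise = t_part)
```

Participant-wise standardization is not a single linear transformation of the column (each participant gets a different shift and scale), which is why it can change the test statistic.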

Conclusion

  1. Standardization can be useful in some cases, and its use should be justified.

  2. Variable-wise and participant-wise standardization produce data that look similar at first sight.

  3. Variable-wise and participant-wise standardization can nevertheless lead to different results.

  4. The choice of method can strongly influence the results and should therefore be explicitly stated.

We showed here yet another way of sneakily tweaking the data that can change the results. To prevent its use for bad practices (e.g., p-hacking), we can only support the generalization of open data, open analysis and preregistration.

See also the parameters::demean() function and standardize_parameters(method = "pseudo") for more mixed-model tools.

References