Although the easystats packages, such as effectsize and modelbased, make it easier to assess, investigate and interpret complex models with non-linear relationships, transforming variables can still be useful in some instances.

In this article, we demonstrate and discuss some commonly used transformations that you can consider applying to your variables. They can usually be grouped into two categories: transformations that change only the scale, and transformations that also change the shape.

library(datawizard)
library(ggplot2)

## Scale Transformations

Standardization (also known as Z-scoring) refers to the centering and scaling of a variable so that it is expressed in terms of deviation from some centrality index. Usually, the mean and SD are used (so that 0 corresponds to the mean and the values represent deviations from it in SD units), but other indices, such as the median and the MAD, can be used as well. In R, it can be achieved via the standardize() function in the datawizard package.

datawizard::standardize(c(4, 5, 2, 4, 42))
> [1] -0.43 -0.37 -0.55 -0.43  1.79
> attr(,"center")
> [1] 11
> attr(,"scale")
> [1] 17
> attr(,"robust")
> [1] FALSE
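To make the definition concrete, here is a minimal base R sketch that reproduces the z-scores above by hand and then swaps in the median and the MAD. It assumes the default settings of `stats::mad()` (scaling constant 1.4826); whether this matches the robust option of standardize() exactly is an assumption, not something shown in this article.

```r
x <- c(4, 5, 2, 4, 42)

# Classic z-score: centre on the mean, scale by the SD
z <- (x - mean(x)) / sd(x)
round(z, 2)
#> [1] -0.43 -0.37 -0.55 -0.43  1.79

# Robust variant: centre on the median, scale by the MAD
# (stats::mad() with its default constant of 1.4826)
z_robust <- (x - median(x)) / mad(x)
round(z_robust, 2)
#> [1]  0.00  0.67 -1.35  0.00 25.63
```

Note how the outlier (42) inflates the SD and therefore pulls the classic z-scores of the other values together, whereas the robust version leaves it standing far away from the rest.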

Normalization rescales the variable so that the minimum and the maximum are 0 and 1. In R, it can be achieved via the normalize() function. However, note that the name is a bit of a misnomer, as it doesn’t actually normalize the distribution of the data (i.e., make it “more Gaussian”), as shown below. Instead, it rescales the data so that they fall within the range [0, 1].

datawizard::normalize(c(4, 5, 2, 4, 42))
> [1] 0.050 0.075 0.000 0.050 1.000
> attr(,"include_bounds")
> [1] TRUE
> attr(,"min_value")
> [1] 2
> attr(,"range_difference")
> [1] 40

Normalization is, in fact, a special case of rescaling: any variable can be rescaled to a new range. In R, it can be achieved via the data_rescale() function (renamed rescale() in more recent versions of datawizard).

datawizard::data_rescale(c(4, 5, 2, 4, 42), to = c(-1, 1))
> [1] -0.90 -0.85 -1.00 -0.90  1.00
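The link between normalization and rescaling can be sketched with a few lines of base R. The helper `rescale_linear()` below is hypothetical (it is not part of datawizard) and simply applies the usual min-max formula; normalization then corresponds to the target range [0, 1].

```r
# Hypothetical helper (not a datawizard function): linear map from the
# observed range of x onto the target range `to`
rescale_linear <- function(x, to = c(0, 1)) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * (to[2] - to[1]) + to[1]
}

x <- c(4, 5, 2, 4, 42)

# Normalization: target range [0, 1]
rescale_linear(x)
#> [1] 0.050 0.075 0.000 0.050 1.000

# Any other target range works the same way
rescale_linear(x, to = c(-1, 1))
#> [1] -0.90 -0.85 -1.00 -0.90  1.00
```

This sketch does not handle constant vectors (where the range is zero), which a production function would need to guard against.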

Importantly, these transformations change only the scale (i.e., the “unit”) of the variable; they do not change the shape of its distribution. Consequently, they do not change the nature of its relationship with other variables either.

Let’s start by generating some random data and creating a convenience function to check the distribution of one of the variables and its relationship with another variable.

library(patchwork)

# Create dataset (the seed is arbitrary; it only makes the noise in y reproducible)
set.seed(123)
data <- data.frame(x = bayestestR::distribution_normal(500, mean = 30, sd = 20))
data$y <- data$x^2 + rnorm(500, sd = 100)

# Write convenience function: density of the chosen variable (left panel)
# and its relationship with y (right panel), combined via patchwork
check_transformation <- function(data, x = "x", color = "blue") {
  ggplot(bayestestR::estimate_density(data[[x]]), aes(x = x, y = y)) +
    geom_area(fill = color) +
    ggtitle("Distribution of x") +
    see::theme_modern() +
    ggplot(data, aes_string(x = x, y = "y")) +
    geom_point() +
    geom_smooth(color = color, se = FALSE, method = "loess", formula = "y ~ x") +
    ggtitle("Relationship with y") +
    see::theme_modern()
}

Before Transformation:

check_transformation(data, color = "blue")

After Transformation:

data$x_standardized <- datawizard::standardize(data$x)

check_transformation(data, x = "x_standardized", color = "red") +
  patchwork::plot_annotation("After Standardization")

data$x_normalized <- datawizard::normalize(data$x)

check_transformation(data, x = "x_normalized", color = "orange") +
  patchwork::plot_annotation("After Normalization")

As you can see, although the unit of x has changed, scale transformations have no impact on the shape of the variable’s distribution or on its relationship with other variables.
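The same point can be checked numerically rather than visually. The following is a minimal base R sketch on freshly simulated data (the seed, the use of `rnorm()` and the `skewness()` helper are assumptions made for this illustration only): skewness serves as a simple summary of shape, and correlations as a summary of the association with y.

```r
set.seed(123) # arbitrary seed, only for the reproducibility of this sketch
x <- rnorm(500, mean = 30, sd = 20)
y <- x^2 + rnorm(500, sd = 100)

# Scale transformations written out by hand
x_standardized <- (x - mean(x)) / sd(x)
x_normalized <- (x - min(x)) / (max(x) - min(x))

# Hypothetical helper: sample skewness as a simple summary of shape
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

# Shape is unchanged: both comparisons are expected to return TRUE
all.equal(skewness(x), skewness(x_standardized))
all.equal(skewness(x), skewness(x_normalized))

# The association with y is unchanged as well (linear and rank-based)
all.equal(cor(x, y), cor(x_standardized, y))
all.equal(
  cor(x, y, method = "spearman"),
  cor(x_normalized, y, method = "spearman")
)
```

These quantities are invariant because standardization and normalization are both linear maps with a positive slope: they shift and stretch the axis without reordering or bending it.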

• Pros:
  • Interpretation (standardized effect sizes)
  • Computation (some models can converge more efficiently)
• Cons:
  • Interpretation (when the original values of the variable are meaningful)
  • Reproducibility (values have a different meaning depending on the sample’s characteristics; this applies only to transformations that depend on properties of the variable, such as standardization)
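The reproducibility caveat in the list above can be illustrated with a small base R sketch. The two samples are made up for this example: the same raw value ends up with opposite z-scores depending on the sample it is standardized in. Reusing the center and scale of a reference sample, as done at the end, is one possible remedy and not a recommendation taken from this article.

```r
# Two hypothetical samples measured in the same unit (e.g., age in years)
sample_a <- c(20, 25, 30, 35, 40)
sample_b <- c(40, 50, 60, 70, 80)

# Standardize each sample using its own mean and SD
z_a <- (sample_a - mean(sample_a)) / sd(sample_a)
z_b <- (sample_b - mean(sample_b)) / sd(sample_b)

# The raw value 40 is the maximum of sample A but the minimum of sample B
round(z_a[sample_a == 40], 2)
#> [1] 1.26
round(z_b[sample_b == 40], 2)
#> [1] -1.26

# Possible remedy: standardize sample B with the centre and scale of sample A,
# so that identical raw values map to identical standardized values
z_b_ref <- (sample_b - mean(sample_a)) / sd(sample_a)
round(z_b_ref[sample_b == 40], 2)
#> [1] 1.26
```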

## Shape Transformation

http://fmwww.bc.edu/repec/bocode/t/transint.html

“I see that this is a clever trick that works nicely. But how do I know when this trick will work with some other data, or if another trick is needed, or if no transformation is needed?”

“Transformations are needed because there is no guarantee that the world works on the scales it happens to be measured on.”

“Transformations are most appropriate when they match a scientific view of how a variable behaves.”

Most of the traditional transformations work best (i.e., are more easily explainable) with strictly positive data. When variables also have negative values, it is fairly common practice to shift the scale towards the positive range by subtracting the minimum (i.e., adding its absolute value). We can do that as follows: