R/standardize.R, R/standardize.data.R, R/standardize.models.R, and 1 more
standardize.Rd
Performs a standardization of data (z-scoring), i.e., centering and scaling,
so that the data is expressed in terms of standard deviation (i.e., mean = 0,
SD = 1) or Median Absolute Deviation (median = 0, MAD = 1). When applied to a
statistical model, this function extracts the dataset, standardizes it, and
refits the model with this standardized version of the dataset. The
normalize() function can also be used to scale all numeric variables within
the 0 - 1 range.
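As a rough illustration of what the two scalings compute, here is a base-R sketch for a single numeric vector. It is not the package's implementation; in particular, computing the MAD with stats::mad() and its default scaling constant is an assumption made for the example.

```r
# Base-R sketch of the two scalings (not the package's own code)
x <- swiss$Fertility

# Classic z-score: center on the mean, scale by the SD
z <- (x - mean(x)) / sd(x)

# Robust variant: center on the median, scale by the MAD
# (assumption: MAD as returned by stats::mad() with default settings)
z_robust <- (x - median(x)) / mad(x)

# Check the defining properties of each version
c(mean = mean(z), sd = sd(z))
c(median = median(z_robust), mad = mad(z_robust))
```

The robust version is less sensitive to outliers because neither the median nor the MAD is pulled around by a few extreme values.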
standardize(
x,
robust = FALSE,
two_sd = FALSE,
weights = NULL,
verbose = TRUE,
...
)
# S3 method for numeric
standardize(
x,
robust = FALSE,
two_sd = FALSE,
weights = NULL,
verbose = TRUE,
reference = NULL,
...
)
# S3 method for data.frame
standardize(
x,
robust = FALSE,
two_sd = FALSE,
weights = NULL,
verbose = TRUE,
reference = NULL,
select = NULL,
exclude = NULL,
remove_na = c("none", "selected", "all"),
force = FALSE,
append = FALSE,
suffix = "_z",
...
)
# S3 method for default
standardize(
x,
robust = FALSE,
two_sd = FALSE,
weights = TRUE,
verbose = TRUE,
include_response = TRUE,
...
)
unstandardize(
x,
center = NULL,
scale = NULL,
reference = NULL,
robust = FALSE,
two_sd = FALSE,
...
)
x  A data frame, a vector or a statistical model (for 

robust  Logical, if 
two_sd  If 
weights  Can be

verbose  Toggle warnings and messages on or off. 
...  Arguments passed to or from other methods. 
reference  A dataframe or variable from which the centrality and deviation will be computed instead of from the input variable. Useful for standardizing a subset or new data according to another dataframe. 
select  Character vector of column names. If 
exclude  Character vector of column names to be excluded from selection. 
remove_na  How should missing values ( 
force  Logical, if 
append  Logical, if 
suffix  Character value, will be appended to variable (column) names of

include_response  For a model, if 
center, scale  Used by 
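The idea behind reference, and behind undoing a standardization from a known center and scale, can be sketched in base R. This is only an illustration of the arithmetic, not the package's code; the split of swiss$Education into two parts and the names center_ref and scale_ref are hypothetical choices for the example.

```r
# Base-R sketch of standardizing against reference data
# (illustrative only; variable names and the split are hypothetical)
ref_data <- swiss$Education[1:30]   # data that defines center and scale
new_data <- swiss$Education[31:47]  # data to express on the same scale

center_ref <- mean(ref_data)
scale_ref  <- sd(ref_data)

# Standardize the new values using the reference statistics
new_z <- (new_data - center_ref) / scale_ref

# Multiplying by the scale and adding the center recovers the input
new_back <- new_z * scale_ref + center_ref
all.equal(new_back, new_data)
```

Because the statistics come from ref_data, new_z will generally not have mean 0 and SD 1 itself; it is expressed in units of the reference data's spread.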
The standardized object (either a standardized data frame or a statistical model fitted on standardized data).
When x is a vector or a data frame with remove_na = "none", missing values
are preserved, so the return value has the same length / number of rows as
the original input.
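A minimal base-R sketch of this behavior for a vector follows. It assumes the center and scale are computed from the non-missing values only, which is an assumption of the example rather than a statement about the package internals.

```r
# Base-R sketch: missing values stay in place after standardization
x <- c(2, 4, NA, 8, 10)

# Assumption: statistics are computed from the non-missing values
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

length(z) == length(x)  # same length as the input
is.na(z)                # the NA stays at its original position
```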
If x is a model object, standardization is done by completely refitting the
model on the standardized data. Hence, this approach is equal to
standardizing the variables before fitting the model and will return a new
model object. This method is particularly recommended for complex models
that include interactions or transformations (e.g., polynomial or spline
terms). The robust argument (defaults to FALSE) enables a robust
standardization of data, i.e., based on the median and MAD instead of the
mean and SD. See standardize_parameters() for other methods of
standardizing model coefficients.
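The equivalence with standardizing the variables before fitting can be sketched in base R. This is an illustration only: it uses scale() on the whole swiss data frame as a stand-in for the standardization step and does not call the package.

```r
# Base-R sketch: standardize the data first, then fit the same formula
swiss_z <- as.data.frame(scale(swiss))  # every column: mean 0, SD 1

model_raw <- lm(Infant.Mortality ~ Education * Fertility, data = swiss)
model_z   <- lm(Infant.Mortality ~ Education * Fertility, data = swiss_z)

# Coefficients are now expressed in SD units
coef(model_z)

# The fit itself is unchanged; only the units of the coefficients differ
all.equal(summary(model_raw)$r.squared, summary(model_z)$r.squared)
```

Refitting matters for the interaction term: the product of two standardized predictors is not the same as a standardized product, so the interaction coefficient cannot simply be rescaled after the fact.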
When the model's formula contains transformations (e.g., y ~ exp(X)), the
transformation effectively takes place after standardization (e.g.,
exp(scale(X))). Some transformations are undefined for negative values,
such as log() and sqrt(). To avoid dropping these values, the
standardized data is shifted by Z - min(Z) + 1 or Z - min(Z)
(respectively).
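The two shifts can be written out in base R as follows. This sketch only reproduces the arithmetic described above on one swiss column; it is not the package's code.

```r
# Base-R sketch of the shifts applied before log() and sqrt()
x <- swiss$Agriculture
z <- (x - mean(x)) / sd(x)   # standardized values, some of them negative

z_log  <- z - min(z) + 1     # shift for log(): the minimum becomes 1
z_sqrt <- z - min(z)         # shift for sqrt(): the minimum becomes 0

# Both transformations are now defined for every observation
range(log(z_log))
range(sqrt(z_sqrt))
```

The extra + 1 in the log() case keeps the smallest value at 1 rather than 0, so that log() returns 0 there instead of -Inf.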
When standardizing coefficients of a generalized model (GLM, GLMM, etc.), only the predictors are standardized, maintaining the interpretability of the coefficients (e.g., in a binomial model, the exponentiated standardized parameter is the odds ratio (OR) for a change of 1 SD in the predictor).
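A base-R sketch of the binomial case, with only the predictor standardized and the binary response left as is. The mtcars data and the am ~ wt model are illustrative choices that do not come from this page, and the code does not call the package.

```r
# Base-R sketch: standardize the predictor only, keep the binary response
dat <- mtcars
dat$wt_z <- (dat$wt - mean(dat$wt)) / sd(dat$wt)

model_raw <- glm(am ~ wt,   data = dat, family = binomial())
model_z   <- glm(am ~ wt_z, data = dat, family = binomial())

# Odds ratio for a 1 SD increase in wt
exp(coef(model_z)["wt_z"])

# With a single untransformed predictor, the standardized slope is the
# raw slope multiplied by the predictor's SD
all.equal(
  unname(coef(model_z)["wt_z"]),
  unname(coef(model_raw)["wt"]) * sd(dat$wt),
  tolerance = 1e-4
)
```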
Other transform utilities: change_scale(), normalize(), ranktransform()
Other standardize: standardize_info(), standardize_parameters()
# Data frames
summary(standardize(swiss))
#>    Fertility         Agriculture       Examination         Education
#>  Min.   :-2.81327   Min.   :-2.1778   Min.   :-1.69084   Min.   :-1.0378
#>  1st Qu.:-0.43569   1st Qu.:-0.6499   1st Qu.:-0.56273   1st Qu.:-0.5178
#>  Median : 0.02061   Median : 0.1515   Median :-0.06134   Median :-0.3098
#>  Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000
#>  3rd Qu.: 0.66504   3rd Qu.: 0.7481   3rd Qu.: 0.69074   3rd Qu.: 0.1062
#>  Max.   : 1.78978   Max.   : 1.7190   Max.   : 2.57094   Max.   : 4.3702
#>     Catholic       Infant.Mortality
#>  Min.   :-0.9350   Min.   :-3.13886
#>  1st Qu.:-0.8620   1st Qu.:-0.61543
#>  Median :-0.6235   Median : 0.01972
#>  Mean   : 0.0000   Mean   : 0.00000
#>  3rd Qu.: 1.2464   3rd Qu.: 0.60337
#>  Max.   : 1.4113   Max.   : 2.28566
# Models
model <- lm(Infant.Mortality ~ Education * Fertility, data = swiss)
coef(standardize(model))
#> (Intercept) Education Fertility Education:Fertility
#> 0.06386069 0.47482848 0.63270919 0.09829777