R/standardize.R, R/standardize.data.R, R/standardize.models.R
Performs a standardization of data (z-scoring), i.e., centering and scaling,
so that the data is expressed in terms of standard deviation (i.e., mean = 0,
SD = 1) or Median Absolute Deviation (median = 0, MAD = 1). When applied to a
statistical model, this function extracts the dataset, standardizes it, and
refits the model with this standardized version of the dataset. The
normalize() function can also be used to scale all numeric variables within
the 0 - 1 range.
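As a rough illustration of the three scalings described above, the following base R sketch computes each one by hand. It does not call standardize() or normalize(); the helper names z_score, robust_z and unit_range are hypothetical, and using mad() with its default consistency constant is an assumption about how "MAD = 1" is meant here.

```r
# Base R sketch; the helper names below are hypothetical, not package functions.
x <- iris$Sepal.Length

# Classic z-score: centre on the mean, scale by the standard deviation.
z_score <- function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)

# Robust variant: centre on the median, scale by the MAD.
# Assumption: stats::mad() with its default constant (1.4826).
robust_z <- function(v) (v - median(v, na.rm = TRUE)) / mad(v, na.rm = TRUE)

# Rescale to the 0 - 1 range.
unit_range <- function(v) (v - min(v, na.rm = TRUE)) / diff(range(v, na.rm = TRUE))

z <- z_score(x)
r <- robust_z(x)
u <- unit_range(x)

c(mean = mean(z), sd = sd(z))        # approximately 0 and 1
c(median = median(r), mad = mad(r))  # approximately 0 and 1
range(u)                             # lower and upper bound of the rescaled data
```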
standardize(
  x,
  robust = FALSE,
  two_sd = FALSE,
  weights = NULL,
  verbose = TRUE,
  ...
)

# S3 method for numeric
standardize(
  x,
  robust = FALSE,
  two_sd = FALSE,
  weights = NULL,
  verbose = TRUE,
  ...
)

# S3 method for data.frame
standardize(
  x,
  robust = FALSE,
  two_sd = FALSE,
  weights = NULL,
  verbose = TRUE,
  select = NULL,
  exclude = NULL,
  remove_na = c("none", "selected", "all"),
  force = FALSE,
  append = FALSE,
  suffix = "_z",
  ...
)

# S3 method for default
standardize(
  x,
  robust = FALSE,
  two_sd = FALSE,
  weights = TRUE,
  verbose = TRUE,
  include_response = TRUE,
  ...
)
| Argument | Description |
|---|---|
| x | A data frame, a vector or a statistical model. |
| robust | Logical, if |
| two_sd | If |
| weights | Can be |
| verbose | Toggle warnings on or off. |
| ... | Arguments passed to or from other methods. |
| select | Character vector of column names. If |
| exclude | Character vector of column names to be excluded from selection. |
| remove_na | How should missing values ( |
| force | Logical, if |
| append | Logical, if |
| suffix | Character value, will be appended to variable (column) names of |
| include_response | For a model, if |
The standardized object (either a standardized data frame or a statistical model fitted on standardized data).
When x is a vector, or a data frame with remove_na = "none",
missing values are preserved, so the return value has the same length /
number of rows as the original input.
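The length-preserving behaviour can be pictured with a small base R sketch. This is only an illustration of the idea, computed by hand on a made-up vector; it does not call standardize() itself.

```r
# Hypothetical input with one missing value.
x <- c(2, 4, NA, 8)

# Centre and scale using only the observed values; the NA stays in place.
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

length(z) == length(x)  # the result has as many elements as the input
which(is.na(z))         # position of the preserved missing value
```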
If x is a model object, standardization is done by completely refitting the
model on the standardized data. Hence, this approach is equivalent to
standardizing the variables before fitting the model, and it returns a new
model object. This method is particularly recommended for complex
models that include interactions or transformations (e.g., polynomial or
spline terms). The robust argument (defaults to FALSE) enables a robust
standardization of data, i.e., based on the median and MAD instead of the
mean and SD. See standardize_parameters() for other methods of
standardizing model coefficients.
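The refitting idea can be sketched in base R as follows. This is a hand-rolled approximation under stated assumptions (all numeric columns, including the response, are z-scored; factors are left untouched), not the package's internal code. The names iris_z and model_z are illustrative.

```r
# Fit a model with an interaction on the raw data.
model <- lm(Sepal.Length ~ Species * Petal.Width, data = iris)

# Standardize every numeric column (response included); factors are left alone.
iris_z <- iris
is_num <- vapply(iris_z, is.numeric, logical(1))
iris_z[is_num] <- lapply(iris_z[is_num], function(v) (v - mean(v)) / sd(v))

# Refit the same formula on the standardized copy of the data.
model_z <- lm(formula(model), data = iris_z)

coef(model_z)
```

Because the data are rescaled before fitting, the interaction terms are estimated on the standardized scale as well, which is what makes this approach attractive for models with interactions.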
When the model's formula contains transformations (e.g., y ~ exp(X)), the
transformation effectively takes place after standardization (e.g.,
exp(scale(X))). Some transformations are undefined for negative values,
such as log() and sqrt(). To avoid dropping these values, the
standardized data is shifted by Z - min(Z) + 1 or Z - min(Z)
(respectively).
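The two shifts can be checked by hand. The base R sketch below applies them to a z-scored column and confirms that log() and sqrt() are then defined for every value; the variable names are illustrative and the code does not reproduce how the package applies the shift internally.

```r
# z-score a strictly positive variable; the result contains negative values.
x <- iris$Petal.Width
z <- (x - mean(x)) / sd(x)

# Shift used before log(): the smallest value becomes 1, so log() is >= 0.
z_log <- z - min(z) + 1

# Shift used before sqrt(): the smallest value becomes 0.
z_sqrt <- z - min(z)

# Without the shift, negative z-scores would be lost as NaN.
n_lost <- suppressWarnings(sum(is.nan(log(z))))

range(log(z_log))
range(sqrt(z_sqrt))
n_lost
```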
#>   Sepal.Length       Sepal.Width       Petal.Length      Petal.Width
#>  Min.   :-1.86378   Min.   :-2.4258   Min.   :-1.5623   Min.   :-1.4422
#>  1st Qu.:-0.89767   1st Qu.:-0.5904   1st Qu.:-1.2225   1st Qu.:-1.1799
#>  Median :-0.05233   Median :-0.1315   Median : 0.3354   Median : 0.1321
#>  Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000
#>  3rd Qu.: 0.67225   3rd Qu.: 0.5567   3rd Qu.: 0.7602   3rd Qu.: 0.7880
#>  Max.   : 2.48370   Max.   : 3.0805   Max.   : 1.7799   Max.   : 1.7064
#>        Species
#>  setosa    :50
#>  versicolor:50
#>  virginica :50

#>                   (Intercept)             Speciesversicolor
#>                    0.05969491                   -0.16597410
#>              Speciesvirginica                   Petal.Width
#>                    0.18985849                    0.85622714
#> Speciesversicolor:Petal.Width  Speciesvirginica:Petal.Width
#>                    0.45674637                   -0.25713539