Create a reference matrix, useful for visualisation, with evenly spread and combined values. Usually used to generate predictions using `get_predicted()`. See this vignette for a tutorial on how to create a visualisation matrix using this function.
Alternatively, this function can also be used to extract the "grid" columns from objects generated by emmeans and marginaleffects (see those methods for more info).
Usage
get_datagrid(x, ...)
# S3 method for class 'data.frame'
get_datagrid(
  x,
  by = "all",
  factors = "reference",
  numerics = "mean",
  preserve_range = FALSE,
  reference = x,
  length = 10,
  range = "range",
  ...
)
# S3 method for class 'numeric'
get_datagrid(x, length = 10, range = "range", ...)
# S3 method for class 'factor'
get_datagrid(x, ...)
# Default S3 method
get_datagrid(
  x,
  by = "all",
  factors = "reference",
  numerics = "mean",
  preserve_range = TRUE,
  reference = x,
  include_smooth = TRUE,
  include_random = FALSE,
  include_response = FALSE,
  data = NULL,
  verbose = TRUE,
  ...
)
Arguments
- x
An object from which to construct the reference grid.
- ...
Arguments passed to or from other methods (for instance, `length` or `range` to control the spread of numeric variables).
- by
Indicates the focal predictors (variables) for the reference grid and at which values focal predictors should be represented. If not specified otherwise, representative values for numeric variables or predictors are evenly distributed from the minimum to the maximum, with a total number of `length` values covering that range (see 'Examples'). Possible options for `by` are:
  - `"all"`, which will include all variables or predictors.
  - a character vector of one or more variable or predictor names, like `c("Species", "Sepal.Width")`, which will create a grid of all combinations of unique values. For factors, all levels are used; for numeric variables, a range of length `length` (evenly spread from minimum to maximum); and for character vectors, all unique values.
  - a list of named elements, indicating focal predictors and their representative values, e.g. `by = list(Sepal.Length = c(2, 4), Species = "setosa")`.
  - a string with assignments, e.g. `by = "Sepal.Length = 2"` or `by = c("Sepal.Length = 2", "Species = 'setosa'")`. Note the usage of single and double quotes to assign strings within strings.
There is a special handling of assignments with brackets, i.e. values defined inside `[` and `]`. For numeric variables, the value(s) inside the brackets should be one of the following:
  - two values, indicating minimum and maximum (e.g. `by = "Sepal.Length = [0, 5]"`), for which a range of length `length` (evenly spread from given minimum to maximum) is created.
  - more than two numeric values, e.g. `by = "Sepal.Length = [2,3,4,5]"`, in which case these values are used as representative values.
  - a "token" that creates pre-defined representative values:
    - for mean and -/+ 1 SD around the mean: `"x = [sd]"`
    - for median and -/+ 1 MAD around the median: `"x = [mad]"`
    - for Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum): `"x = [fivenum]"`
    - for terciles, including minimum and maximum: `"x = [terciles]"`
    - for terciles, excluding minimum and maximum: `"x = [terciles2]"`
    - for quartiles, including minimum and maximum: `"x = [quartiles]"`
    - for quartiles, excluding minimum and maximum: `"x = [quartiles2]"`
    - for minimum and maximum value: `"x = [minmax]"`
    - for 0 and the maximum value: `"x = [zeromax]"`
For factor variables, the value(s) inside the brackets should indicate one or more factor levels, like `by = "Species = [setosa, versicolor]"`.
Note: the `length` argument will be ignored when using bracket tokens.
The remaining variables not specified in `by` will be fixed (see also arguments `factors` and `numerics`).
- factors
Type of summary for factors. Can be `"reference"` (set at the reference level), `"mode"` (set at the most common level) or `"all"` to keep all levels.
- numerics
Type of summary for numeric values. Can be `"all"` (will duplicate the grid for all unique values), any function (`"mean"`, `"median"`, ...) or a value (e.g., `numerics = 0`).
- preserve_range
In the case of combinations between numeric variables and factors, setting `preserve_range = TRUE` will drop the rows where the value of the numeric variable lies outside the range originally observed for its factor level. This leads to an unbalanced grid. Also, if you want the minimum and the maximum to closely match the actual ranges, you should increase the `length` argument.
- reference
The reference vector from which to compute the mean and SD. Used when standardizing or unstandardizing the grid using `effectsize::standardize`.
- length
Length of numeric target variables selected in `by`. This argument controls the number of (equally spread) values that will be taken to represent the continuous variables. A longer length will increase precision, but can also substantially increase the size of the datagrid (especially in case of interactions). If `NA`, will return all the unique values. In case of multiple continuous target variables, `length` can also be a vector of different values (see 'Examples').
- range
Option to control the representative values given in `by`, if no specific values were provided. Use in combination with the `length` argument to control the number of values within the specified range. `range` can be one of the following:
  - `"range"` (default), will use the minimum and maximum of the original data vector as end-points (min and max).
  - if an interval type is specified, such as `"iqr"`, `"ci"`, `"hdi"` or `"eti"`, it will spread the values within that range (the default CI width is `95%`, but this can be changed by adding for instance `ci = 0.90`). See `IQR()` and `bayestestR::ci()`. This can be useful to obtain a more robust range that skips extreme values.
  - if `"sd"` or `"mad"`, it will spread by this dispersion index around the mean or the median, respectively. If the `length` argument is an even number (e.g., `4`), it will have one more step on the positive side (i.e., `-1, 0, +1, +2`). The result is a named vector. See 'Examples'.
  - `"grid"` will create a reference grid that is useful when plotting predictions, by choosing representative values for numeric variables based on their position in the reference grid. If a numeric variable is the first predictor in `by`, values from minimum to maximum of the same length as indicated in `length` are generated. For numeric predictors not specified first in `by`, mean and -1/+1 SD around the mean are returned. For factors, all levels are returned.
- include_smooth
If `x` is a model object, decide whether smooth terms should be included in the data grid or not.
- include_random
If `x` is a mixed model object, decide whether random effect terms should be included in the data grid or not. If `include_random` is `FALSE`, but `x` is a mixed model with random effects, these will still be included in the returned grid, but set to their "population level" value (e.g., `NA` for glmmTMB or `0` for merMod). This ensures that common `predict()` methods work properly, as these usually need data with all variables in the model included.
- include_response
If `x` is a model object, decide whether the response variable should be included in the data grid or not.
- data
Optional, the data frame that was used to fit the model. Usually, the data is retrieved via `get_data()`.
- verbose
Toggle warnings.
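The options above can be hard to picture in the abstract. The following is a minimal base R sketch, not the package's implementation, of how representative values for a focal numeric predictor might be derived under `range = "range"`, the `[sd]` token and the `[quartiles]` token, how two focal predictors are combined while a non-focal numeric is fixed at its mean, and which rows `preserve_range = TRUE` filters out. All object names (`range_values`, `grid_preserved`, and so on) are illustrative assumptions, the quartiles are assumed to follow the defaults of `quantile()`, and `get_datagrid()` may compute these values differently internally.

```r
# Base R sketch only; get_datagrid() itself may compute these differently.
x <- iris$Sepal.Length

# range = "range" with length = 3: evenly spaced from minimum to maximum
range_values <- seq(min(x), max(x), length.out = 3)

# "[sd]" token: mean and -/+ 1 SD around the mean
sd_values <- mean(x) + c(-1, 0, 1) * sd(x)

# "[quartiles]" token: quartiles including minimum and maximum
quartile_values <- unname(quantile(x))

# Two focal predictors: all combinations of their representative values.
# A non-focal numeric column is fixed at its mean (numerics = "mean").
grid <- expand.grid(
  Sepal.Length = seq(min(x), max(x), length.out = 10),
  Species = levels(iris$Species)
)
grid$Sepal.Width <- mean(iris$Sepal.Width)

# preserve_range = TRUE: keep only rows whose numeric value lies inside the
# range observed for that factor level in the original data
limits <- lapply(split(iris$Sepal.Length, iris$Species), range)
inside <- mapply(
  function(value, level) {
    value >= limits[[level]][1] && value <= limits[[level]][2]
  },
  grid$Sepal.Length, as.character(grid$Species)
)
grid_preserved <- grid[inside, ]
```

With the `iris` data, these values agree with the corresponding `get_datagrid()` output shown in the 'Examples' section: 4.3, 6.1 and 7.9 for `length = 3`, and 17 of the 30 rows remaining after the range filter.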
See also
get_predicted()
to extract predictions, for which the data grid
is useful, and see the methods for objects generated
by emmeans and marginaleffects to extract the "grid" columns.
Examples
# Datagrids of variables and dataframes =====================================
# Single variable is of interest; all others are "fixed" ------------------
# Factors
get_datagrid(iris, by = "Species") # Returns all the levels
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 setosa 5.843333 3.057333 3.758 1.199333
#> 2 versicolor 5.843333 3.057333 3.758 1.199333
#> 3 virginica 5.843333 3.057333 3.758 1.199333
get_datagrid(iris, by = "Species = c('setosa', 'versicolor')") # Specify an expression
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 setosa 5.843333 3.057333 3.758 1.199333
#> 2 versicolor 5.843333 3.057333 3.758 1.199333
# Numeric variables
get_datagrid(iris, by = "Sepal.Length") # default spread length = 10
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 4.3 3.057333 3.758 1.199333 setosa
#> 2 4.7 3.057333 3.758 1.199333 setosa
#> 3 5.1 3.057333 3.758 1.199333 setosa
#> 4 5.5 3.057333 3.758 1.199333 setosa
#> 5 5.9 3.057333 3.758 1.199333 setosa
#> 6 6.3 3.057333 3.758 1.199333 setosa
#> 7 6.7 3.057333 3.758 1.199333 setosa
#> 8 7.1 3.057333 3.758 1.199333 setosa
#> 9 7.5 3.057333 3.758 1.199333 setosa
#> 10 7.9 3.057333 3.758 1.199333 setosa
get_datagrid(iris, by = "Sepal.Length", length = 3) # change length
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 4.3 3.057333 3.758 1.199333 setosa
#> 2 6.1 3.057333 3.758 1.199333 setosa
#> 3 7.9 3.057333 3.758 1.199333 setosa
get_datagrid(iris[2:150, ],
  by = "Sepal.Length",
  factors = "mode", numerics = "median"
) # change non-targets fixing
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 4.3 3 4.4 1.3 versicolor
#> 2 4.7 3 4.4 1.3 versicolor
#> 3 5.1 3 4.4 1.3 versicolor
#> 4 5.5 3 4.4 1.3 versicolor
#> 5 5.9 3 4.4 1.3 versicolor
#> 6 6.3 3 4.4 1.3 versicolor
#> 7 6.7 3 4.4 1.3 versicolor
#> 8 7.1 3 4.4 1.3 versicolor
#> 9 7.5 3 4.4 1.3 versicolor
#> 10 7.9 3 4.4 1.3 versicolor
get_datagrid(iris, by = "Sepal.Length", range = "ci", ci = 0.90) # change min/max of target
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 4.600 3.057333 3.758 1.199333 setosa
#> 2 4.895 3.057333 3.758 1.199333 setosa
#> 3 5.190 3.057333 3.758 1.199333 setosa
#> 4 5.485 3.057333 3.758 1.199333 setosa
#> 5 5.780 3.057333 3.758 1.199333 setosa
#> 6 6.075 3.057333 3.758 1.199333 setosa
#> 7 6.370 3.057333 3.758 1.199333 setosa
#> 8 6.665 3.057333 3.758 1.199333 setosa
#> 9 6.960 3.057333 3.758 1.199333 setosa
#> 10 7.255 3.057333 3.758 1.199333 setosa
get_datagrid(iris, by = "Sepal.Length = [0, 1]") # Manually change min/max
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 0.0000000 3.057333 3.758 1.199333 setosa
#> 2 0.1111111 3.057333 3.758 1.199333 setosa
#> 3 0.2222222 3.057333 3.758 1.199333 setosa
#> 4 0.3333333 3.057333 3.758 1.199333 setosa
#> 5 0.4444444 3.057333 3.758 1.199333 setosa
#> 6 0.5555556 3.057333 3.758 1.199333 setosa
#> 7 0.6666667 3.057333 3.758 1.199333 setosa
#> 8 0.7777778 3.057333 3.758 1.199333 setosa
#> 9 0.8888889 3.057333 3.758 1.199333 setosa
#> 10 1.0000000 3.057333 3.758 1.199333 setosa
get_datagrid(iris, by = "Sepal.Length = [sd]") # -1 SD, mean and +1 SD
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.015267 3.057333 3.758 1.199333 setosa
#> 2 5.843333 3.057333 3.758 1.199333 setosa
#> 3 6.671399 3.057333 3.758 1.199333 setosa
# identical to previous line: -1 SD, mean and +1 SD
get_datagrid(iris, by = "Sepal.Length", range = "sd", length = 3)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.015267 3.057333 3.758 1.199333 setosa
#> 2 5.843333 3.057333 3.758 1.199333 setosa
#> 3 6.671399 3.057333 3.758 1.199333 setosa
get_datagrid(iris, by = "Sepal.Length = [quartiles]") # quartiles
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 4.3 3.057333 3.758 1.199333 setosa
#> 2 5.1 3.057333 3.758 1.199333 setosa
#> 3 5.8 3.057333 3.758 1.199333 setosa
#> 4 6.4 3.057333 3.758 1.199333 setosa
#> 5 7.9 3.057333 3.758 1.199333 setosa
# Numeric and categorical variables, generating a grid for plots
# default spread length = 10
get_datagrid(iris, by = c("Sepal.Length", "Species"), range = "grid")
#> Sepal.Length Species Sepal.Width Petal.Length Petal.Width
#> 1 2.531069 setosa 3.057333 3.758 1.199333
#> 2 3.359135 setosa 3.057333 3.758 1.199333
#> 3 4.187201 setosa 3.057333 3.758 1.199333
#> 4 5.015267 setosa 3.057333 3.758 1.199333
#> 5 5.843333 setosa 3.057333 3.758 1.199333
#> 6 6.671399 setosa 3.057333 3.758 1.199333
#> 7 7.499466 setosa 3.057333 3.758 1.199333
#> 8 8.327532 setosa 3.057333 3.758 1.199333
#> 9 9.155598 setosa 3.057333 3.758 1.199333
#> 10 9.983664 setosa 3.057333 3.758 1.199333
#> 11 2.531069 versicolor 3.057333 3.758 1.199333
#> 12 3.359135 versicolor 3.057333 3.758 1.199333
#> 13 4.187201 versicolor 3.057333 3.758 1.199333
#> 14 5.015267 versicolor 3.057333 3.758 1.199333
#> 15 5.843333 versicolor 3.057333 3.758 1.199333
#> 16 6.671399 versicolor 3.057333 3.758 1.199333
#> 17 7.499466 versicolor 3.057333 3.758 1.199333
#> 18 8.327532 versicolor 3.057333 3.758 1.199333
#> 19 9.155598 versicolor 3.057333 3.758 1.199333
#> 20 9.983664 versicolor 3.057333 3.758 1.199333
#> 21 2.531069 virginica 3.057333 3.758 1.199333
#> 22 3.359135 virginica 3.057333 3.758 1.199333
#> 23 4.187201 virginica 3.057333 3.758 1.199333
#> 24 5.015267 virginica 3.057333 3.758 1.199333
#> 25 5.843333 virginica 3.057333 3.758 1.199333
#> 26 6.671399 virginica 3.057333 3.758 1.199333
#> 27 7.499466 virginica 3.057333 3.758 1.199333
#> 28 8.327532 virginica 3.057333 3.758 1.199333
#> 29 9.155598 virginica 3.057333 3.758 1.199333
#> 30 9.983664 virginica 3.057333 3.758 1.199333
# default spread length = 3 (-1 SD, mean and +1 SD)
get_datagrid(iris, by = c("Species", "Sepal.Length"), range = "grid")
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 setosa 5.015267 3.057333 3.758 1.199333
#> 2 setosa 5.843333 3.057333 3.758 1.199333
#> 3 setosa 6.671399 3.057333 3.758 1.199333
#> 4 versicolor 5.015267 3.057333 3.758 1.199333
#> 5 versicolor 5.843333 3.057333 3.758 1.199333
#> 6 versicolor 6.671399 3.057333 3.758 1.199333
#> 7 virginica 5.015267 3.057333 3.758 1.199333
#> 8 virginica 5.843333 3.057333 3.758 1.199333
#> 9 virginica 6.671399 3.057333 3.758 1.199333
# Standardization and unstandardization
data <- get_datagrid(iris, by = "Sepal.Length", range = "sd", length = 3)
data$Sepal.Length # It is a named vector (extract names with `names(data$Sepal.Length)`)
#> -1 SD Mean +1 SD
#> 5.015267 5.843333 6.671399
datawizard::standardize(data, select = "Sepal.Length")
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 -1 3.057333 3.758 1.199333 setosa
#> 2 0 3.057333 3.758 1.199333 setosa
#> 3 1 3.057333 3.758 1.199333 setosa
data <- get_datagrid(iris, by = "Sepal.Length = c(-2, 0, 2)") # Manually specify values
data
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 -2 3.057333 3.758 1.199333 setosa
#> 2 0 3.057333 3.758 1.199333 setosa
#> 3 2 3.057333 3.758 1.199333 setosa
datawizard::unstandardize(data, select = "Sepal.Length")
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 4.187201 3.057333 3.758 1.199333 setosa
#> 2 5.843333 3.057333 3.758 1.199333 setosa
#> 3 7.499466 3.057333 3.758 1.199333 setosa
# Multiple variables are of interest, creating a combination --------------
get_datagrid(iris, by = c("Sepal.Length", "Species"), length = 3)
#> Sepal.Length Species Sepal.Width Petal.Length Petal.Width
#> 1 4.3 setosa 3.057333 3.758 1.199333
#> 2 6.1 setosa 3.057333 3.758 1.199333
#> 3 7.9 setosa 3.057333 3.758 1.199333
#> 4 4.3 versicolor 3.057333 3.758 1.199333
#> 5 6.1 versicolor 3.057333 3.758 1.199333
#> 6 7.9 versicolor 3.057333 3.758 1.199333
#> 7 4.3 virginica 3.057333 3.758 1.199333
#> 8 6.1 virginica 3.057333 3.758 1.199333
#> 9 7.9 virginica 3.057333 3.758 1.199333
get_datagrid(iris, by = c("Sepal.Length", "Petal.Length"), length = c(3, 2))
#> Sepal.Length Petal.Length Sepal.Width Petal.Width Species
#> 1 4.3 1.0 3.057333 1.199333 setosa
#> 2 6.1 1.0 3.057333 1.199333 setosa
#> 3 7.9 1.0 3.057333 1.199333 setosa
#> 4 4.3 6.9 3.057333 1.199333 setosa
#> 5 6.1 6.9 3.057333 1.199333 setosa
#> 6 7.9 6.9 3.057333 1.199333 setosa
get_datagrid(iris, by = c(1, 3), length = 3)
#> Sepal.Length Petal.Length Sepal.Width Petal.Width Species
#> 1 4.3 1.00 3.057333 1.199333 setosa
#> 2 6.1 1.00 3.057333 1.199333 setosa
#> 3 7.9 1.00 3.057333 1.199333 setosa
#> 4 4.3 3.95 3.057333 1.199333 setosa
#> 5 6.1 3.95 3.057333 1.199333 setosa
#> 6 7.9 3.95 3.057333 1.199333 setosa
#> 7 4.3 6.90 3.057333 1.199333 setosa
#> 8 6.1 6.90 3.057333 1.199333 setosa
#> 9 7.9 6.90 3.057333 1.199333 setosa
get_datagrid(iris, by = c("Sepal.Length", "Species"), preserve_range = TRUE)
#> Sepal.Length Species Sepal.Width Petal.Length Petal.Width
#> 1 4.3 setosa 3.057333 3.758 1.199333
#> 2 4.7 setosa 3.057333 3.758 1.199333
#> 3 5.1 setosa 3.057333 3.758 1.199333
#> 4 5.5 setosa 3.057333 3.758 1.199333
#> 5 5.1 versicolor 3.057333 3.758 1.199333
#> 6 5.5 versicolor 3.057333 3.758 1.199333
#> 7 5.9 versicolor 3.057333 3.758 1.199333
#> 8 6.3 versicolor 3.057333 3.758 1.199333
#> 9 6.7 versicolor 3.057333 3.758 1.199333
#> 10 5.1 virginica 3.057333 3.758 1.199333
#> 11 5.5 virginica 3.057333 3.758 1.199333
#> 12 5.9 virginica 3.057333 3.758 1.199333
#> 13 6.3 virginica 3.057333 3.758 1.199333
#> 14 6.7 virginica 3.057333 3.758 1.199333
#> 15 7.1 virginica 3.057333 3.758 1.199333
#> 16 7.5 virginica 3.057333 3.758 1.199333
#> 17 7.9 virginica 3.057333 3.758 1.199333
get_datagrid(iris, by = c("Sepal.Length", "Species"), numerics = 0)
#> Sepal.Length Species Sepal.Width Petal.Length Petal.Width
#> 1 4.3 setosa 0 0 0
#> 2 4.7 setosa 0 0 0
#> 3 5.1 setosa 0 0 0
#> 4 5.5 setosa 0 0 0
#> 5 5.9 setosa 0 0 0
#> 6 6.3 setosa 0 0 0
#> 7 6.7 setosa 0 0 0
#> 8 7.1 setosa 0 0 0
#> 9 7.5 setosa 0 0 0
#> 10 7.9 setosa 0 0 0
#> 11 4.3 versicolor 0 0 0
#> 12 4.7 versicolor 0 0 0
#> 13 5.1 versicolor 0 0 0
#> 14 5.5 versicolor 0 0 0
#> 15 5.9 versicolor 0 0 0
#> 16 6.3 versicolor 0 0 0
#> 17 6.7 versicolor 0 0 0
#> 18 7.1 versicolor 0 0 0
#> 19 7.5 versicolor 0 0 0
#> 20 7.9 versicolor 0 0 0
#> 21 4.3 virginica 0 0 0
#> 22 4.7 virginica 0 0 0
#> 23 5.1 virginica 0 0 0
#> 24 5.5 virginica 0 0 0
#> 25 5.9 virginica 0 0 0
#> 26 6.3 virginica 0 0 0
#> 27 6.7 virginica 0 0 0
#> 28 7.1 virginica 0 0 0
#> 29 7.5 virginica 0 0 0
#> 30 7.9 virginica 0 0 0
get_datagrid(iris, by = c("Sepal.Length = 3", "Species"))
#> Sepal.Length Species Sepal.Width Petal.Length Petal.Width
#> 1 3 setosa 3.057333 3.758 1.199333
#> 2 3 versicolor 3.057333 3.758 1.199333
#> 3 3 virginica 3.057333 3.758 1.199333
get_datagrid(iris, by = c("Sepal.Length = c(3, 1)", "Species = 'setosa'"))
#> Sepal.Length Species Sepal.Width Petal.Length Petal.Width
#> 1 3 setosa 3.057333 3.758 1.199333
#> 2 1 setosa 3.057333 3.758 1.199333
# With list-style by-argument
get_datagrid(iris, by = list(Sepal.Length = c(1, 3), Species = "setosa"))
#> Sepal.Length Species Sepal.Width Petal.Length Petal.Width
#> 1 1 setosa 3.057333 3.758 1.199333
#> 2 3 setosa 3.057333 3.758 1.199333
# With models ===============================================================
# Fit a linear regression
model <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)
# Get datagrid of predictors
data <- get_datagrid(model, length = c(20, 3), range = c("range", "sd"))
# same as: get_datagrid(model, range = "grid", length = 20)
# Add predictions
data$Sepal.Length <- get_predicted(model, data = data)
# Visualize relationships (each color is at -1 SD, Mean, and + 1 SD of Petal.Length)
plot(data$Sepal.Width, data$Sepal.Length,
  col = data$Petal.Length,
  main = "Relationship at -1 SD, Mean, and + 1 SD of Petal.Length"
)
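
The model workflow above can also be approximated in base R. This is a hedged sketch: the grid is built by hand with `expand.grid()` as a stand-in for `get_datagrid(model, length = c(20, 3), range = c("range", "sd"))`, and `predict()` replaces `get_predicted()`. The column names `Predicted` and `group` are hypothetical, and the results are not guaranteed to be identical to those of the insight functions.

```r
model <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

# Hand-built stand-in for the data grid: 20 evenly spaced values of the first
# predictor, crossed with mean and -/+ 1 SD of the second predictor
pl <- iris$Petal.Length
grid <- expand.grid(
  Sepal.Width = seq(min(iris$Sepal.Width), max(iris$Sepal.Width), length.out = 20),
  Petal.Length = mean(pl) + c(-1, 0, 1) * sd(pl)
)

# Base R predict() in place of get_predicted()
grid$Predicted <- predict(model, newdata = grid)

# Label the three Petal.Length values; the factor's integer codes (1 to 3)
# serve as colour indices in the plot
grid$group <- factor(grid$Petal.Length, labels = c("-1 SD", "Mean", "+1 SD"))
plot(grid$Sepal.Width, grid$Predicted,
  col = grid$group, pch = 19,
  xlab = "Sepal.Width", ylab = "Predicted Sepal.Length"
)
legend("topleft",
  legend = levels(grid$group),
  col = seq_along(levels(grid$group)), pch = 19
)
```

Converting the three `Petal.Length` values to a factor gives each of them a distinct, labelled colour, which makes the legend straightforward to build.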