Computes a summary table of means by groups.
Usage
means_by_group(x, ...)
# S3 method for class 'numeric'
means_by_group(
  x,
  by = NULL,
  ci = 0.95,
  weights = NULL,
  digits = NULL,
  group = NULL,
  ...
)
# S3 method for class 'data.frame'
means_by_group(
  x,
  select = NULL,
  by = NULL,
  ci = 0.95,
  weights = NULL,
  digits = NULL,
  exclude = NULL,
  ignore_case = FALSE,
  regex = FALSE,
  verbose = TRUE,
  group = NULL,
  ...
)
Arguments
- x
A vector or a data frame.
- ...
Currently not used.
- by
If x is a numeric vector, by should be a factor that indicates the group-classifying categories. If x is a data frame, by should be a character string, naming the variable in x that is used for grouping. Numeric vectors are coerced to factors. Note that by should only refer to a single variable.
- ci
Level of confidence interval for mean estimates. Default is 0.95. Use ci = NA to suppress confidence intervals.
- weights
If x is a numeric vector, weights should be a vector of weights that will be applied to weight all observations. If x is a data frame, weights can also be a character string indicating the name of the variable in x that should be used for weighting. Default is NULL, so no weights are used.
- digits
Optional scalar, indicating the number of digits after the decimal point when rounding estimates and values.
- group
Deprecated. Use by instead.
- select
Variables that will be included when performing the required tasks. Can be either
  - a variable specified as a literal variable name (e.g., column_name),
  - a string with the variable name (e.g., "column_name"), or a character vector of variable names (e.g., c("col1", "col2", "col3")),
  - a formula with variable names (e.g., ~column_1 + column_2),
  - a vector of positive integers, giving the positions counting from the left (e.g., 1 or c(1, 3, 5)),
  - a vector of negative integers, giving the positions counting from the right (e.g., -1 or -1:-3),
  - one of the following select-helpers: starts_with(), ends_with(), contains(), a range using : or regex(""). starts_with(), ends_with(), and contains() accept several patterns, e.g. starts_with("Sep", "Petal"),
  - or a function testing for logical conditions, e.g. is.numeric() (or is.numeric), or any user-defined function that selects the variables for which the function returns TRUE (like: foo <- function(x) mean(x) > 3),
  - ranges specified via literal variable names, select-helpers (except regex()) and (user-defined) functions can be negated, i.e. return non-matching elements, when prefixed with a -, e.g. -ends_with(""), -is.numeric or -(Sepal.Width:Petal.Length). Note: Negation means that matches are excluded, and thus, the exclude argument can be used alternatively. For instance, select=-ends_with("Length") (with -) is equivalent to exclude=ends_with("Length") (no -). If negation does not work as expected, use the exclude argument instead.
If NULL, selects all columns. Patterns that find no matches are silently ignored, e.g. extract_column_names(iris, select = c("Species", "Test")) will just return "Species".
- exclude
See select; however, column names matched by the pattern from exclude will be excluded instead of selected. If NULL (the default), excludes no columns.
- ignore_case
Logical, if TRUE and when one of the select-helpers or a regular expression is used in select, ignores lower/upper case in the search pattern when matching against variable names.
- regex
Logical, if TRUE, the search pattern from select will be treated as a regular expression. When regex = TRUE, select must be a character string (or a variable containing a character string) and is not allowed to be one of the supported select-helpers or a character vector of length > 1. regex = TRUE is comparable to using one of the two select-helpers, select = contains("") or select = regex(""); however, since the select-helpers may not work when called from inside other functions (see 'Details'), this argument may be used as a workaround.
- verbose
Toggle warnings.
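The selection rules for select, exclude and regex can be tried in isolation with extract_column_names(), which the select description already uses as an illustration. The sketch below assumes these functions come from the datawizard package (the page itself does not name the package) and that the package is installed; it only shows which column names are matched, not the means.

```r
# Assumption: means_by_group() and extract_column_names() are provided by
# the 'datawizard' package, which must be installed for this sketch to run.
library(datawizard)
data(iris)

# Negated select-helper: drop all columns ending in "Length"
via_select <- extract_column_names(iris, select = -ends_with("Length"))

# Same selection expressed with the exclude argument (no leading minus)
via_exclude <- extract_column_names(iris, exclude = ends_with("Length"))

print(via_select)
print(via_exclude)

# Patterns without a match are silently ignored ("Test" is not a column)
extract_column_names(iris, select = c("Species", "Test"))

# Treat the pattern as a regular expression instead of using a select-helper
extract_column_names(iris, select = "^Sepal", regex = TRUE)
```

According to the argument description, the first two calls should select the same columns; if negation behaves unexpectedly in a given context, the exclude form is the documented fallback.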
Details
This function is comparable to aggregate(x, by, mean), but provides some further information, including summary statistics from a one-way ANOVA using x as dependent and by as independent variable. emmeans::contrast() is used to get p-values for each sub-group. P-values indicate whether each group mean is significantly different from the total mean.
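To make the comparison with aggregate() concrete, here is a minimal base-R sketch of the two ingredients named above: group means and a one-way ANOVA. It is not the function's internal code; the object names (group_means, fit, f_value, r_squared) are illustrative, and the per-group p-values obtained via emmeans::contrast() are not reproduced.

```r
# Base-R sketch only, not the implementation of means_by_group().
data(iris)

# Group means of the dependent variable, as aggregate() would report them
group_means <- aggregate(Sepal.Width ~ Species, data = iris, FUN = mean)
print(group_means)

# One-way ANOVA with Sepal.Width as dependent and Species as independent variable
fit <- lm(Sepal.Width ~ Species, data = iris)
anova_table <- anova(fit)

# Summary statistics of the kind shown in the "Anova:" line of the output
f_value <- anova_table[["F value"]][1]
r_squared <- summary(fit)$r.squared
adj_r_squared <- summary(fit)$adj.r.squared

print(round(c(R2 = r_squared, adj.R2 = adj_r_squared, F = f_value), 3))
```

The rounded values can be compared with the "Anova:" line printed for the iris example in the Examples section.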
Examples
data(efc)
means_by_group(efc, "c12hour", "e42dep")
#> # Mean of average number of hours of care per week by elder's dependency
#>
#> Category | Mean | N | SD | 95% CI | p
#> ----------------------------------------------------------------------
#> independent | 17.00 | 2 | 11.31 | [-68.46, 102.46] | 0.573
#> slightly dependent | 34.25 | 4 | 29.97 | [-26.18, 94.68] | 0.626
#> moderately dependent | 52.75 | 28 | 51.83 | [ 29.91, 75.59] | > .999
#> severely dependent | 106.97 | 63 | 65.88 | [ 91.74, 122.19] | 0.001
#> Total | 86.46 | 97 | 66.40 | |
#>
#> Anova: R2=0.186; adj.R2=0.160; F=7.098; p<.001
data(iris)
means_by_group(iris, "Sepal.Width", "Species")
#> # Mean of Sepal.Width by Species
#>
#> Category | Mean | N | SD | 95% CI | p
#> ------------------------------------------------------
#> setosa | 3.43 | 50 | 0.38 | [3.33, 3.52] | < .001
#> versicolor | 2.77 | 50 | 0.31 | [2.68, 2.86] | < .001
#> virginica | 2.97 | 50 | 0.32 | [2.88, 3.07] | 0.035
#> Total | 3.06 | 150 | 0.44 | |
#>
#> Anova: R2=0.401; adj.R2=0.393; F=49.160; p<.001
# weighting
efc$weight <- abs(rnorm(n = nrow(efc), mean = 1, sd = .5))
means_by_group(efc, "c12hour", "e42dep", weights = "weight")
#> # Mean of average number of hours of care per week by elder's dependency
#>
#> Category | Mean | N | SD | 95% CI | p
#> ---------------------------------------------------------------------
#> independent | 15.39 | 2 | 11.31 | [-68.18, 98.95] | 0.430
#> slightly dependent | 19.17 | 3 | 14.04 | [-46.71, 85.05] | 0.430
#> moderately dependent | 54.34 | 32 | 52.64 | [ 33.46, 75.22] | 0.688
#> severely dependent | 103.18 | 57 | 66.41 | [ 87.44, 118.91] | 0.001
#> Total | 81.71 | 97 | 65.84 | |
#>
#> Anova: R2=0.178; adj.R2=0.152; F=6.717; p<.001
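Two further calls that follow from the argument descriptions but are not shown above: the numeric-vector method and suppressing confidence intervals. This is a sketch under the assumption that means_by_group() is loaded from the datawizard package; output is omitted because it has not been reproduced here.

```r
# Assumption: means_by_group() is provided by the 'datawizard' package.
library(datawizard)
data(iris)

# Numeric-vector method: x is a numeric vector, by is a factor
res_numeric <- means_by_group(iris$Sepal.Width, by = iris$Species)
print(res_numeric)

# Data frame method without confidence intervals, rounded to one digit
res_no_ci <- means_by_group(iris, "Sepal.Width", "Species", ci = NA, digits = 1)
print(res_no_ci)
```

Because the weighting example above draws weights with rnorm(), its printed numbers will differ between runs unless a seed is set beforehand with set.seed().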