Skip to contents

This function is similar to the SPSS MEAN.n function and computes row means from a data frame or matrix if at least min_valid values of a row are valid (and not NA).


  select = NULL,
  exclude = NULL,
  min_valid = NULL,
  digits = NULL,
  ignore_case = FALSE,
  regex = FALSE,
  remove_na = FALSE,
  verbose = TRUE



A data frame with at least two columns, where row means are applied.


Variables that will be included when performing the required tasks. Can be either

  • a variable specified as a literal variable name (e.g., column_name),

  • a string with the variable name (e.g., "column_name"), or a character vector of variable names (e.g., c("col1", "col2", "col3")),

  • a formula with variable names (e.g., ~column_1 + column_2),

  • a vector of positive integers, giving the positions counting from the left (e.g. 1 or c(1, 3, 5)),

  • a vector of negative integers, giving the positions counting from the right (e.g., -1 or -1:-3),

  • one of the following select-helpers: starts_with(), ends_with(), contains(), a range using : or regex(""). starts_with(), ends_with(), and contains() accept several patterns, e.g starts_with("Sep", "Petal").

  • or a function testing for logical conditions, e.g. is.numeric() (or is.numeric), or any user-defined function that selects the variables for which the function returns TRUE (like: foo <- function(x) mean(x) > 3),

  • ranges specified via literal variable names, select-helpers (except regex()) and (user-defined) functions can be negated, i.e. return non-matching elements, when prefixed with a -, e.g. -ends_with(""), -is.numeric or -(Sepal.Width:Petal.Length). Note: Negation means that matches are excluded, and thus, the exclude argument can be used alternatively. For instance, select=-ends_with("Length") (with -) is equivalent to exclude=ends_with("Length") (no -). In case negation should not work as expected, use the exclude argument instead.

If NULL, selects all columns. Patterns that found no matches are silently ignored, e.g. find_columns(iris, select = c("Species", "Test")) will just return "Species".


See select, however, column names matched by the pattern from exclude will be excluded instead of selected. If NULL (the default), excludes no columns.


Optional, a numeric value of length 1. May either be

  • a numeric value that indicates the amount of valid values per row to calculate the row mean;

  • or a value between 0 and 1, indicating a proportion of valid values per row to calculate the row mean (see 'Details').

  • NULL (default), in which all cases are considered.

If a row's sum of valid values is less than min_valid, NA will be returned.


Numeric value indicating the number of decimal places to be used for rounding mean values. Negative values are allowed (see 'Details'). By default, digits = NULL and no rounding is used.


Logical, if TRUE and when one of the select-helpers or a regular expression is used in select, ignores lower/upper case in the search pattern when matching against variable names.


Logical, if TRUE, the search pattern from select will be treated as regular expression. When regex = TRUE, select must be a character string (or a variable containing a character string) and is not allowed to be one of the supported select-helpers or a character vector of length > 1. regex = TRUE is comparable to using one of the two select-helpers, select = contains("") or select = regex(""), however, since the select-helpers may not work when called from inside other functions (see 'Details'), this argument may be used as workaround.


Logical, if TRUE (default), removes missing (NA) values before calculating row means. Only applies if min_valuid is not specified.


Toggle warnings.


A vector with row means for those rows with at least n valid values.


Rounding to a negative number of digits means rounding to a power of ten, for example row_means(df, 3, digits = -2) rounds to the nearest hundred. For min_valid, if not NULL, min_valid must be a numeric value from 0 to ncol(data). If a row in the data frame has at least min_valid non-missing values, the row mean is returned. If min_valid is a non-integer value from 0 to 1, min_valid is considered to indicate the proportion of required non-missing values per row. E.g., if min_valid = 0.75, a row must have at least ncol(data) * min_valid non-missing values for the row mean to be calculated. See 'Examples'.


dat <- data.frame(
  c1 = c(1, 2, NA, 4),
  c2 = c(NA, 2, NA, 5),
  c3 = c(NA, 4, NA, NA),
  c4 = c(2, 3, 7, 8)

# default, all means are shown, if no NA values are present
#> [1]   NA 2.75   NA   NA

# remove all NA before computing row means
row_means(dat, remove_na = TRUE)
#> [1] 1.500000 2.750000 7.000000 5.666667

# needs at least 4 non-missing values per row
row_means(dat, min_valid = 4) # 1 valid return value
#> [1]   NA 2.75   NA   NA

# needs at least 3 non-missing values per row
row_means(dat, min_valid = 3) # 2 valid return values
#> [1]       NA 2.750000       NA 5.666667

# needs at least 2 non-missing values per row
row_means(dat, min_valid = 2)
#> [1] 1.500000 2.750000       NA 5.666667

# needs at least 1 non-missing value per row, for two selected variables
row_means(dat, select = c("c1", "c3"), min_valid = 1)
#> [1]  1  3 NA  4

# needs at least 50% of non-missing values per row
row_means(dat, min_valid = 0.5) # 3 valid return values
#> [1] 1.500000 2.750000       NA 5.666667

# needs at least 75% of non-missing values per row
row_means(dat, min_valid = 0.75) # 2 valid return values
#> [1]       NA 2.750000       NA 5.666667