
data_extract() (or its alias extract()) is similar to $. It extracts either a single column or element from an object (e.g., a data frame or list), or multiple columns or elements.

Usage

data_extract(data, select, ...)

# S3 method for data.frame
data_extract(
  data,
  select,
  name = NULL,
  extract = "all",
  as_data_frame = FALSE,
  ignore_case = FALSE,
  verbose = TRUE,
  ...
)

Arguments

data

The object to subset. Methods are currently available for data frames and data frame extensions (e.g., tibbles).

select

Variables that will be included when performing the required tasks. Can be either

  • a variable specified as a literal variable name (e.g., column_name),

  • a string with the variable name (e.g., "column_name"), or a character vector of variable names (e.g., c("col1", "col2", "col3")),

  • a formula with variable names (e.g., ~column_1 + column_2),

  • a vector of positive integers, giving the positions counting from the left (e.g. 1 or c(1, 3, 5)),

  • a vector of negative integers, giving the positions counting from the right (e.g., -1 or -1:-3),

  • one of the following select-helpers: starts_with(""), ends_with(""), contains(""), a range using : or regex(""),

  • or a function testing for logical conditions, e.g. is.numeric() (or is.numeric), or any user-defined function that selects the variables for which the function returns TRUE (like: foo <- function(x) mean(x) > 3),

  • ranges specified via literal variable names, select-helpers (except regex()) and (user-defined) functions can be negated, i.e. return non-matching elements, when prefixed with a -, e.g. -ends_with(""), -is.numeric or -Sepal.Width:Petal.Length. Note: Negation means that matches are excluded, so the exclude argument can be used as an alternative. For instance, select=-ends_with("Length") (with -) is equivalent to exclude=ends_with("Length") (no -). If negation does not work as expected, use the exclude argument instead.

If NULL, selects all columns. Patterns that find no matches are silently ignored, e.g. find_columns(iris, select = c("Species", "Test")) will just return "Species".
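The selection forms listed above can be compared side by side. The following is a minimal sketch, assuming the datawizard package is installed and using the built-in iris data; the object names are illustrative only.

```r
library(datawizard)

# Each call below selects the two "Sepal" columns of iris,
# using a different way of specifying `select`.
by_name    <- data_extract(iris, c("Sepal.Length", "Sepal.Width"))
by_formula <- data_extract(iris, ~ Sepal.Length + Sepal.Width)
by_index   <- data_extract(iris, 1:2)
by_helper  <- data_extract(iris, starts_with("Sepal"))
by_range   <- data_extract(iris, Sepal.Length:Sepal.Width)

# A function selects every column for which it returns TRUE;
# here, the four numeric columns of iris.
numeric_cols <- data_extract(iris, is.numeric)
colnames(numeric_cols)
```

Because more than one column matches in every call, each result should be a data frame rather than a vector.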

...

For use by future methods.

name

An optional argument that specifies the column to be used as names for the vector elements after extraction. Must be specified either as literal variable name (e.g., column_name) or as string ("column_name"). name will be ignored when a data frame is returned.
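As a small illustration of how name labels the extracted vector, the sketch below assumes datawizard is installed and uses the built-in mtcars data. The name = "row.names" form is taken from the examples on this page; the object names are illustrative.

```r
library(datawizard)

# Name the elements of `mpg` by the row names of mtcars.
mpg_by_car <- data_extract(mtcars, "mpg", name = "row.names")
mpg_by_car[["Mazda RX4"]]

# Name the elements of `cyl` by the values of another column (`gear`).
cyl_by_gear <- data_extract(mtcars, cyl, name = gear)
head(names(cyl_by_gear))
```

Named vectors make it possible to look up values by label, which a plain mtcars$mpg does not offer.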

extract

String, indicating which element will be extracted when select matches multiple variables. Can be "all" (the default) to return all matched variables, "first" or "last" to return the first or last match, or "odd" and "even" to return all odd-numbered or even-numbered matches. Note that "first" and "last" return a vector (unless as_data_frame = TRUE), while "all" can return a vector (if only one match was found) or a data frame (for more than one match). Type-safe return values are only possible when extract is "first" or "last" (always returns a vector) or when as_data_frame = TRUE (always returns a data frame).
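The effect of the different extract values can be sketched as follows, assuming datawizard is installed and using the built-in iris data. The object names are illustrative.

```r
library(datawizard)

# "first" and "last" reduce several matches to a single vector.
first_sepal <- data_extract(iris, starts_with("Sepal"), extract = "first")
last_sepal  <- data_extract(iris, starts_with("Sepal"), extract = "last")

# "odd" and "even" keep every second match, counted among the matched
# columns (here: columns 1 to 4 of iris).
odd_cols  <- data_extract(iris, 1:4, extract = "odd")
even_cols <- data_extract(iris, 1:4, extract = "even")

colnames(odd_cols)
colnames(even_cols)
```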

as_data_frame

Logical, if TRUE, will always return a data frame, even if only one variable was matched. If FALSE, either returns a vector or a data frame. See extract for details.

ignore_case

Logical, if TRUE and when one of the select-helpers or a regular expression is used in select, ignores lower/upper case in the search pattern when matching against variable names.
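A brief sketch of case-insensitive matching, assuming datawizard is installed. Setting verbose = FALSE in the first call is only a precaution to keep the output quiet; the object names are illustrative.

```r
library(datawizard)

# The column names of iris start with an upper-case "S", so a
# lower-case pattern finds no match by default and NULL is returned.
case_sensitive <- data_extract(iris, starts_with("sepal"), verbose = FALSE)

# With ignore_case = TRUE, the same pattern matches both Sepal columns.
case_insensitive <- data_extract(iris, starts_with("sepal"), ignore_case = TRUE)
colnames(case_insensitive)
```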

verbose

Toggle warnings.

Value

A vector (or a data frame) containing the extracted element, or NULL if no matching variable was found.

Details

data_extract() can be used to select multiple variables or pull a single variable from a data frame. Thus, the return value is by default not type safe: data_extract() returns either a vector or a data frame, depending on how many variables were matched.

Extracting single variables (vectors)

When select is the name of a single column, or when select only matches one column, a vector is returned. A single variable is also returned when extract is either "first" or "last". Setting as_data_frame to TRUE overrides this behaviour and always returns a data frame.

Extracting a data frame of variables

When select is a character vector containing more than one column name (or a numeric vector with more than one valid column index), or when select uses one of the supported select-helpers and matches multiple columns, a data frame is returned. Setting as_data_frame to TRUE always returns a data frame.
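The dependence of the return type on the number of matches can be checked directly. This is a minimal sketch, assuming datawizard is installed and using the built-in iris data; the object names are illustrative.

```r
library(datawizard)

# One match: a vector is returned (here a factor, like iris$Species).
one_match <- data_extract(iris, starts_with("Spec"))
is.data.frame(one_match)

# Two matches: a data frame is returned.
two_matches <- data_extract(iris, starts_with("Sepal"))
is.data.frame(two_matches)

# as_data_frame = TRUE gives a data frame even for a single match.
always_df <- data_extract(iris, starts_with("Spec"), as_data_frame = TRUE)
dim(always_df)
```

In code where the number of matches is not known in advance, setting as_data_frame = TRUE (or extract = "first" or "last") is one way to keep the return type predictable.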

Examples

# single variable
data_extract(mtcars, cyl, name = gear)
#> 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4 
#> 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4 
data_extract(mtcars, "cyl", name = gear)
#> 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4 
#> 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4 
data_extract(mtcars, -1, name = gear)
#> 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4 
#> 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2 
data_extract(mtcars, cyl, name = 0)
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
#>                   6                   6                   4                   6 
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
#>                   8                   6                   8                   4 
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE 
#>                   4                   6                   6                   8 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
#>                   8                   8                   8                   8 
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
#>                   8                   4                   4                   4 
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
#>                   4                   8                   8                   8 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
#>                   8                   4                   4                   4 
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
#>                   8                   6                   8                   4 
data_extract(mtcars, cyl, name = "row.names")
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
#>                   6                   6                   4                   6 
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
#>                   8                   6                   8                   4 
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE 
#>                   4                   6                   6                   8 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
#>                   8                   8                   8                   8 
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
#>                   8                   4                   4                   4 
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
#>                   4                   8                   8                   8 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
#>                   8                   4                   4                   4 
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
#>                   8                   6                   8                   4 

# selecting multiple variables
head(data_extract(iris, starts_with("Sepal")))
#>   Sepal.Length Sepal.Width
#> 1          5.1         3.5
#> 2          4.9         3.0
#> 3          4.7         3.2
#> 4          4.6         3.1
#> 5          5.0         3.6
#> 6          5.4         3.9
head(data_extract(iris, ends_with("Width")))
#>   Sepal.Width Petal.Width
#> 1         3.5         0.2
#> 2         3.0         0.2
#> 3         3.2         0.2
#> 4         3.1         0.2
#> 5         3.6         0.2
#> 6         3.9         0.4
head(data_extract(iris, 2:4))
#>   Sepal.Width Petal.Length Petal.Width
#> 1         3.5          1.4         0.2
#> 2         3.0          1.4         0.2
#> 3         3.2          1.3         0.2
#> 4         3.1          1.5         0.2
#> 5         3.6          1.4         0.2
#> 6         3.9          1.7         0.4

# select first of multiple variables
data_extract(iris, starts_with("Sepal"), extract = "first")
#>   [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
#>  [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
#>  [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
#>  [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
#>  [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
#>  [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
#> [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
#> [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
#> [145] 6.7 6.7 6.3 6.5 6.2 5.9

# select first of multiple variables, return as data frame
head(data_extract(iris, starts_with("Sepal"), extract = "first", as_data_frame = TRUE))
#>   Sepal.Length
#> 1          5.1
#> 2          4.9
#> 3          4.7
#> 4          4.6
#> 5          5.0
#> 6          5.4