datawizard (development version)


  • recode_into(), similar to dplyr::case_when(), to recode values from one or more variables into a new variable.

  • mean_sd() and median_mad() for summarizing vectors to their mean (or median) and a range of one SD (or MAD) above and below.

  • data_write() as counterpart to data_read(), to write data frames into CSV, SPSS, SAS, Stata files and many other file types. One advantage over existing functions to write data in other packages is that labelled (numeric) data can be converted into factors (with value labels used as factor levels) even for text formats like CSV and similar. This allows exporting “labelled” data into those file formats, too.

  • add_labs(), to manually add value and variable labels as attributes to variables. These attributes are stored as "label" and "labels" attributes, similar to the labelled class from the haven package.
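
As a rough illustration of the kind of summary mean_sd() describes above, the following base-R sketch computes a mean together with the values one SD below and above it. The helper name mean_sd_sketch() and the element names of its result are assumptions made for this example; it is not the package's implementation.

```r
# Hypothetical base-R helper, not datawizard's implementation:
# returns the mean together with one SD below and one SD above it.
mean_sd_sketch <- function(x, na.rm = TRUE) {
  m <- mean(x, na.rm = na.rm)
  s <- stats::sd(x, na.rm = na.rm)
  c(lower = m - s, mean = m, upper = m + s)
}

res <- mean_sd_sketch(c(2, 4, 4, 4, 5, 5, 7, 9))
print(res)
```

A median/MAD variant would follow the same pattern with stats::median() and stats::mad() in place of mean() and sd().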


  • In selection patterns, expressions like -var1:var3 to exclude all variables between var1 and var3 are no longer accepted. The correct expression is -(var1:var3). This is for 2 reasons:

    • to be consistent with the behavior for numerics (-1:2 is not accepted but -(1:2) is);
    • to be consistent with dplyr::select(), which throws a warning and only uses the first variable in the first expression.
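
The numeric analogy mentioned above can be checked in base R, where unary minus binds more tightly than the `:` operator. The vector x below is made up for this illustration.

```r
# Unary minus binds more tightly than `:`, so the two forms differ.
a <- -1:2     # parsed as (-1):2, i.e. the values -1, 0, 1, 2
b <- -(1:2)   # negates the whole sequence, i.e. the values -1, -2

x <- c("var1", "var2", "var3", "var4")

# Negating the whole range drops the first two elements.
kept <- x[-(1:2)]
print(kept)

# Mixing negative and positive subscripts is an error in base R.
mixed <- tryCatch(x[-1:2], error = function(e) "error")
print(mixed)
```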


  • data_rename() gets a verbose argument.
  • winsorize() now errors if the threshold is incorrect (previously, it gave a warning and returned the unchanged data). The verbose argument no longer has any effect but is kept for backward compatibility. The documentation now contains details about the valid values for threshold (#357).
  • In all functions that have arguments select and/or exclude, there is now one warning per misspelled variable. The previous behavior was to have only one warning.
  • Fixed inconsistent behaviour in standardize() when only one of the arguments center or scale was provided (#365).
  • unstandardize() and replace_nan_inf() now work with select helpers (#376).
  • Added informative warning and error messages to reverse(). Furthermore, the docs now describe the range argument more clearly (#380).
  • unnormalize() errors with unexpected inputs (#383).


  • empty_columns() (and therefore remove_empty_columns()) now correctly detects columns containing only NA_character_ (#349).
  • Select helpers now work in custom functions when argument is called select (#356).
  • Fix unexpected warning in convert_na_to() when select is a list (#352).
  • Fixed issue with correct labelling of numeric variables with more than nine unique values and associated value labels.

datawizard 0.6.5

CRAN release: 2022-12-14


  • Etienne Bacher is the new maintainer.



  • center(x) now works correctly when x is a single value and either reference or center is specified (#324).

  • Fixed issue in data_codebook(), which failed for labelled vectors when values of labels were not in sorted order.

datawizard 0.6.4

CRAN release: 2022-11-19


  • data_codebook(): to generate codebooks of data frames.

  • New functions to deal with duplicates: data_duplicated() (keeps all duplicates, including the first occurrence) and data_unique() (returns the data, excluding all duplicates except one instance of each, based on the selected method).
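
The distinction between the two duplicate helpers can be sketched in base R. The data frame df is made up for this example, and the sketch always keeps the first occurrence, which is only one possible choice of "one instance of each"; it does not use or reproduce the package's own functions.

```r
df <- data.frame(
  id = c(1, 2, 2, 3, 3, 3),
  value = c("a", "b", "c", "d", "e", "f")
)

# Flag every row whose id occurs more than once, including the first occurrence.
is_dup <- duplicated(df$id) | duplicated(df$id, fromLast = TRUE)
all_duplicates <- df[is_dup, ]

# Keep exactly one row per id (here: the first occurrence).
one_per_id <- df[!duplicated(df$id), ]

print(all_duplicates)
print(one_per_id)
```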


  • .data.frame methods should now preserve custom attributes.

  • The include_bounds argument in normalize() can now also be a numeric value, defining the limit to the upper and lower bound (i.e. the distance to 1 and 0).

  • data_filter() now works with grouped data.


  • data_read() no longer prints a message about empty columns when the data actually had no empty columns.

  • data_to_wide() now drops columns that are not in id_cols (if specified), names_from, or values_from. This is the behaviour observed in tidyr::pivot_wider().

datawizard 0.6.3

CRAN release: 2022-10-22



  • When column names are misspelled, most functions now suggest existing columns that may have been meant.

  • Miscellaneous performance gains.

  • convert_to_na() now requires argument na to be of class ‘Date’ to convert specific dates to NA. For example, convert_to_na(x, na = "2022-10-17") must be changed to convert_to_na(x, na = as.Date("2022-10-17")).


datawizard 0.6.2

CRAN release: 2022-10-04



  • data_read() gains a convert_factors argument, to turn off automatic conversion from numeric variables into factors.


datawizard 0.6.1

CRAN release: 2022-09-25

datawizard 0.6.0

CRAN release: 2022-09-15


  • The minimum needed R version has been bumped to 3.6.

  • The following deprecated functions have been removed:

    data_cut(), data_recode(), data_shift(), data_reverse(), data_rescale(), data_to_factor(), data_to_numeric()

  • A new text_format() alias is introduced for format_text(), the latter of which will be removed in the next release.

  • A new recode_values() alias is introduced for change_code(), the latter of which will be removed in the next release.

  • data_merge() now errors if columns specified in by are not in both datasets.

  • Using negative values in arguments select and exclude now removes the columns from the selection/exclusion. The previous behavior was to start the selection/exclusion from the end of the dataset, which was inconsistent with the use of “-” with other selecting possibilities.


  • data_peek(): to peek at values and type of variables in a data frame.

  • coef_var(): to compute the coefficient of variation.
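
For orientation, the classical coefficient of variation is the standard deviation divided by the mean. The base-R lines below show only that textbook definition on made-up data; whether coef_var() offers further variants or options is not covered here.

```r
# Textbook coefficient of variation: SD relative to the mean.
x <- c(10, 12, 8, 11, 9)
cv <- stats::sd(x) / mean(x)
print(cv)
```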


  • data_filter() will give more informative messages on malformed syntax of the filter argument.

  • It is now possible to use curly brackets to pass variable names to data_filter(). See the examples section in the documentation of data_filter().

  • The regex argument was added to functions that use select-helpers and did not already have this argument.

  • Select helpers starts_with(), ends_with(), and contains() now accept several patterns, e.g. starts_with("Sep", "Petal").

  • Arguments select and exclude that are present in most functions have been improved to work in loops and in custom functions. For example, the following code now works:

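library(datawizard)

# Select helper used inside a custom function
foo <- function(data) {
  i <- "Sep"
  find_columns(data, select = starts_with(i))
}
foo(iris)

# Select helper used inside a loop
for (i in c("Sepal", "Sp")) {
  head(iris) |>
    find_columns(select = starts_with(i)) |>
    print()
}
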
  • There is now a vignette summarizing the various ways to select or exclude variables in most datawizard functions.

datawizard 0.5.1

CRAN release: 2022-08-17

  • Fixes failing tests due to poorman update.

datawizard 0.5.0

CRAN release: 2022-08-07


  • The following statistical transformation functions have been renamed to drop the data_*() prefix. They do not work exclusively with data frames but are typically used with vectors first and foremost, so the old names were misleading:

Note that these functions also have .data.frame() methods and still work for data frames as well. Former function names are still available as aliases, but will be deprecated and removed in a future release.


  • Some of the text formatting helpers (like text_concatenate()) gain an enclose argument, to wrap text elements with surrounding characters.

  • winsorize() now accepts “raw” and “zscore” methods (in addition to “percentile”). Additionally, when robust is set to TRUE together with method = "zscore", it winsorizes via the median and median absolute deviation (MAD); otherwise via the mean and standard deviation (@rempsyc, #177, #49, #47).

  • convert_na_to() now accepts numeric replacements on character vectors and a single replacement for multiple vector classes (@rempsyc, #214).

  • data_partition() now allows creating multiple partitions from the data, returning multiple training sets and a remaining test set.

  • Functions like center(), normalize() or standardize() no longer fail when data contains infinite values (Inf).



datawizard 0.4.1

CRAN release: 2022-05-16


  • Added the standardize.default() method (moved from package effectsize), to be consistent in that the default-method now is in the same package as the generic. standardize.default() behaves exactly like in effectsize and particularly works for regression model objects. effectsize now re-exports standardize() from datawizard.


  • data_shift() to shift the value range of numeric variables.

  • data_recode() to recode old into new values.

  • data_to_factor() as counterpart to data_to_numeric().

  • data_tabulate() to create frequency tables of variables.

  • data_read() to read (import) data files (from text, or foreign statistical packages).

  • unnormalize() as counterpart to normalize(). This function only works for variables that have been normalized with normalize().

  • data_group() and data_ungroup() to create grouped data frames, or to remove the grouping information from grouped data frames.


  • data_find() was added as an alias for find_columns(), to have consistent name patterns for the datawizard functions. data_findcols() will be removed in a future update and its usage is discouraged.

  • The select argument (and thus, also the exclude argument) now also accepts functions testing for logical conditions, e.g. is.numeric() (or is.numeric), or any user-defined function that selects the variables for which the function returns TRUE (like: foo <- function(x) mean(x) > 3).

  • Arguments select and exclude now allow the negation of select-helpers, like -ends_with(""), -is.numeric or -Sepal.Width:Petal.Length.

  • Many functions now get a .default method, to capture unsupported classes. This now yields a message and returns the original input, and hence, the .data.frame methods won’t stop due to an error.

  • The filter argument in data_filter() can also be a numeric vector, to indicate row indices of those rows that should be returned.

  • convert_to_na() gets methods for variables of class logical and Date.

  • convert_to_na() for factors (and data frames) gains a drop_levels argument, to drop unused levels that have been replaced by NA.

  • data_to_numeric() gains two more arguments, preserve_levels and lowest, to give better control of conversion of factors.
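
The predicate-based selection described above for select and exclude can be pictured with a small base-R sketch. The helper select_by_predicate() is hypothetical and written only for this illustration; it does not reproduce how the package evaluates select, and the user-defined predicate is applied to numeric columns only to keep the example simple.

```r
# Hypothetical helper: names of the columns for which `predicate` returns TRUE.
select_by_predicate <- function(data, predicate) {
  keep <- vapply(data, function(col) isTRUE(predicate(col)), logical(1))
  names(data)[keep]
}

# Built-in predicate: all numeric columns.
numeric_cols <- select_by_predicate(iris, is.numeric)

# Negation of the predicate: everything that is not numeric.
non_numeric_cols <- setdiff(names(iris), numeric_cols)

# User-defined predicate, applied to the numeric columns only.
foo <- function(x) mean(x) > 3
large_mean_cols <- select_by_predicate(iris[numeric_cols], foo)

print(numeric_cols)
print(non_numeric_cols)
print(large_mean_cols)
```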


  • When logicals were passed to center() or standardize() and force = TRUE, these were not properly converted to numeric variables.

datawizard 0.4.0

CRAN release: 2022-03-30




i.e. always return a vector or a data frame. Thus, data_extract() can now be used to select multiple variables or pull a single variable from data frames.


  • data_to_numeric() produced wrong results for factors when dummy_factors = TRUE and factor contained missing values.

  • data_match() produced wrong results when data contained missing values.

  • Fixed CRAN check issues in data_extract() when more than one variable was extracted from a data frame.

datawizard 0.3.0

CRAN release: 2022-03-02


datawizard 0.2.3

CRAN release: 2022-01-26

datawizard 0.2.2

CRAN release: 2022-01-04

  • New function data_extract() (or its alias extract()) to pull single variables from a data frame, possibly naming each value by the row names of that data frame.

  • reshape_ci() gains a ci_type argument, to reshape data frames where CI-columns have prefixes other than "CI".

  • standardize() and center() gain arguments center and scale, to define references for centrality and deviation that are used when centering or standardizing variables.

  • center() gains the arguments force and reference, similar to standardize().

  • The functionality of the append argument in center() and standardize() was revised. This made the suffix argument redundant, and thus it was removed.

  • Fixed issue in standardize().

  • Fixed issue in data_findcols().

datawizard 0.2.1

CRAN release: 2021-10-04


CRAN release: 2021-09-02

  • This is mainly a maintenance release that addresses some issues with conflicting namespaces.

datawizard 0.2.0

CRAN release: 2021-08-17

  • New function: visualisation_recipe().

  • The following function has now moved to performance package: check_multimodal().

  • Minor updates to documentation, including a new vignette about demean().

datawizard 0.1.0

CRAN release: 2021-06-18

  • First release.