Data Preparation

Main functions for cleaning and preparing data

data_to_long() reshape_longer()
Reshape (pivot) data from wide to long
data_to_wide() reshape_wider()
Reshape (pivot) data from long to wide
data_extract()
Extract one or more columns or elements from an object
data_match() data_filter()
Return filtered or sliced data frame, or row indices
data_select() extract_column_names() find_columns()
Find or get columns in a data frame based on search patterns
data_relocate() data_reorder() data_remove()
Relocate (reorder) columns of a data frame
data_arrange()
Arrange rows by column values
data_merge() data_join()
Merge (join) two data frames, or a list of data frames
data_partition()
Partition data
data_rotate() data_transpose()
Rotate a data frame
data_group() data_ungroup()
Create a grouped data frame
data_replicate()
Expand (i.e. replicate rows) a data frame
data_duplicated()
Extract all duplicates
data_unique()
Keep only one row from each set of rows with duplicated IDs

Data and Variable Transformations

Statistical Transformations

Functions for transforming variables

data_modify()
Create new variables in a data frame
data_separate()
Separate a single variable into multiple variables
data_unite()
Unite ("merge") multiple variables
categorize()
Recode (or "cut" / "bin") data into groups of values
recode_into()
Recode values from one or more variables into a new variable
recode_values()
Recode old values of variables into new values
adjust() data_adjust()
Adjust data for the effect of other variable(s)
demean() degroup() detrend()
Compute group-meaned and de-meaned variables
ranktransform()
(Signed) rank transformation
rescale_weights()
Rescale design weights for multilevel analysis
winsorize()
Winsorize data

Linear Transformers

Convenient functions for common linear transformations

center() centre()
Centering (Grand-Mean Centering)
slide()
Shift numeric value range
standardize() standardise() unstandardize() unstandardise()
Standardization (Z-scoring)
standardize(<default>)
Re-fit a model with standardized data
reverse() reverse_scale()
Reverse-Score Variables
rescale() change_scale()
Rescale Variables to a New Range
normalize() unnormalize()
Normalize numeric variable to 0-1 range
makepredictcall(<dw_transformer>)
Utility Function for Safe Prediction with datawizard transformers

Others

contr.deviation()
Deviation Contrast Matrix

Data Properties

Functions to compute statistical summaries of data properties and distributions

data_codebook() print_html(<data_codebook>)
Generate a codebook of a data frame
data_summary()
Summarize data
data_tabulate() as.data.frame(<datawizard_tables>)
Create frequency and crosstables of variables
data_peek()
Peek at values and type of variables in a data frame
data_seek()
Find variables by their names, variable or value labels
means_by_group()
Summary of mean values by group
coef_var() distribution_coef_var()
Compute the coefficient of variation
describe_distribution()
Describe a distribution
distribution_mode()
Compute mode for a statistical distribution
skewness() kurtosis() print(<parameters_kurtosis>) print(<parameters_skewness>) summary(<parameters_skewness>) summary(<parameters_kurtosis>)
Compute Skewness and (Excess) Kurtosis
smoothness()
Quantify the smoothness of a vector
row_count()
Count specific values row-wise
row_means() row_sums()
Row means or sums (optionally with a minimum number of valid values)
weighted_mean() weighted_median() weighted_sd() weighted_mad()
Weighted Mean, Median, SD, and MAD
mean_sd() median_mad()
Summary Helpers

Convert and Replace Data

Helpers for data replacements

assign_labels()
Assign variable and value labels
labels_to_levels()
Convert value labels into factor levels
coerce_to_numeric()
Convert to Numeric (if possible)
to_numeric()
Convert data to numeric
to_factor()
Convert data to factors
replace_nan_inf()
Convert infinite or NaN values into NA
convert_na_to()
Replace missing values in a variable or a data frame
convert_to_na()
Convert non-missing values in a variable into missing values

Import data

Helpers for importing data

data_read() data_write()
Read (import) data files from various sources

Helpers for Data Preparation

Primarily useful in the context of other ‘easystats’ packages

reshape_ci()
Reshape CI between wide/long formats
data_rename() data_rename_rows()
Rename columns and variable names
data_addprefix() data_addsuffix()
Add a prefix or suffix to column names
empty_columns() empty_rows() remove_empty_columns() remove_empty_rows() remove_empty()
Return or remove variables or observations that are completely missing
rownames_as_column() column_as_rownames() rowid_as_column()
Tools for working with row names or row ids
row_to_colnames() colnames_to_row()
Tools for working with column names
data_select() extract_column_names() find_columns()
Find or get columns in a data frame based on search patterns
data_restoretype()
Restore the type of columns according to a reference data frame

Helpers for Text Formatting

Primarily useful for the ‘report’ package

Visualization helpers

Primarily useful in the context of other ‘easystats’ packages

visualisation_recipe()
Prepare objects for visualisation

Data

Datasets useful for examples and tests

efc
Sample dataset from the EFC Survey
nhanes_sample
Sample dataset from the National Health and Nutrition Examination Survey