dplyr Cheat Sheet

dplyr is a popular package in the R programming language for data manipulation. It provides a set of functions that can be used to filter, arrange, mutate, and summarize data. With dplyr, you can easily perform common data manipulation tasks with concise and readable code.

To help you get started with dplyr, we’ve created a cheat sheet that covers all the basic and advanced data manipulation functions in dplyr. The cheat sheet is organized by themes such as basic data manipulation, joining data frames, conditional data manipulation, data transformation, window functions, and other useful functions. Each function is explained briefly, along with its parameters and syntax, in a table format.

The cheat sheet can be used as a quick reference guide while working with dplyr. You can use it to quickly look up the syntax and parameters of a function or to refresh your memory on a specific function. Additionally, the cheat sheet can be a helpful tool for beginners who are just learning dplyr.

To use the cheat sheet, simply find the function you need in the appropriate section and read the brief description, along with its syntax and parameters. You can also refer to the examples provided in the cheat sheet to get a better understanding of how the functions work in practice.

Cheat Sheet

Basic Data Manipulation

FunctionDescription
filter(df, condition)Filter rows by logical condition
select(df, column1, column2)Select columns
mutate(df, new_column = expression)Add a new column
arrange(df, column1, column2)Sort rows by one or more columns
rename(df, new_column_name = old_column_name)Rename columns
distinct(df, column1, column2)Select distinct rows
sample_n(df, n)Randomly sample n rows
sample_frac(df, frac)Randomly sample frac proportion of rows
slice(df, start:end)Select rows by position
summarise(df, summary = function(column))Summarize data by group or overall
group_by(df, column1, column2)Group data by one or more columns

Joining Data Frames

FunctionDescription
inner_join(df1, df2, by = “column_name”)Return rows with matching values in both data frames
left_join(df1, df2, by = “column_name”)Return all rows from the first data frame, and matching rows from the second data frame
right_join(df1, df2, by = “column_name”)Return all rows from the second data frame, and matching rows from the first data frame
full_join(df1, df2, by = “column_name”)Return all rows from both data frames

Conditional Data Manipulation

FunctionDescription
case_when(condition1 ~ value1, condition2 ~ value2, …)A vectorized if/else statement
if_else(condition, true_value, false_value)An if/else statement
replace(df, condition, new_value)Replace values based on a condition

Data Transformation

FunctionDescription
pivot_longer(df, cols, names_to, values_to)Convert data from wide to long format
pivot_wider(df, names_from, values_from, values_fill)Convert data from long to wide format
gather(df, key, value, columns)Gather columns into key-value pairs
spread(df, key, value)Spread key-value pairs into columns
separate(df, column, into, sep)Split a column into multiple columns
unite(df, column1, column2, sep)Combine multiple columns into a single column

Window Functions

FunctionDescription
lead(column, n)Returns the nth value after the current row
lag(column, n)Returns the nth value before the current row
rank(column)Ranks values within a group
dense_rank(column)Ranks values within a group, skipping ranks if they are tied
cumsum(column)Returns the cumulative sum of a column
cummean(column)Returns the cumulative mean of a column

Other Useful Functions

FunctionDescriptionSyntax
n()Number of rows in a groupsummarise(df, count = n())
top_n()Select top n rows by a variabletop_n(df, n, variable)
ntile()Divide rows into n quantilesmutate(df, quantile = ntile(column, n))
between()Select values within a rangefilter(df, between(column, start, end))
coalesce()Return the first non-missing valuemutate(df, new_column = coalesce(column1, column2, …))
if_na()Replace NA values with a specific valueif_na(column, value)
everything()Select all columnsselect(df, everything())
across()Apply a function to multiple columnsmutate(df, across(columns, function))
row_number()Assign a unique number to each rowmutate(df, row_num = row_number())
summarize_all()Apply a function to all columnssummarize_all(df, function)
mutate_all()Apply a function to all columnsmutate_all(df, function)
select_if()Select columns based on a conditionselect_if(df, condition)
mutate_if()Mutate columns based on a conditionmutate_if(df, condition, function)
summarize_if()Summarize columns based on a conditionsummarize_if(df, condition, function)

Reference:

https://dplyr.tidyverse.org/