dplyr Cheat Sheet | Cheatsheetindex

dplyr is a popular package in the R programming language for data manipulation. It provides a set of functions that can be used to filter, arrange, mutate, and summarize data. With dplyr, you can easily perform common data manipulation tasks with concise and readable code.

To help you get started with dplyr, we’ve created a cheat sheet that covers all the basic and advanced data manipulation functions in dplyr. The cheat sheet is organized by themes such as basic data manipulation, joining data frames, conditional data manipulation, data transformation, window functions, and other useful functions. Each function is explained briefly, along with its parameters and syntax, in a table format.

The cheat sheet can be used as a quick reference guide while working with dplyr. You can use it to quickly look up the syntax and parameters of a function or to refresh your memory on a specific function. Additionally, the cheat sheet can be a helpful tool for beginners who are just learning dplyr.

To use the cheat sheet, simply find the function you need in the appropriate section and read the brief description, along with its syntax and parameters. You can also refer to the examples provided in the cheat sheet to get a better understanding of how the functions work in practice.

Cheat Sheet

Basic Data Manipulation

Function	Description
filter(df, condition)	Filter rows by logical condition
select(df, column1, column2)	Select columns
mutate(df, new_column = expression)	Add a new column
arrange(df, column1, column2)	Sort rows by one or more columns
rename(df, new_column_name = old_column_name)	Rename columns
distinct(df, column1, column2)	Select distinct rows
sample_n(df, n)	Randomly sample n rows
sample_frac(df, frac)	Randomly sample frac proportion of rows
slice(df, start:end)	Select rows by position
summarise(df, summary = function(column))	Summarize data by group or overall
group_by(df, column1, column2)	Group data by one or more columns

Joining Data Frames

Function	Description
inner_join(df1, df2, by = “column_name”)	Return rows with matching values in both data frames
left_join(df1, df2, by = “column_name”)	Return all rows from the first data frame, and matching rows from the second data frame
right_join(df1, df2, by = “column_name”)	Return all rows from the second data frame, and matching rows from the first data frame
full_join(df1, df2, by = “column_name”)	Return all rows from both data frames

Conditional Data Manipulation

Function	Description
case_when(condition1 ~ value1, condition2 ~ value2, …)	A vectorized if/else statement
if_else(condition, true_value, false_value)	An if/else statement
replace(df, condition, new_value)	Replace values based on a condition

Data Transformation

Function	Description
pivot_longer(df, cols, names_to, values_to)	Convert data from wide to long format
pivot_wider(df, names_from, values_from, values_fill)	Convert data from long to wide format
gather(df, key, value, columns)	Gather columns into key-value pairs
spread(df, key, value)	Spread key-value pairs into columns
separate(df, column, into, sep)	Split a column into multiple columns
unite(df, column1, column2, sep)	Combine multiple columns into a single column

Window Functions

Function	Description
lead(column, n)	Returns the nth value after the current row
lag(column, n)	Returns the nth value before the current row
rank(column)	Ranks values within a group
dense_rank(column)	Ranks values within a group, skipping ranks if they are tied
cumsum(column)	Returns the cumulative sum of a column
cummean(column)	Returns the cumulative mean of a column

Other Useful Functions

Function	Description	Syntax
n()	Number of rows in a group	summarise(df, count = n())
top_n()	Select top n rows by a variable	top_n(df, n, variable)
ntile()	Divide rows into n quantiles	mutate(df, quantile = ntile(column, n))
between()	Select values within a range	filter(df, between(column, start, end))
coalesce()	Return the first non-missing value	mutate(df, new_column = coalesce(column1, column2, …))
if_na()	Replace NA values with a specific value	if_na(column, value)
everything()	Select all columns	select(df, everything())
across()	Apply a function to multiple columns	mutate(df, across(columns, function))
row_number()	Assign a unique number to each row	mutate(df, row_num = row_number())
summarize_all()	Apply a function to all columns	summarize_all(df, function)
mutate_all()	Apply a function to all columns	mutate_all(df, function)
select_if()	Select columns based on a condition	select_if(df, condition)
mutate_if()	Mutate columns based on a condition	mutate_if(df, condition, function)
summarize_if()	Summarize columns based on a condition	summarize_if(df, condition, function)

Reference:

https://dplyr.tidyverse.org/

Cheat Sheet

Basic Data Manipulation

Joining Data Frames

Conditional Data Manipulation

Data Transformation

Window Functions

Other Useful Functions

Related Posts

Tidyr Cheat Sheet

Tidyverse Cheat Sheet

Lubridate Cheat Sheet

R Regex Cheat Sheet