dplyr is a popular package in the R programming language for data manipulation. It provides a set of functions that can be used to filter, arrange, mutate, and summarize data. With dplyr, you can easily perform common data manipulation tasks with concise and readable code.
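Here is a minimal sketch of that style. It chains four of these verbs on a small data frame. The `sales` data, its columns, and its values are made up for illustration. It uses the `%>%` pipe, which dplyr re-exports from magrittr.

```r
library(dplyr)

# Hypothetical example data (not from the original text)
sales <- data.frame(
  region = c("North", "South", "North", "East", "South"),
  units  = c(10, 25, 7, 12, 30),
  price  = c(2.5, 1.0, 2.5, 3.0, 1.0)
)

# filter -> mutate -> arrange: each verb takes a data frame and returns one
revenue_by_row <- sales %>%
  filter(units >= 10) %>%              # drop the row with units = 7
  mutate(revenue = units * price) %>%  # add a computed column
  arrange(desc(revenue))               # sort from highest to lowest revenue

# summarise() collapses the rows into a single summary row
totals <- revenue_by_row %>%
  summarise(total_revenue = sum(revenue), n_rows = n())

print(revenue_by_row)
print(totals)
```

Each step reads as one sentence about the data, so there is no need for temporary variables or explicit loops.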
To help you get started with dplyr, we’ve created a cheat sheet that covers the most commonly used basic and advanced data manipulation functions in dplyr. The cheat sheet is organized by theme: basic data manipulation, joining data frames, conditional data manipulation, data transformation, window functions, and other useful functions. Each function is listed in a table with its syntax, its main parameters, and a brief description.
The cheat sheet is meant as a quick reference while you work with dplyr. Use it to look up a function's syntax and parameters, or to refresh your memory of what a function does. It can also be a helpful tool for beginners who are just learning dplyr.
To use the cheat sheet, simply find the function you need in the appropriate section and read the brief description, along with its syntax and parameters. You can also refer to the examples provided in the cheat sheet to get a better understanding of how the functions work in practice.
Cheat Sheet
Basic Data Manipulation
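| Function | Description |
|---|---|
| `filter(df, condition)` | Keep rows that satisfy a logical condition |
| `select(df, column1, column2)` | Select columns |
| `mutate(df, new_column = expression)` | Add new columns or modify existing ones |
| `arrange(df, column1, column2)` | Sort rows by one or more columns (ascending by default; wrap a column in `desc()` for descending order) |
| `rename(df, new_column_name = old_column_name)` | Rename columns |
| `distinct(df, column1, column2)` | Keep unique rows; with columns listed, keep unique combinations of those columns (other columns are dropped unless `.keep_all = TRUE`) |
| `sample_n(df, n)` | Randomly sample n rows (superseded by `slice_sample(df, n = n)` since dplyr 1.0.0) |
| `sample_frac(df, frac)` | Randomly sample a proportion `frac` of rows (superseded by `slice_sample(df, prop = frac)` since dplyr 1.0.0) |
| `slice(df, start:end)` | Select rows by position |
| `summarise(df, summary = fn(column))` | Summarize data by group or overall, where `fn` is a summary function such as `mean` or `sum` |
| `group_by(df, column1, column2)` | Group data by one or more columns so that later verbs such as `summarise()` work per group |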
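Several of the functions in the table above can be tried on a small data frame. This is a sketch that assumes dplyr 1.0.0 or later, which is needed for `slice_sample()` and for the `.groups` argument of `summarise()`. The `sales` data and its values are hypothetical.

```r
library(dplyr)

# Hypothetical example data (not from the original text)
sales <- data.frame(
  region = c("North", "South", "North", "East", "South"),
  units  = c(10, 25, 7, 12, 30)
)

# rename(): new_name = old_name
renamed <- rename(sales, area = region)

# distinct(): unique values of the listed column, in order of first appearance
regions <- distinct(sales, region)

# slice(): rows by position
first_two <- slice(sales, 1:2)

# group_by() + summarise(): one row per group; .groups = "drop" ungroups the result
by_region <- sales %>%
  group_by(region) %>%
  summarise(total_units = sum(units), .groups = "drop")

# Random sampling. sample_n()/sample_frac() still work but are superseded
# by slice_sample(). The seed makes the draw reproducible.
set.seed(1)
three_rows <- slice_sample(sales, n = 3)
half_rows  <- slice_sample(sales, prop = 0.5)

print(by_region)
```

Because `slice_sample()` draws rows at random, the rows it returns depend on the seed. Only the number of rows is fixed. A `prop` value is rounded towards zero, so half of five rows gives two.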
Joining Data Frames
| Function | Description |
|---|---|
| `inner_join(df1, df2, by = "column_name")` | Return only the rows whose key values match in both data frames |
| `left_join(df1, df2, by = "column_name")` | Return all rows from `df1`, with the matching columns from `df2` (`NA` where there is no match) |
| `right_join(df1, df2, by = "column_name")` | Return all rows from `df2`, with the matching columns from `df1` (`NA` where there is no match) |
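The sketch below shows how the three joins differ. It joins two small tables on a shared `region` key, where `East` appears only in the first table and `West` appears only in the second. The table names, columns, and values are made up for illustration.

```r
library(dplyr)

# Hypothetical tables sharing a "region" key column
sales <- data.frame(
  region = c("North", "South", "East"),
  units  = c(10, 25, 12)
)
managers <- data.frame(
  region  = c("North", "South", "West"),
  manager = c("Ana", "Ben", "Cleo")
)

inner <- inner_join(sales, managers, by = "region")  # only North and South match
left  <- left_join(sales, managers, by = "region")   # keeps East; its manager is NA
right <- right_join(sales, managers, by = "region")  # keeps West; its units is NA

print(inner)
print(left)
print(right)
```

The unmatched keys determine which rows survive. `inner_join()` drops both `East` and `West`, `left_join()` keeps `East`, and `right_join()` keeps `West`. In each case, the columns coming from the other table are filled with `NA`.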