Pandas is a powerful data manipulation and analysis library for Python. It provides a flexible and efficient way to work with structured data, such as spreadsheets, databases, and CSV files. Pandas allows you to perform tasks such as data cleaning, data exploration, data manipulation, and data visualization.
If you’re new to pandas, the number of methods and functions in the library can be overwhelming. That’s where a cheat sheet comes in handy: it is a concise reference to the most commonly used pandas methods and functions, and it saves you from searching the documentation for routine tasks.
The pandas cheat sheet below is divided into themes, each presented as a table: importing data, data exploration, data cleaning, data manipulation, and data visualization. Each table lists commonly used pandas methods and functions for that theme. For example, the data exploration table covers showing the first and last n rows, the data shape, data types, summary statistics, and more.
The cheat sheet is not meant to be an exhaustive reference guide to pandas. It’s designed to provide a quick reference to commonly used methods and functions. If you’re new to pandas, I recommend using the cheat sheet in conjunction with pandas documentation and tutorials to get a more complete understanding of the library.
Cheat Sheet
Importing Data
Task | Code |
Import pandas | import pandas as pd |
Import data from CSV file | df = pd.read_csv('filename.csv') |
Import data from Excel file | df = pd.read_excel('filename.xlsx') |
Import data from SQL database | import sqlite3<br>conn = sqlite3.connect('database.db')<br>df = pd.read_sql_query('SELECT * FROM tablename', conn) |
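To see the import calls working end to end, here is a minimal sketch that writes a small CSV file and an in-memory SQLite table, then reads both back. The data, the file name `scores.csv`, and the table name `scores` are made up for illustration; `read_excel` is left out because it needs an extra engine package such as openpyxl.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for this illustration.
sample = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "score": [90, 85, 77]})

# CSV round trip: write to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "scores.csv"
    sample.to_csv(csv_path, index=False)  # index=False avoids an extra index column
    df_csv = pd.read_csv(csv_path)

# SQL round trip: store the frame in an in-memory SQLite database and query it.
conn = sqlite3.connect(":memory:")
sample.to_sql("scores", conn, index=False)
df_sql = pd.read_sql_query("SELECT * FROM scores WHERE score > 80", conn)
conn.close()

print(df_csv)
print(df_sql)
```

Filtering in the SQL query, as in the `WHERE` clause here, keeps pandas from loading rows you do not need.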
Data Exploration
Task | Code |
Show first n rows of data | df.head(n) |
Show last n rows of data | df.tail(n) |
Show data shape | df.shape |
Show data types | df.dtypes |
Show summary statistics | df.describe() |
Show unique values in a column | df['column'].unique() |
Show number of unique values in a column | df['column'].nunique() |
Show value counts in a column | df['column'].value_counts() |
Show correlation matrix (numeric columns only) | df.corr(numeric_only=True) |
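The exploration calls are easiest to understand on a tiny DataFrame. The sketch below uses made-up weather data; the column names `city`, `temp`, and `rain` are assumptions for illustration. It passes `numeric_only=True` to `corr()` because recent pandas versions raise an error when the frame contains text columns.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
    "temp": [5.0, 18.0, 7.0, 22.0],
    "rain": [80, 40, 75, 10],
})

first_two = df.head(2)              # first two rows
shape = df.shape                    # (number of rows, number of columns)
n_cities = df["city"].nunique()     # count of distinct cities
counts = df["city"].value_counts()  # how often each city appears

# numeric_only=True skips the text column "city".
corr = df.corr(numeric_only=True)

print(shape, n_cities)
print(counts)
print(corr)
```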
Data Cleaning
Task | Code |
Rename columns | df.rename(columns={'old_name': 'new_name'}) |
Drop columns | df.drop(columns=['column1', 'column2']) |
Drop rows with missing values | df.dropna() |
Fill missing values with a constant | df.fillna(value) |
Fill missing values with the column mean (numeric columns) | df.fillna(df.mean(numeric_only=True)) |
Replace values in a column | df['column'].replace(old_value, new_value) |
Remove duplicates | df.drop_duplicates() |
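One detail the table does not show: these cleaning methods return a new DataFrame and leave the original untouched, so you need to assign the result. Here is a minimal sketch of a chained cleaning step, assuming a made-up frame with columns `Name` and `age`:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a duplicate row and missing ages.
raw = pd.DataFrame({
    "Name": ["Ann", "Bob", "Bob", "Cid"],
    "age": [30.0, np.nan, np.nan, 41.0],
})

# Each method returns a new DataFrame, so the calls can be chained.
clean = (
    raw.rename(columns={"Name": "name"})
       .drop_duplicates()        # the two identical "Bob" rows collapse to one
       .reset_index(drop=True)   # renumber rows 0..n-1 after dropping
)

# Fill the remaining missing age with the mean of the known ages.
clean["age"] = clean["age"].fillna(clean["age"].mean())

print(clean)
print(raw.columns.tolist())  # raw still has its original column names
```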
Data Manipulation
Task | Code |
Select columns | df[['column1', 'column2']] |
Filter rows by a condition | df[df['column'] > value] |
Sort data by one or more columns | df.sort_values(by=['column1', 'column2']) |
Group data by a column and calculate statistics | df.groupby('column').agg({'column1': 'mean', 'column2': 'sum'}) |
Merge two dataframes by a common column | pd.merge(df1, df2, on='column') |
Pivot table | pd.pivot_table(df, values='value', index='row', columns='column', aggfunc='mean') |
Apply a function to a column | df['column'].apply(function) |
Apply a function to a row | df.apply(function, axis=1) |
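Filtering, merging, grouping, and sorting are often combined in a single workflow. The sketch below is one way to do that on two made-up tables; the names `orders`, `customers`, `customer_id`, `amount`, and `region` are assumptions for illustration. It uses named aggregation, a variant of the dict form shown in the table that lets you choose the output column names.

```python
import pandas as pd

# Hypothetical tables for illustration only.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [20.0, 35.0, 15.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})

# Filter rows by a condition.
large = orders[orders["amount"] > 18]

# Merge on the shared key, then aggregate per region.
merged = pd.merge(orders, customers, on="customer_id")
by_region = (
    merged.groupby("region")
          .agg(total=("amount", "sum"), average=("amount", "mean"))
          .sort_values(by="total", ascending=False)
)

print(large)
print(by_region)
```

`pd.merge` performs an inner join by default, so orders without a matching customer would be dropped; pass `how='left'` to keep them.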
Data Visualization
Task | Code |
Line plot | df.plot(x='column1', y='column2', kind='line') |
Bar plot | df.plot(x='column1', y='column2', kind='bar') |
Histogram | df['column'].hist() |
Scatter plot | df.plot(x='column1', y='column2', kind='scatter') |
Box plot | df.boxplot(column='column') |
Heatmap | import seaborn as sns<br>sns.heatmap(df.corr(numeric_only=True), annot=True) |
Pairplot | sns.pairplot(df) |
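The pandas plotting methods are built on matplotlib, so matplotlib must be installed; the heatmap and pairplot rows need seaborn as well. Here is a minimal sketch of two pandas plots drawn side by side, assuming a made-up frame with columns `x` and `y`. It selects the non-interactive `Agg` backend so that it also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 4.1, 5.9, 8.2]})

# Two axes in one figure; passing ax= tells pandas where to draw each plot.
fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))
df.plot(x="x", y="y", kind="line", ax=ax_line, title="Line")
df.plot(x="x", y="y", kind="scatter", ax=ax_scatter, title="Scatter")
fig.tight_layout()
```

To view or keep the result, call `plt.show()` in an interactive session or `fig.savefig(...)` with a file name of your choice.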