Pandas Cheat Sheet

Pandas is a powerful data manipulation and analysis library for Python. It provides a flexible and efficient way to work with structured data, such as spreadsheets, databases, and CSV files. Pandas allows you to perform tasks such as data cleaning, data exploration, data manipulation, and data visualization.

If you're new to pandas, the number of methods and functions the library provides can be overwhelming. That's where a cheat sheet comes in handy: it is a concise reference to the most commonly used pandas methods and functions, so you can look up a call quickly instead of searching the full documentation.

The cheat sheet below is divided into themes, each presented as a table: importing data, data exploration, data cleaning, data manipulation, and data visualization. Each table lists commonly used pandas methods and functions for that theme. For example, the data exploration table contains methods for showing the first and last n rows of data, data shape, data types, summary statistics, and more.

The cheat sheet is not meant to be an exhaustive reference guide to pandas. It's designed to provide a quick reference to commonly used methods and functions. If you're new to pandas, I recommend using it alongside the pandas documentation and tutorials to get a more complete understanding of the library.

Cheat Sheet

Importing Data

| Task | Code |
| --- | --- |
| Import pandas | import pandas as pd |
| Import data from CSV file | df = pd.read_csv('filename.csv') |
| Import data from Excel file | df = pd.read_excel('filename.xlsx') |
| Import data from SQL database | import sqlite3; conn = sqlite3.connect('database.db'); df = pd.read_sql_query('SELECT * FROM tablename', conn) |
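
To try the import calls without any files on disk, here is a minimal sketch. It assumes an in-memory CSV string and an in-memory SQLite database as stand-ins for filename.csv and database.db; the column names and values are made up for illustration.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content standing in for 'filename.csv'.
csv_text = "name,score\nAda,90\nLinus,85\nGrace,95\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# In-memory SQLite database standing in for 'database.db'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [("Ada", 90), ("Linus", 85)],
)
conn.commit()

# Run a query and load the result set into a DataFrame.
df_sql = pd.read_sql_query("SELECT * FROM tablename", conn)
conn.close()

print(df_csv)
print(df_sql)
```

Reading an Excel file works the same way with pd.read_excel, but it typically needs an extra engine package such as openpyxl installed, so it is left out of this sketch.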

Data Exploration

| Task | Code |
| --- | --- |
| Show first n rows of data | df.head(n) |
| Show last n rows of data | df.tail(n) |
| Show data shape | df.shape |
| Show data types | df.dtypes |
| Show summary statistics | df.describe() |
| Show unique values in a column | df['column'].unique() |
| Show number of unique values in a column | df['column'].nunique() |
| Show value counts in a column | df['column'].value_counts() |
| Show correlation matrix (numeric columns) | df.corr(numeric_only=True) |
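
The exploration calls above can be tried on a small frame. This is a sketch with made-up data (the city, temp, and rain columns are assumptions for illustration). It passes numeric_only=True to corr, which is available in pandas 1.5 and later and avoids an error when the frame also contains text columns.

```python
import pandas as pd

# Hypothetical sample data: one text column and two numeric columns.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Rome", "Rome", "Lima"],
        "temp": [3.0, 5.0, 18.0, 20.0, 22.0],
        "rain": [10.0, 8.0, 4.0, 3.0, 1.0],
    }
)

first_two = df.head(2)               # first 2 rows
n_rows, n_cols = df.shape            # (rows, columns) tuple
n_cities = df["city"].nunique()      # number of distinct cities
counts = df["city"].value_counts()   # rows per city
corr = df.corr(numeric_only=True)    # correlation of numeric columns only

print(first_two)
print(n_rows, n_cols, n_cities)
print(counts)
print(corr)
```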

Data Cleaning

| Task | Code |
| --- | --- |
| Rename columns | df.rename(columns={'old_name': 'new_name'}) |
| Drop columns | df.drop(columns=['column1', 'column2']) |
| Drop rows with missing values | df.dropna() |
| Fill missing values with a constant | df.fillna(value) |
| Fill missing values with the column mean | df.fillna(df.mean(numeric_only=True)) |
| Replace values in a column | df['column'].replace(old_value, new_value) |
| Remove duplicates | df.drop_duplicates() |
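
One thing the table does not show is that these cleaning methods return a new object by default and leave the original DataFrame unchanged, so the result has to be assigned. The sketch below illustrates this with made-up data; the label column and the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with one missing value and one duplicated row.
df = pd.DataFrame(
    {
        "old_name": [1.0, None, 3.0, 3.0],
        "label": ["a", "b", "c", "c"],
    }
)

# Each call returns a new DataFrame; df itself is not modified.
renamed = df.rename(columns={"old_name": "new_name"})

# Fill gaps in numeric columns with that column's mean.
filled = renamed.fillna(renamed.mean(numeric_only=True))

# Drop the repeated row and renumber the index.
cleaned = filled.drop_duplicates().reset_index(drop=True)

print(df)       # still has 'old_name' and the missing value
print(cleaned)  # renamed, filled, de-duplicated
```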

Data Manipulation

| Task | Code |
| --- | --- |
| Select columns | df[['column1', 'column2']] |
| Filter rows by a condition | df[df['column'] > value] |
| Sort data by one or more columns | df.sort_values(by=['column1', 'column2']) |
| Group data by a column and calculate statistics | df.groupby('column').agg({'column1': 'mean', 'column2': 'sum'}) |
| Merge two dataframes by a common column | pd.merge(df1, df2, on='column') |
| Pivot table | pd.pivot_table(df, values='value', index='row', columns='column', aggfunc='mean') |
| Apply a function to a column | df['column'].apply(function) |
| Apply a function to a row | df.apply(function, axis=1) |
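
As a rough illustration of how the manipulation calls fit together, the sketch below filters, groups, merges, and pivots two small frames. The sales and stores tables and their columns are hypothetical.

```python
import pandas as pd

# Hypothetical sales records and a lookup table of store regions.
sales = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B"],
        "month": ["Jan", "Feb", "Jan", "Feb"],
        "revenue": [100, 150, 80, 120],
    }
)
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Filter rows by a condition.
big = sales[sales["revenue"] > 90]

# Group by a column and aggregate.
totals = sales.groupby("store").agg({"revenue": "sum"})

# Merge on the shared 'store' column (inner join by default).
merged = pd.merge(sales, stores, on="store")

# Pivot: one row per region, one column per month.
table = pd.pivot_table(
    merged, values="revenue", index="region", columns="month", aggfunc="mean"
)

# Apply a function to each row to build a combined key.
merged["key"] = merged.apply(lambda row: f"{row['store']}-{row['month']}", axis=1)

print(big)
print(totals)
print(table)
print(merged)
```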

Data Visualization

| Task | Code |
| --- | --- |
| Line plot | df.plot(x='column1', y='column2', kind='line') |
| Bar plot | df.plot(x='column1', y='column2', kind='bar') |
| Histogram | df['column'].hist() |
| Scatter plot | df.plot(x='column1', y='column2', kind='scatter') |
| Box plot | df.boxplot(column='column') |
| Heatmap (requires seaborn) | import seaborn as sns; sns.heatmap(df.corr(numeric_only=True), annot=True) |
| Pair plot (requires seaborn) | sns.pairplot(df) |
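
The pandas plotting methods are built on matplotlib, which must be installed. Below is a minimal sketch that draws three of the plots side by side and saves them to an image file. The data and the output filename pandas_plots.png are assumptions for illustration, and the seaborn plots are left out to keep the example to one plotting dependency.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data using the column names from the table above.
df = pd.DataFrame({"column1": [1, 2, 3, 4], "column2": [2, 4, 3, 8]})

# One figure with three panels; each pandas call draws onto the given axes.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
df.plot(x="column1", y="column2", kind="line", ax=axes[0])
df.plot(x="column1", y="column2", kind="scatter", ax=axes[1])
df["column2"].hist(ax=axes[2])

fig.tight_layout()
fig.savefig("pandas_plots.png")  # hypothetical output path
```

In an interactive session or notebook, calling plt.show() instead of saving the figure displays the plots directly.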

Reference:

https://pandas.pydata.org/docs/