Pandas Cheat Sheet

Pandas is a powerful data manipulation and analysis library for Python. It provides a flexible and efficient way to work with structured data, such as spreadsheets, databases, and CSV files. Pandas allows you to perform tasks such as data cleaning, data exploration, data manipulation, and data visualization.

If you’re new to pandas, it can be overwhelming to learn all the different methods and functions that the library provides. That’s where a cheat sheet comes in handy: it is a concise reference to the most commonly used pandas methods and functions, so you can look up everyday tasks without searching the full documentation.

This cheat sheet is organized by theme, with one table per theme: importing data, data exploration, data cleaning, data manipulation, and data visualization. Each table lists commonly used pandas methods and functions for that theme. For example, the data exploration table contains methods for showing the first and last n rows of data, data shape, data types, summary statistics, and more.

The cheat sheet is not meant to be an exhaustive reference guide to pandas; it covers only commonly used methods and functions. If you’re new to pandas, I recommend using it alongside the official pandas documentation and tutorials to get a more complete understanding of the library.

Cheat Sheet

Importing Data

Task	Code
Import pandas	import pandas as pd
Import data from CSV file	df = pd.read_csv('filename.csv')
Import data from Excel file	df = pd.read_excel('filename.xlsx')
Import data from SQL database	import sqlite3; conn = sqlite3.connect('database.db'); df = pd.read_sql_query('SELECT * FROM tablename', conn)
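
As a minimal sketch of the import calls above, the example below reads CSV text from an in-memory buffer and queries an in-memory SQLite database, so it runs without any files on disk. The column names, the `scores` table, and the sample rows are made up for illustration.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path instead.
csv_text = "name,score\nalice,90\nbob,85\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# In-memory SQLite database holding a hypothetical "scores" table.
conn = sqlite3.connect(":memory:")
df_csv.to_sql("scores", conn, index=False)

# read_sql_query runs the SQL and returns the result as a DataFrame.
df_sql = pd.read_sql_query("SELECT * FROM scores WHERE score > 85", conn)
conn.close()

print(df_csv)
print(df_sql)
```

The same pattern applies to a real database file: replace `":memory:"` with the path to the database.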

Data Exploration

Task	Code
Show first n rows of data	df.head(n)
Show last n rows of data	df.tail(n)
Show data shape	df.shape
Show data types	df.dtypes
Show summary statistics	df.describe()
Show unique values in a column	df['column'].unique()
Show number of unique values in a column	df['column'].nunique()
Show value counts in a column	df['column'].value_counts()
Show correlation matrix (numeric columns)	df.corr(numeric_only=True)
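
The exploration calls above can be tried on a small DataFrame. This is a sketch with made-up data; the column names `city`, `temp`, and `rain` are assumptions for illustration. Note that `numeric_only=True` keeps `corr()` from failing on the text column in recent pandas versions.

```python
import pandas as pd

# Small hypothetical dataset with one text column and two numeric columns.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
    "temp": [3.0, 19.0, 5.0, 28.0],
    "rain": [40, 2, 35, 10],
})

first_rows = df.head(2)                  # first two rows
shape = df.shape                         # (rows, columns) tuple
n_cities = df["city"].nunique()          # number of distinct cities
counts = df["city"].value_counts()       # rows per city, most frequent first
corr = df.corr(numeric_only=True)        # correlation between temp and rain only

print(shape, n_cities)
print(counts)
print(corr)
```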

Data Cleaning

Task	Code
Rename columns	df.rename(columns={'old_name': 'new_name'})
Drop columns	df.drop(columns=['column1', 'column2'])
Drop rows with missing values	df.dropna()
Fill missing values with a constant	df.fillna(value)
Fill missing values with the column mean	df.fillna(df.mean(numeric_only=True))
Replace values in a column	df['column'].replace(old_value, new_value)
Remove duplicates	df.drop_duplicates()
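
One point the table does not show: each of these cleaning methods returns a new object and leaves `df` unchanged, so the result has to be assigned. The sketch below chains several of them on made-up data; the column names and the `"n/a"` placeholder are assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw data: one duplicated row, one missing value, one placeholder tag.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "val": [10.0, 20.0, 20.0, None],
    "tag": ["a", "b", "b", "n/a"],
})

clean = df.drop_duplicates()                           # drop the repeated (2, 20.0, "b") row
clean = clean.rename(columns={"val": "value"})         # give the column a clearer name
clean = clean.fillna(clean.mean(numeric_only=True))    # fill numeric gaps with the column mean
clean["tag"] = clean["tag"].replace("n/a", "unknown")  # swap the placeholder for a label

print(clean)
```

The original `df` still has four rows and a `val` column afterwards, which makes it easy to compare before and after.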

Data Manipulation

Task	Code
Select columns	df[['column1', 'column2']]
Filter rows by a condition	df[df['column'] > value]
Sort data by one or more columns	df.sort_values(by=['column1', 'column2'])
Group data by a column and calculate statistics	df.groupby('column').agg({'column1': 'mean', 'column2': 'sum'})
Merge two dataframes by a common column	pd.merge(df1, df2, on='column')
Pivot table	pd.pivot_table(df, values='value', index='row', columns='column', aggfunc='mean')
Apply a function to a column	df['column'].apply(function)
Apply a function to a row	df.apply(function, axis=1)
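
To see how filtering, grouping, merging, and row-wise `apply` fit together, here is a small sketch. The `sales` and `managers` tables and all of their values are hypothetical.

```python
import pandas as pd

# Hypothetical sales records and a lookup table keyed on the same "region" column.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 4, 6, 8],
    "price": [2.0, 3.0, 2.5, 3.5],
})
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Ana", "Raj"],
})

big = sales[sales["units"] > 5]                                           # boolean filter
summary = sales.groupby("region").agg({"units": "sum", "price": "mean"})  # per-region stats
merged = pd.merge(sales, managers, on="region")                           # join on shared column

# Row-wise apply: the function receives one row at a time when axis=1.
sales["revenue"] = sales.apply(lambda row: row["units"] * row["price"], axis=1)

print(summary)
print(merged)
print(sales)
```

For simple arithmetic like this, `sales["units"] * sales["price"]` gives the same result and is typically faster; `apply` with `axis=1` is more useful when the per-row logic cannot be expressed as column operations.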

Data Visualization

Task	Code
Line plot	df.plot(x='column1', y='column2', kind='line')
Bar plot	df.plot(x='column1', y='column2', kind='bar')
Histogram	df['column'].hist()
Scatter plot	df.plot(x='column1', y='column2', kind='scatter')
Box plot	df.boxplot(column='column')
Heatmap (requires import seaborn as sns)	sns.heatmap(df.corr(numeric_only=True), annot=True)
Pairplot (requires import seaborn as sns)	sns.pairplot(df)
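
The pandas plotting methods above draw with matplotlib, which must be installed. The sketch below places a line plot and a scatter plot side by side and renders them to an in-memory PNG; the data, the column names `x` and `y`, and the choice of the non-interactive `Agg` backend are assumptions made so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with two numeric columns.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 5, 8]})

# One figure with two axes; pass ax= so pandas draws into each of them.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
df.plot(x="x", y="y", kind="line", ax=axes[0], title="line")
df.plot(x="x", y="y", kind="scatter", ax=axes[1], title="scatter")
fig.tight_layout()

# Render to an in-memory PNG; pass a filename instead to write a file.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

In a notebook or interactive session you would normally skip the backend line and call `plt.show()` instead of saving.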

Reference:

https://pandas.pydata.org/docs/
