Cheat Sheet: Data Preprocessing Tasks in Pandas & Plot Libraries


Cheat Sheet: Data Preprocessing Tasks in Pandas

| Task | Syntax | Description | Example |
| --- | --- | --- | --- |
| Load CSV data | `pd.read_csv('filename.csv')` | Read data from a CSV file into a Pandas DataFrame | `df_can = pd.read_csv('data.csv')` |
| Handling Missing Values | `df.dropna()` | Drop rows that contain missing values | `df_can.dropna()` |
| Handling Missing Values | `df.fillna(value)` | Fill missing values with a specified value | `df_can.fillna(0)` |
| Removing Duplicates | `df.drop_duplicates()` | Remove duplicate rows | `df_can.drop_duplicates()` |
| Renaming Columns | `df.rename(columns={'old_name': 'new_name'})` | Rename one or more columns | `df_can.rename(columns={'Age': 'Years'})` |
| Selecting Columns | `df['column_name']` or `df.column_name` | Select a single column | `df_can.Age` or `df_can['Age']` |
| Selecting Columns | `df[['col1', 'col2']]` | Select multiple columns | `df_can[['Name', 'Age']]` |
| Filtering Rows | `df[df['column'] > value]` | Filter rows based on a condition | `df_can[df_can['Age'] > 30]` |
| Applying Functions to Columns | `df['column'].apply(function_name)` | Apply a function to transform the values in a column | `df_can['Age'].apply(lambda x: x + 1)` |
| Creating New Columns | `df['new_column'] = expression` | Create a new column with values derived from existing ones | `df_can['Total'] = df_can['Quantity'] * df_can['Price']` |
| Grouping and Aggregating | `df.groupby('column').agg({'col1': 'sum', 'col2': 'mean'})` | Group rows by a column and apply aggregate functions | `df_can.groupby('Category').agg({'Total': 'mean'})` |
| Sorting Rows | `df.sort_values('column', ascending=True/False)` | Sort rows by the values in a column | `df_can.sort_values('Date', ascending=True)` |
| Displaying First n Rows | `df.head(n)` | Show the first n rows of the DataFrame | `df_can.head(3)` |
| Displaying Last n Rows | `df.tail(n)` | Show the last n rows of the DataFrame | `df_can.tail(3)` |
| Checking for Null Values | `df.isnull()` | Return a Boolean mask marking the null values in the DataFrame | `df_can.isnull()` |
| Selecting Rows by Index | `df.iloc[index]` | Select a row by integer position | `df_can.iloc[3]` |
| Selecting Rows by Index | `df.iloc[start:end]` | Select rows in a range of integer positions (end excluded) | `df_can.iloc[2:5]` |
| Selecting Rows by Label | `df.loc[label]` | Select rows by label/index name | `df_can.loc['Label']` |
| Selecting Rows by Label | `df.loc[start:end]` | Select rows in a label/index range (both endpoints included) | `df_can.loc['Label1':'Label2']` |
| Summary Statistics | `df.describe()` | Generate descriptive statistics for numerical columns | `df_can.describe()` |
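
The individual calls above are usually chained into a small cleaning pipeline. The following is a minimal sketch of how that might look; the inline CSV text and its values are made up for illustration (they stand in for `data.csv`), and only the column names are borrowed from the examples in the table.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for 'data.csv'; the values are
# invented, the column names mirror the cheat sheet's examples.
csv_text = """Name,Age,Category,Quantity,Price
Ana,34,A,2,10.0
Ben,,B,1,5.5
Ana,34,A,2,10.0
Cid,41,A,3,4.0
Dee,29,B,5,2.0
"""

df_can = pd.read_csv(io.StringIO(csv_text))

# These methods return a new DataFrame rather than modifying df_can
# in place, so the result is assigned back each time.
df_can = df_can.drop_duplicates()                 # drops the repeated 'Ana' row
df_can = df_can.fillna({"Age": 0})                # fills the missing Age with 0
df_can = df_can.rename(columns={"Age": "Years"})

# Derived column, then a row filter followed by a sort.
df_can["Total"] = df_can["Quantity"] * df_can["Price"]
over_30 = df_can[df_can["Years"] > 30].sort_values("Years", ascending=False)

# Group by category and aggregate two columns in different ways.
summary = df_can.groupby("Category").agg({"Total": "mean", "Quantity": "sum"})

# Position-based vs. label-based row selection.
by_name = df_can.set_index("Name")
by_position = by_name.iloc[1:3]        # positions 1 and 2; the end is excluded
by_label = by_name.loc["Ana":"Cid"]    # label slice; both endpoints are included

print(over_30[["Name", "Years", "Total"]])
print(summary)
print(by_label)
```

Note the difference in the last two selections: `iloc[1:3]` returns two rows, while `loc["Ana":"Cid"]` returns three, because label slices include the end label.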

Cheat Sheet: Plot Libraries

| Library | Main Purpose | Key Features | Programming Language | Level of Customization | Dashboard Capabilities | Types of Plots Possible |
| --- | --- | --- | --- | --- | --- | --- |
| Matplotlib | General-purpose plotting | Comprehensive plot types and a wide variety of customization options | Python | High | Requires additional components and customization | Line plots, scatter plots, bar charts, histograms, pie charts, box plots, heatmaps, etc. |
| Pandas | Primarily used for data manipulation, but also offers plotting functionality | Easy to plot directly from Pandas data structures | Python | Medium | Can be combined with web frameworks to create dashboards | Line plots, scatter plots, bar charts, histograms, pie charts, box plots, etc. |
| Seaborn | Statistical data visualization | Stylish, specialized statistical plot types | Python | Medium | Can be combined with other libraries to display plots on dashboards | Heatmaps, violin plots, scatter plots, bar plots, count plots, etc. |
| Plotly | Interactive data visualization | Interactive web-based visualizations | Python, R, JavaScript | High | The Dash framework is dedicated to building interactive dashboards | Line plots, scatter plots, bar charts, pie charts, 3D plots, choropleth maps, etc. |
| Folium | Geospatial data visualization | Interactive, customizable maps | Python | Medium | Can be integrated with other frameworks/libraries to incorporate maps into dashboards | Choropleth maps, point maps, heatmaps, etc. |
| PyWaffle | Plotting waffle charts | Waffle charts | Python | Low | Can be combined with other libraries to display waffle charts on dashboards | Waffle charts, square pie charts, donut charts, etc. |
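
To make the Matplotlib and Pandas rows more concrete, here is a small sketch that draws the same bar chart both ways. The data, the variable names, and the output file name `bar_comparison.png` are assumptions made for illustration; the `Agg` backend is selected only so the script can run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display required

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for this example.
sales = pd.DataFrame(
    {"Category": ["A", "B", "C"], "Total": [16.0, 7.75, 11.5]}
).set_index("Category")

fig, (ax_mpl, ax_pd) = plt.subplots(1, 2, figsize=(8, 3))

# Matplotlib: pass the x positions and heights explicitly.
ax_mpl.bar(sales.index, sales["Total"])
ax_mpl.set_title("Matplotlib")

# Pandas: plot straight from the Series; pandas draws with Matplotlib
# underneath, so the result lands on the Axes passed via `ax`.
sales["Total"].plot(kind="bar", ax=ax_pd, title="Pandas")

fig.tight_layout()
fig.savefig("bar_comparison.png")  # hypothetical output file name
```

Because the Pandas plotting methods are built on Matplotlib, any further customization of `ax_pd` uses the regular Matplotlib API.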
