An Introduction to Data Cleansing in Python
A recent study from Kaggle found that data scientists and machine learning engineers spend 80% of their time cleaning data. This course is all about how to clean your data.
Course Introduction
Pandas
What is Data Wrangling?
Summary
Download Raw Titanic Data Set
Loading the Dataset
Lab: Save Existing Dataframe to CSV
The Shape of the Data
Subsetting
Using loc
Using iloc and ix
iloc and ix on Rows and Columns
Lab: Slicing Dataframes
The GroupBy Function
Group By Frequency Count
Lab: Grouping
The Series Object
Series Anatomy
Lab: Series Anatomy
Attributes
Series and ndarray Similarity
The Array
Boolean Subsetting in a Series
Vectorized Operations
Lab: Boolean and Variable Attribute Searches
The Replace Functions
Change Column Header Names
Sorting in a Dataframe
Lab: Descriptive Statistics for Pandas Dataframe
Reading an Excel File
Regular Expressions
Binning
Data Normalization
Lab: Normalizing Data
Concatenation
Row Concatenation
Lab: Concatenation Basics
The Merge Join
The Joins
Lab: Using the Merge Function
Missing Data
Finding the NaNs
Lab: Missing Data
Filling Index Values
NaN Value Differences
Changing Cell Values
Interpolation
Handling Duplicates
Mappings
Create a Column With a Function
Replace
Lab: Duplicate Data
Time Series
The Time Stamp Object
Lab: Time Series
The Time Delta Object
The Date Time Index
Force a Data Conversion
The Frequency Parameter
Date Offsets
Anchored Offsets
Period Object