Data Wrangling with Python

Learn the basics of the Python programming language and Jupyter notebooks. Understand and use the key Python libraries for data manipulation, analysis, and visualization, including Pandas, NumPy, and popular data visualization packages. Perform Excel-like tasks in Python such as filtering, ranking, sorting, grouping and aggregating, pivoting and taking cross sections.

Duration: 2 days

Who is it for: Anyone interested in using Python to plan, organize and analyse data for more effective decision-making.

Layout: This is a very hands-on course, delivered mostly in a follow-the-instructor style using Jupyter notebooks.


Jupyter Notebooks and Python Fundamentals

  • Code cells, markdown cells, kernels
  • Python datatypes, variables, control structures
  • Collections: list, tuple, set, dict
  • Create user-defined functions
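A small illustration of how these fundamentals fit together: a user-defined function that uses a loop and an if-statement and returns a dict, plus one literal of each collection type. The function name summarize and the sample values are hypothetical, chosen only for this sketch.

```python
def summarize(values):
    """Hypothetical helper: return basic statistics for a list of numbers."""
    if not values:  # control structure: guard against an empty list
        return {"count": 0, "total": 0, "mean": None}
    total = 0
    for v in values:  # explicit loop, before vectorized NumPy/pandas tools
        total += v
    return {"count": len(values), "total": total, "mean": total / len(values)}

scores = [70, 80, 90]          # list: ordered and mutable
point = (3, 4)                 # tuple: ordered and immutable
unique_tags = {"a", "b", "a"}  # set: the repeated "a" is stored once
stats = summarize(scores)      # dict returned by the function
print(stats)  # {'count': 3, 'total': 240, 'mean': 80.0}
```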

Overview of Data Science using Pandas, NumPy, and Matplotlib

  • Shape Data using Pandas – rows, columns, indexes
  • Model Data using NumPy – vector arithmetic
  • Visualize Data using Matplotlib – create plots
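As a rough sketch of "vector arithmetic" versus shaping data, the example below multiplies two NumPy arrays element by element (no explicit loop) and then places the results in a labelled pandas DataFrame. The price and quantity figures and the order labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: unit prices and quantities for four orders
prices = np.array([2.5, 4.0, 1.0, 3.0])
quantities = np.array([10, 3, 7, 2])

# Vector arithmetic: element-wise multiply, no explicit loop
revenue = prices * quantities   # [25.0, 12.0, 7.0, 6.0]
print(revenue.sum())            # 50.0

# Shape the same data into a DataFrame with named columns and an index
orders = pd.DataFrame(
    {"price": prices, "qty": quantities, "revenue": revenue},
    index=["o1", "o2", "o3", "o4"],
)
print(orders.shape)  # (4, 3): 4 rows, 3 columns
```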

Introduction to DataFrames

  • Rows, Columns, Indexes, Slices
  • Filter, Rank, Sort and Transpose
  • Add, remove and insert data into a DataFrame
  • Multi-Part Indexes (MultiIndex) and Cross Sections
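The DataFrame operations listed above map onto a handful of pandas calls. This is a minimal sketch on a hypothetical four-row sales table: a boolean-mask filter, rank and sort, inserting and removing a column, a transpose, and a cross section of a two-level index with xs.

```python
import pandas as pd

# Hypothetical sales data used only for illustration
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "sales": [120, 80, 150, 60],
})

# Filter: a boolean mask keeps rows where sales exceed 100
big = df[df["sales"] > 100]

# Rank and sort: the highest sales value gets rank 1
df["rank"] = df["sales"].rank(ascending=False)
ordered = df.sort_values("sales", ascending=False)

# Insert a column at position 0, then remove it again
df.insert(0, "id", [1, 2, 3, 4])
df = df.drop(columns="id")

# Transpose swaps rows and columns
flipped = df.T

# Multi-part index, then a cross section of one outer label
indexed = df.set_index(["region", "product"])
south = indexed.xs("South", level="region")
print(south)
```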

Introduction to Plotting

  • Anatomy of a figure
  • Plotting with Matplotlib
  • Plotting with Seaborn
  • Plotting directly from pandas with DataFrame.plot
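A brief sketch of the anatomy of a figure: one Figure containing one Axes, with a line drawn by Matplotlib and a second line drawn onto the same Axes by pandas. The monthly numbers are hypothetical, the non-interactive "Agg" backend is an assumption so the code also runs outside Jupyter, and Seaborn is left out to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs as a script
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly figures for illustration
df = pd.DataFrame({"month": [1, 2, 3, 4], "sales": [10, 14, 9, 17]})

# Anatomy: a Figure holds one or more Axes; each Axes holds the plotted lines
fig, ax = plt.subplots(figsize=(6, 3))

# Plot directly with Matplotlib on the Axes...
ax.plot(df["month"], df["sales"], marker="o", label="matplotlib")

# ...or let pandas draw onto the same Axes via DataFrame.plot
df.plot(x="month", y="sales", ax=ax, linestyle="--", label="pandas")

ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales")
ax.legend()
# In Jupyter the figure renders inline; in a script use fig.savefig(...) instead
```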

Time Series Data

  • DateTime Indexes, Date Ranges and Frequencies
  • Shifting, Resampling and Interpolating Time Series
  • Moving Windows and Expanding Windows
  • Group and Aggregate Time Series
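The time series topics above can be sketched on ten days of hypothetical readings indexed by pd.date_range: interpolate a gap, shift by one period, resample daily values to weekly totals, compute moving and expanding statistics, and group by weekday. The data values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings; 2024-01-01 is a Monday
idx = pd.date_range("2024-01-01", periods=10, freq="D")
s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], index=idx)

filled = s.interpolate()            # linear fill: the NaN becomes 3.0
prev_day = filled.shift(1)          # move values forward one period
change = filled - prev_day          # day-over-day difference

weekly = filled.resample("W").sum()         # weekly totals, weeks ending Sunday
rolling_3 = filled.rolling(window=3).mean()  # moving 3-day mean
running = filled.expanding().max()           # expanding (cumulative) maximum

# Group and aggregate by a calendar attribute of the index (Monday = 0)
by_weekday = filled.groupby(filled.index.dayofweek).mean()
print(weekly)
```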

Merging and Grouping

  • Concatenate Data
  • Joins
  • Merge Operations
  • More Grouping and Aggregation
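To contrast the three ways of combining data, the sketch below stacks two hypothetical order batches with pd.concat, attaches customer names with a column-based merge, uses an index-based join, and then aggregates several statistics with groupby. Table and column names are illustrative only.

```python
import pandas as pd

# Hypothetical tables: two batches of orders and a customer lookup
jan = pd.DataFrame({"cust_id": [1, 2], "amount": [100, 50]})
feb = pd.DataFrame({"cust_id": [1, 3], "amount": [70, 30]})
customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})

# Concatenate: stack the batches vertically and renumber the index
orders = pd.concat([jan, feb], ignore_index=True)

# Merge (SQL-style join) on the shared key column
joined = orders.merge(customers, on="cust_id", how="left")

# Join aligns on the index rather than on a column
totals = orders.groupby("cust_id")["amount"].sum().rename("total")
by_index = customers.set_index("cust_id").join(totals)

# Group and aggregate several statistics at once
summary = joined.groupby("name")["amount"].agg(["count", "sum", "mean"])
print(summary)
```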

Pivot Tables and Categorical Data

  • Convert text into categorical data
  • Create Pivot Tables of Data
  • Advanced Filtering Techniques
  • Styling: custom formatting and highlighting
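A minimal sketch of the first three topics, using a hypothetical product-size table: text is converted to an ordered categorical, a pivot table summarizes units by store and size, and a combined filter uses isin() together with a comparison that respects the category order. Styling is not shown here.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small", "large"],
    "store": ["X", "X", "Y", "Y", "Y"],
    "units": [5, 2, 4, 3, 6],
})

# Convert text to an ordered categorical so comparisons follow business order
df["size"] = pd.Categorical(
    df["size"], categories=["small", "medium", "large"], ordered=True
)

# Pivot table: stores as rows, sizes as columns, summed units in the cells
pivot = df.pivot_table(index="store", columns="size", values="units",
                       aggfunc="sum", fill_value=0, observed=False)

# Advanced filtering: isin() combined with an ordered-category comparison
subset = df[df["store"].isin(["Y"]) & (df["size"] >= "medium")]
print(pivot)
```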

Data Preparation and Cleaning

  • Empty (missing) values, repeated and duplicate values
  • scikit-learn imputers
  • Converting strings to floating-point values
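The cleaning steps above can be sketched on a small hypothetical table: count missing values, drop a duplicate row, convert price text to floats (treating unparseable text as missing), and fill the gaps with scikit-learn's SimpleImputer. Mean imputation is just one possible strategy, chosen here for simplicity.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical raw data: prices stored as text, a missing value, a duplicate row
raw = pd.DataFrame({
    "item": ["pen", "ink", "pad", "pen", "cap"],
    "price": ["1.50", "2,000.00", None, "1.50", "n/a"],
})

print(raw.isna().sum())        # count empty values per column
dedup = raw.drop_duplicates()  # drop exact repeated rows (the second "pen")

# String -> float: remove thousands separators, turn unparseable text into NaN
clean = dedup.assign(
    price=pd.to_numeric(dedup["price"].str.replace(",", "", regex=False),
                        errors="coerce")
)

# scikit-learn imputer: replace remaining NaNs with the column mean
imputer = SimpleImputer(strategy="mean")
clean[["price"]] = imputer.fit_transform(clean[["price"]])
print(clean)
```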