Course
Data Wrangling with Python
Learn the basics of the Python programming language and Jupyter notebooks. Understand and use the key Python libraries for data manipulation, analysis, and visualization, including Pandas, NumPy, and popular data visualization packages. Perform Excel-like operations in Python, including filter, rank, sort, group and aggregate, pivot, and cross section.
Duration: 2 days
Who is it for: Anyone interested in using Python to plan, organize and analyse data for more effective decision making.
Layout: This is a hands-on course, delivered mostly in a follow-the-instructor style using Jupyter notebooks.
Modules
Jupyter Notebooks and Python Fundamentals
- Code cells, markdown cells, kernels
- Python datatypes, variables, control structures
- Collections, list, tuple, set, dict
- Create user defined functions
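As a flavour of this module, the following is a minimal sketch of the four built-in collections and a user-defined function. The function name `summarize` and the sample values are hypothetical, chosen only for illustration.

```python
# Hypothetical helper for illustration; not taken from the course materials.
def summarize(values):
    """Return the count, total, and mean of a collection of numbers."""
    count = len(values)
    total = sum(values)
    mean = total / count if count else 0.0  # guard against an empty collection
    return {"count": count, "total": total, "mean": mean}


scores = [72, 88, 95]             # list: ordered and mutable
point = (3, 4)                    # tuple: ordered and immutable
unique_ids = {1, 1, 2}            # set: duplicates collapse, leaving {1, 2}
lookup = {"alice": 72, "bob": 88}  # dict: maps keys to values

summary = summarize(scores)

# Control structure: loop over a dict and branch on each value.
passed = []
for name, score in lookup.items():
    if score >= 80:
        passed.append(name)
```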
Overview of Data Science using Pandas, NumPy, and Matplotlib
- Shape Data using Pandas – rows, columns, indexes
- Model Data using NumPy – vector arithmetic
- Visualize Data using Matplotlib – create plots
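A short sketch of what "vector arithmetic" and "shaping data" might look like in practice. The column names and numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy applies arithmetic element by element, with no explicit loop.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])
revenue = prices * quantities

# Pandas wraps the same data in labelled rows and columns.
df = pd.DataFrame({"price": prices, "quantity": quantities})
df["revenue"] = df["price"] * df["quantity"]

n_rows, n_cols = df.shape
```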
Introduction to DataFrames
- Rows, Columns, Indexes, Slices
- Filter, Rank, Sort and Transpose
- Add, remove and insert data into a DataFrame
- Multi Part Indexes and Cross Sections
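The operations listed in this module could be sketched roughly as follows. The `sales` table and its column names are hypothetical sample data, not part of the course.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South"],
        "product": ["A", "B", "A", "B"],
        "units": [10, 25, 40, 5],
    }
)

# Filter: keep rows that satisfy a boolean condition.
big = sales[sales["units"] > 8]

# Rank: 1.0 marks the largest value because ascending=False.
sales["rank"] = sales["units"].rank(ascending=False)

# Sort and transpose.
ordered = sales.sort_values("units", ascending=False)
flipped = sales.T

# Multi-part index and cross section: select every row for product "A".
indexed = sales.set_index(["region", "product"])
product_a = indexed.xs("A", level="product")
```

Note that `xs` drops the selected index level by default, so `product_a` is indexed by region only.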
Introduction to Plotting
- Anatomy of a figure
- Plotting with Matplotlib
- Plotting with Seaborn
- Using the pandas DataFrame.plot interface
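A minimal sketch of a figure built first with Matplotlib directly and then through pandas, assuming made-up monthly data. The `Agg` backend line is only there so the sketch runs without a display; inside a Jupyter notebook it is normally unnecessary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"month": [1, 2, 3, 4], "units": [10, 25, 40, 5]})

# Anatomy of a figure: one Figure holds one or more Axes,
# and each Axes carries the lines, title, axis labels, and legend.
fig, ax = plt.subplots()
ax.plot(df["month"], df["units"], marker="o", label="units")
ax.set_title("Units per month")
ax.set_xlabel("Month")
ax.set_ylabel("Units")
ax.legend()

# The pandas plotting interface builds a Matplotlib Axes from a DataFrame.
ax2 = df.plot(x="month", y="units", kind="bar")
```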
Time Series Data
- DateTime Indexes, Date Ranges and Frequencies
- Shifting, Resampling and Interpolating Time Series
- Moving windows and Expanding Windows
- Group and Aggregate Time Series
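The time series topics above might look something like this on a small daily series. The dates and values are invented for illustration.

```python
import pandas as pd

# DateTime index built from a date range with a daily frequency.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1.0, 2.0, None, 4.0, 5.0, 6.0], index=idx)

# Interpolate: the missing value is filled linearly between its neighbours.
filled = ts.interpolate()

# Shift: move values forward one period, leaving the first slot empty.
lagged = filled.shift(1)

# Moving window (fixed size) versus expanding window (grows from the start).
rolling_mean = filled.rolling(window=3).mean()
running_total = filled.expanding().sum()

# Resample: group into two-day bins and aggregate each bin.
two_day = filled.resample("2D").sum()
```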
Merging and Grouping
- Concatenate Data
- Joins
- Merge Operations
- More Grouping and Aggregation
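As a rough sketch of concatenating, joining, and aggregating, assuming two small hypothetical tables of orders and customers:

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 2, 3], "amount": [50, 20, 30, 10]})
more_orders = pd.DataFrame({"customer_id": [1], "amount": [5]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ana", "Ben"]})

# Concatenate: stack tables with the same columns on top of each other.
all_orders = pd.concat([orders, more_orders], ignore_index=True)

# Left join: keep every order, even when no matching customer exists.
joined = all_orders.merge(customers, on="customer_id", how="left")

# Group and aggregate: total and number of orders per customer name.
# Rows whose name is missing are left out of the groups by default.
totals = joined.groupby("name")["amount"].agg(["sum", "count"])
```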
Pivot Tables and Categorical Data
- Convert text into categorical data
- Create Pivot Tables of Data
- Advanced Filtering Techniques
- Style, customized formatting and highlighting
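A brief sketch of categorical conversion, a pivot table, and a combined filter. The data and the column name `tier` are hypothetical. Styling is left out here because it depends on how the notebook renders output.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South"],
        "tier": ["small", "large", "small", "large"],
        "units": [10, 25, 40, 5],
    }
)

# Convert text to an ordered categorical so "small" sorts before "large".
df["tier"] = pd.Categorical(df["tier"], categories=["small", "large"], ordered=True)

# Pivot table: regions as rows, tiers as columns, summed units as values.
table = df.pivot_table(
    index="region", columns="tier", values="units", aggfunc="sum", observed=False
)

# Filtering with combined conditions: membership test plus a numeric range.
selected = df[df["region"].isin(["North"]) & df["units"].between(8, 30)]
```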
Data Preparation and Cleaning
- Empty values, repeating and duplicate values
- scikit-learn imputers
- String conversion to floating point values
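The cleaning steps in this module could be sketched as below, assuming a small made-up table with a duplicate row, a missing value, and prices stored as text. The sketch fills the gap with the column mean using pandas alone; scikit-learn's `SimpleImputer(strategy="mean")` is a comparable tool for the same job.

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "id": [1, 2, 2, 3],
        "price": ["1,200.50", "300", "300", None],
    }
)

# Remove exact duplicate rows.
clean = raw.drop_duplicates().reset_index(drop=True)

# Convert strings to floats: strip thousands separators first,
# and turn anything unparseable into NaN rather than raising.
clean["price"] = pd.to_numeric(
    clean["price"].str.replace(",", "", regex=False), errors="coerce"
)

# Fill the remaining empty value with the mean of the known prices.
clean["price"] = clean["price"].fillna(clean["price"].mean())
```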