Learn Data Analysis with Python in this comprehensive tutorial for beginners, with exercises included!
Data Analysis has been around for a long time, but until a few years ago it was practiced using closed, expensive, and limited tools like Excel or Tableau. Python, SQL, and other open tools and libraries have changed Data Analysis forever.
In this tutorial you'll learn the whole process of Data Analysis: reading data from multiple sources (CSVs, SQL, Excel, etc.), processing it using NumPy and Pandas, visualizing it using Matplotlib and Seaborn, and cleaning and processing it to create reports.
⚠️ Note: Instead of loading the notebooks on notebooks.ai, you should use Google Colab. Here are instructions on loading a notebook directly from GitHub into Google Colab: https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb#scrollTo=K-NVg7RjyeTk
⭐️ Course Contents ⭐️
⌨️ Part 1: Introduction
What is Data Analysis? Why Python? What other options are there? What's the cycle of a Data Analysis project? What's the difference between Data Analysis and Data Science?
🔗 Slides for this section: https://docs.google.com/presentation/d/1fDpjlyMiOMJyuc7_jMekcYLPP2XlSl1eWw9F7yE7byk/edit#slide=id.p
⌨️ Part 2: Real-Life Example of a Python/Pandas Data Analysis project (00:11:11)
A demonstration of a real-life data analysis project using Python, Pandas, SQL and Seaborn. Don't worry, we'll dig deeper in the following sections.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/FreeCodeCamp-Pandas-Real-Life-Example
⌨️ Part 3: Jupyter Notebooks Tutorial (00:30:50)
A step-by-step tutorial to learn how to use Jupyter Notebooks.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/ds-content-interactive-jupyterlab-tutorial
⌨️ Part 4: Intro to NumPy (01:04:58)
Learn why NumPy became such an important library for data processing in Python. Learn about the low-level details of computations and memory storage, and why tools like Excel will always be limited when processing large volumes of data.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/freecodecamp-intro-to-numpy
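To give a feel for the "memory storage" point, here is a minimal sketch (not taken from the course notebooks) comparing a NumPy array, which keeps fixed-size values in one contiguous buffer, with a Python list, which holds references to separate int objects. The object-size comparison assumes CPython, where `sys.getsizeof` reports the size of a boxed int.

```python
import sys

import numpy as np

# A NumPy array stores 1,000 int64 values in one contiguous, fixed-type buffer.
a = np.arange(1_000, dtype=np.int64)
print(a.itemsize)  # bytes per element (8 for int64)
print(a.nbytes)    # bytes in the data buffer (1,000 * 8)

# A Python list stores pointers to separate int objects, and on CPython
# each of those objects is bigger than the 8 bytes NumPy uses per value.
lst = list(range(1_000))
print(sys.getsizeof(lst[500]))

# Vectorized arithmetic runs in compiled code over the whole buffer,
# instead of a Python-level loop over boxed objects.
squares = a ** 2
print(squares[:5])

# Same result as the pure-Python version, computed in one call.
print(squares.sum() == sum(x * x for x in lst))
```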
⌨️ Part 5: Intro to Pandas (01:57:08)
Pandas is arguably the most important library for Data Processing in the Python world. Learn how it works and how its main data structure, the DataFrame, compares to other tools like spreadsheets or the DataFrames used in Big Data tools.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/freecodecamp-intro-to-pandas
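As a rough illustration of the DataFrame-vs-spreadsheet comparison, here is a small sketch using hypothetical toy sales data (not from the course): named, typed columns, a derived column that plays the role of a spreadsheet formula, and a boolean filter.

```python
import pandas as pd

# Hypothetical toy data. Like a spreadsheet, a DataFrame has labeled rows
# (the index) and named columns, but each column is a typed Series.
df = pd.DataFrame(
    {
        "product": ["A", "B", "C", "D"],
        "units": [10, 3, 7, 5],
        "price": [2.5, 10.0, 4.0, 6.0],
    }
)

# A derived column: comparable to filling a formula down a column,
# but computed for the whole column in one expression.
df["revenue"] = df["units"] * df["price"]

# Filtering rows with a boolean mask instead of a manual filter.
big = df[df["revenue"] > 25]

print(df.dtypes)
print(big)
print(df["revenue"].sum())
```

Because operations apply to whole columns, the same code works unchanged whether the DataFrame has four rows or millions.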
⌨️ Part 6: Data Cleaning (02:47:18)
Learn about the different types of issues we'll face with our data (null values, invalid values, statistical outliers, etc.) and how to clean them.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/data-cleaning-rmotr-freecodecamp
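The three kinds of issues listed above can be sketched with Pandas on a hypothetical toy price column. This is a minimal example, not the course's own procedure: dropping rows (rather than filling them) and the 1.5 × IQR outlier rule are just one common set of choices, and the right ones depend on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value, one invalid value
# (a negative price) and one suspiciously large value.
df = pd.DataFrame({"price": [10, 12, 11, np.nan, 13, 9, 11, -1, 95, 12]})

# 1. Null values: count them, then drop the affected rows.
print(df["price"].isnull().sum())
df = df.dropna(subset=["price"])

# 2. Invalid values: a price can't be negative, so keep only valid rows.
df = df[df["price"] >= 0]

# 3. Statistical outliers: flag values outside
#    [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] and drop them.
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
in_range = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[in_range]

print(df["price"].tolist())
```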
⌨️ Part 7: Reading Data from other sources (03:25:15)
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/RDP-Reading-Data-with-Python-and-Pandas
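As a quick preview of reading from different sources, here is a minimal sketch (not from the course notebooks) that loads a CSV and a SQL table into DataFrames and combines them. The CSV text and the in-memory SQLite table are hypothetical stand-ins for a real file and database; `pd.read_excel` works along the same lines for spreadsheets but needs an extra engine such as openpyxl, so it's left out here.

```python
import io
import sqlite3

import pandas as pd

# CSV: read_csv accepts a path, a URL or any file-like object.
csv_text = "name,score\nana,90\nbob,85\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# SQL: read_sql_query runs a query over a database connection
# (here an in-memory SQLite database created just for the example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("carl", 78), ("dana", 93)])
conn.commit()
from_sql = pd.read_sql_query("SELECT name, score FROM scores ORDER BY name", conn)
conn.close()

# Both sources end up as DataFrames with the same columns,
# so they can be stacked into one table.
combined = pd.concat([from_csv, from_sql], ignore_index=True)
print(combined)
```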
⌨️ Part 8: Python Recap (03:55:19)
If your Python or coding skills are rusty, check out this section for a quick recap of Python's main features and control flow structures.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/ds-content-python-under-10-minutes