Day 23-29: Diving into Pandas

As I ventured into the world of Pandas, I found myself buried under a huge backlog of work commitments. Even so, I managed to make significant progress in my Python studies, and I'm excited to share it with you. For this topic, I turned to a book from Packt Publishing that had been patiently waiting in my digital library: "The Pandas Workshop" by Blaine Bateman, Saikat Basak, Thomas V. Joseph, and William So.

Although I only managed to cover Chapter 1, it was quite a journey. It taught me how to set up a Jupyter notebook using Anaconda (in a previous post, I mentioned using a Jupyter notebook with PyCharm). This first chapter lived up to its promise as a quick introduction and crash course in Pandas. By the end, my head was spinning, but my thirst for knowledge had been quenched.

One intriguing term that I had encountered in other people's writing resurfaced, and this time I took a moment to investigate it: "data wrangling". According to Wikipedia, data wrangling, sometimes called data munging, is the process of transforming and mapping "raw" data into another format so that it becomes usable for downstream purposes. Pandas truly shines here, offering many features that simplify data wrangling tasks. Delving into the history of Pandas, I discovered that its creator, Wes McKinney, is an MIT graduate with a background in quantitative finance.
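To make "data wrangling" a little more concrete, here is a minimal sketch of the idea in Pandas. The records below are made up for illustration: numbers stored as text, inconsistent capitalization, and a missing value. Filling the gap with 0 is an arbitrary choice for this example, not a general rule.

```python
import pandas as pd

# Hypothetical "raw" records, as they might arrive from a CSV export:
# scores stored as text, inconsistent name casing, and one missing score.
raw = pd.DataFrame({
    "name": ["alice", "BOB", "Carol"],
    "score": ["85", "92", None],
})

# Wrangle the raw data into a cleaner, analysis-ready format.
clean = raw.assign(
    name=raw["name"].str.title(),                               # normalize capitalization
    score=pd.to_numeric(raw["score"]).fillna(0).astype(int),    # text -> numbers, fill the gap
)

print(clean)
print(clean.dtypes)
```

After this step the "score" column holds real integers, so it can be summed, averaged, or sorted numerically.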

To save myself from hunting for these bits again, I've jotted down the notes that helped me set up Jupyter notebooks with the resources from The Pandas Workshop. Here they are:

1) Copy The-Pandas-Workshop to a folder.
2) cd into that folder and run the following in a terminal (virtualenv must already be installed, e.g. with pip install virtualenv):
virtualenv --python [path-to-python.exe]\python.exe venv
3) Activate the virtual environment (Windows syntax):
.\venv\Scripts\activate
4) Install the packages listed in requirements.txt:
pip install -r requirements.txt
5) To record the packages installed in the active environment, write them to a file (note that this overwrites any existing requirements.txt):
pip freeze > requirements.txt
6) To deactivate: deactivate
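pip freeze is a command-line tool, but similar information is available from Python's standard library. The following is a minimal sketch, assuming Python 3.8 or later (where importlib.metadata exists). The function name installed_packages is hypothetical, and the result only approximates pip freeze: pip skips a few packaging tools such as pip itself by default and formats editable installs differently.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return sorted 'name==version' strings for every distribution visible
    to the running interpreter (an approximation of `pip freeze`)."""
    # A set removes duplicates that can appear when a package is found on
    # more than one sys.path entry.
    pins = {f"{dist.metadata['Name']}=={dist.version}" for dist in distributions()}
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    for line in installed_packages():
        print(line)
```

Run it with the virtual environment activated and it lists that environment's packages rather than the system-wide ones.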

How to add, change, or remove a kernel in Jupyter

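a) source your-venv/bin/activate (Linux/macOS syntax; on Windows use .\your-venv\Scripts\activate)
b) (your-venv)$ pip install jupyter
c) (your-venv)$ ipython kernel install --name "local-venv" --user
d) To switch a notebook to the new kernel, use Kernel > Change kernel. To remove the kernel later: jupyter kernelspec uninstall local-venv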
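After switching a notebook to the new kernel, it is worth confirming that the kernel really runs the virtual environment's interpreter. Here is a minimal sketch to paste into a notebook cell. The function name running_in_virtualenv is hypothetical; the check relies on venv and modern virtualenv setting sys.prefix to the environment directory while sys.base_prefix still points at the base installation (very old virtualenv releases set sys.real_prefix instead).

```python
import sys


def running_in_virtualenv() -> bool:
    """True when the current interpreter belongs to a virtual environment."""
    # Legacy virtualenv (< 20) exposed sys.real_prefix; venv and newer
    # virtualenv make sys.prefix differ from sys.base_prefix.
    legacy = getattr(sys, "real_prefix", None) is not None
    return legacy or sys.prefix != sys.base_prefix


# Run in a notebook cell after selecting the "local-venv" kernel.
print("Interpreter:", sys.executable)
print("Inside a virtual environment:", running_in_virtualenv())
```

If the interpreter path does not point inside your-venv, the notebook is still using another kernel.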

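To change the default notebook directory, first generate a config file:
jupyter notebook --generate-config
This creates a ~/.jupyter/jupyter_notebook_config.py file that lists the available options as comments. To set the default directory, add:
c.NotebookApp.notebook_dir = '/absolute/path/to/notebook/directory'
(This applies to the classic Notebook. Notebook 7 runs on Jupyter Server, where the equivalent setting is c.ServerApp.root_dir.)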

The actual Pandas topics started with the Series, a one-dimensional labeled array that can hold data of any type.

Example:

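import pandas as pd
# A Series of five integers, one value per element
series1 = pd.Series([1, 2, 3, 4, 5])
# A Series can also hold arbitrary Python objects: here two lists and a dict
series2 = pd.Series([[11, 12, 13, 14], [6, 7, 'test'], {'id': '001', 'name': 'john'}])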
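Every Series pairs its values with an index of labels (0, 1, 2, ... unless you supply your own). A short sketch of label-based and position-based access; the fruit names and prices are made up for illustration.

```python
import pandas as pd

# Hypothetical data with custom index labels instead of the default 0, 1, 2.
prices = pd.Series([1.5, 2.25, 3.0], index=["apple", "banana", "cherry"])

print(prices["banana"])   # select by label -> 2.25
print(prices.iloc[0])     # select by integer position -> 1.5
print(prices.dtype)       # float64: a single dtype for the whole Series

# Mixing unrelated types makes pandas fall back to the generic object dtype,
# which is what happens with series2 above.
mixed = pd.Series([1, "two", {"three": 3}])
print(mixed.dtype)        # object
```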

The next important concept is the DataFrame, a two-dimensional representation of data arranged in rows and columns.
The following example creates a DataFrame called df1 from a list of three elements and then outputs its shape.
In the output, the first element is the number of rows and the second is the number of columns.

Example:

df1 = pd.DataFrame([10, 20, 30])  # a single column holding three rows
df1.shape

Output:
(3, 1)
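A DataFrame is more often built from a dict, where each key becomes a column name. A minimal sketch with made-up sample data; the names df2, "product", and "units" are illustrative only.

```python
import pandas as pd

# Each dict key becomes a column name; each list holds that column's values.
df2 = pd.DataFrame({
    "product": ["A", "B", "C"],   # hypothetical sample data
    "units": [10, 20, 30],
})

rows, cols = df2.shape            # shape is a (rows, columns) tuple
print(rows, cols)                 # 3 2
print(df2.columns.tolist())       # ['product', 'units']
print(df2["units"].sum())         # 60

# Built from a plain list, as with df1, the single column is simply named 0.
print(pd.DataFrame([10, 20, 30]).columns.tolist())  # [0]
```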

With the whirlwind of theory and tasks I tackled in such a short period, I decided to make my Jupyter notebooks easily accessible from anywhere. That led me to create a free account on Anaconda Cloud (Anaconda Nucleus), a platform for sharing and collaborating on Jupyter notebooks. Trust me, it's worth checking out!

John Enad