Powerful Pandas! Part-1

Today we are going to cover the pandas library.
pandas is a Python library commonly used for data manipulation and data analysis, mostly in data science and machine learning. In this notebook we will show how powerful the pandas library is!

Let's get started!

Let's import the NumPy and pandas libraries into our workspace. Here we are using a Kaggle notebook, where these libraries are already installed.

import numpy as np
import pandas as pd

If these libraries aren't installed in your environment, you have to install them before importing them.

1. Pandas Series

Let's create a series with pandas.

a1=['a','b','c']
my_data=[50,70,30]
ar=np.array(my_data)
d={'a':50,'b':70,'c':30}

pd.Series(data=my_data, index=a1)

The same thing can be done by passing the data and the index positionally:

pd.Series(my_data,a1)

and also with a dictionary, whose keys become the index:

pd.Series(d)
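To check that these constructions really give the same result, here is a small sketch. The variable names (labels, values, from_list and so on) are made up for this illustration and are not part of the notebook.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
values = [50, 70, 30]

from_list = pd.Series(data=values, index=labels)    # keyword arguments
from_positional = pd.Series(values, labels)         # same call, positional
from_array = pd.Series(np.array(values), labels)    # NumPy array as the data
from_dict = pd.Series({'a': 50, 'b': 70, 'c': 30})  # dict keys become the index

print(from_list.equals(from_positional))  # True
print(from_list.equals(from_dict))        # True
print(from_array['b'])                    # 70
```

Whichever form you pick, you end up with the same labels pointing at the same values.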

Indexing in a series

series1=pd.Series([1,2,3,4],['A','B','C','D'])
series1

series1['C']
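Besides plain square brackets, a series can be indexed explicitly by label with .loc or by position with .iloc. The sketch below reuses series1 from above; the other names are only for illustration.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], ['A', 'B', 'C', 'D'])

by_label = series1.loc['C']          # look up by index label
by_position = series1.iloc[2]        # look up by integer position (0-based)
label_slice = series1.loc['B':'C']   # label slices include both endpoints
several = series1[['A', 'D']]        # a list of labels returns a series

print(by_label, by_position)   # 3 3
print(list(label_slice))       # [2, 3]
print(list(several))           # [1, 4]
```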

2. Pandas DataFrames

Import the required libraries for creating a data frame with pandas.

import numpy as np
import pandas as pd

from numpy.random import randn

We set a fixed seed because we want to draw the same set of random numbers each time we run the code. Otherwise, our results would vary from run to run.

np.random.seed(1011)

df=pd.DataFrame(randn(5,4),['A','B','C','D','E'],['W','X','Y','Z'])
df

Here is our data frame.
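If you want to convince yourself that the seed really makes the data frame reproducible, a quick sketch like this one can help. The names first, second, rows and cols are only for illustration.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

rows = ['A', 'B', 'C', 'D', 'E']
cols = ['W', 'X', 'Y', 'Z']

np.random.seed(1011)                            # fix the seed ...
first = pd.DataFrame(randn(5, 4), rows, cols)

np.random.seed(1011)                            # ... and reset it to the same value
second = pd.DataFrame(randn(5, 4), rows, cols)

print(first.equals(second))  # True: same seed, same random numbers
print(first.shape)           # (5, 4)
```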

If we want to grab the column 'W', the output is a series:

df['W']

Another way to grab a column is dot notation, like table.column in SQL:

df.W

If we want to grab multiple columns, the output is a data frame:

df[['W','Z']]
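The difference between single and double brackets is easy to miss, so here is a small sketch on a hand-made frame (demo and clash are hypothetical names, not from the notebook). It also shows one reason to prefer brackets over dot notation: a column whose name matches a DataFrame method, such as count, cannot be reached with a dot.

```python
import pandas as pd

demo = pd.DataFrame({'W': [1.0, -2.0], 'Z': [0.5, 3.0]}, index=['A', 'B'])

one_column = demo['W']            # single brackets: a Series
one_column_frame = demo[['W']]    # double brackets: a DataFrame with one column
two_columns = demo[['W', 'Z']]    # a DataFrame with two columns

print(type(one_column).__name__)        # Series
print(type(one_column_frame).__name__)  # DataFrame

# Dot notation finds the method first when a column name clashes with it
clash = pd.DataFrame({'count': [1, 2]})
print(callable(clash.count))   # True: this is the count() method, not the column
print(list(clash['count']))    # [1, 2]: brackets always reach the column
```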

Add a column
Let's add a column to the data frame

df['H']=df['W']+df['Z']

Delete a column
To delete a column we will use the drop method.

df.drop('H',axis=1)

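But if you display the data frame again, the new column is still there, because drop returns a new data frame and leaves the original untouched. To change the original, we have to add another argument.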

df.drop('H',axis=1,inplace=True)

This permanently deletes the column from df.
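Here is a small sketch of what happens without inplace=True, using a hand-made frame called demo (a hypothetical name). It also shows a common alternative: reassigning the result of drop instead of modifying the frame in place.

```python
import pandas as pd

demo = pd.DataFrame({'W': [1, 2], 'Z': [3, 4]})
demo['H'] = demo['W'] + demo['Z']

dropped = demo.drop('H', axis=1)          # returns a new frame without 'H'
still_there = 'H' in demo.columns         # the original is unchanged
print(still_there)                        # True
print('H' in dropped.columns)             # False

# Alternative to inplace=True: reassign the result
demo = demo.drop(columns='H')
print(list(demo.columns))                 # ['W', 'Z']
```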

Selecting rows with a label-based index:

df.loc[['A','B'],['W','Y']]
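pandas also offers .iloc, which selects by integer position instead of label. The sketch below compares the two on a frame filled with the numbers 0 to 19, so the result is easy to follow; demo is a hypothetical name used only here.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.arange(20).reshape(5, 4),
                    index=['A', 'B', 'C', 'D', 'E'],
                    columns=['W', 'X', 'Y', 'Z'])

by_label = demo.loc[['A', 'B'], ['W', 'Y']]   # rows and columns by label
by_position = demo.iloc[[0, 1], [0, 2]]       # the same cells by position
single_row = demo.loc['C']                    # one row comes back as a Series

print(by_label.equals(by_position))   # True
print(by_label.values.tolist())       # [[0, 2], [4, 6]]
print(list(single_row))               # [8, 9, 10, 11]
```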

Conditional selection
Select the rows where the value in column W is greater than zero, and show only columns Y and X.

df[df['W']>0][['Y','X']]
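The same selection can also be written in a single step with .loc, passing the condition for the rows and the list of columns together. A minimal sketch on a hand-made frame (demo is a hypothetical name):

```python
import pandas as pd

demo = pd.DataFrame({'W': [0.5, -1.0, 2.0],
                     'X': [1, 2, 3],
                     'Y': [4, 5, 6]},
                    index=['A', 'B', 'C'])

chained = demo[demo['W'] > 0][['Y', 'X']]         # filter rows, then pick columns
single_step = demo.loc[demo['W'] > 0, ['Y', 'X']] # both in one .loc call

print(chained.equals(single_step))  # True
print(list(chained.index))          # ['A', 'C']
```

Both forms return the same rows here; the .loc form is usually the safer habit when you later want to assign values to the selection.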

Multiple conditions: can you explain what result the following code will give?

df[(df['W']>0) & (df['Y']>1)]

df[(df['W']>0) | (df['Y']>1)]
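If you want to check your answer, here is a sketch with fixed values instead of random ones, so the outcome is predictable. The frame demo is made up for this illustration. Note that pandas needs & and | with parentheses around each condition; the Python keywords and / or raise a ValueError on a series.

```python
import pandas as pd

demo = pd.DataFrame({'W': [0.5, -1.0, 2.0, -0.3],
                     'Y': [1.5, 2.0, 0.2, -1.0]},
                    index=['A', 'B', 'C', 'D'])

both = demo[(demo['W'] > 0) & (demo['Y'] > 1)]     # rows where both conditions hold
either = demo[(demo['W'] > 0) | (demo['Y'] > 1)]   # rows where at least one holds

print(list(both.index))    # ['A']
print(list(either.index))  # ['A', 'B', 'C']
```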

Multi-level index or index hierarchy
Now we will create a data frame whose index has more than one level.

outside=['G1','G1','G1','G2','G2','G2']
inside=[1,2,3,1,2,3]
hi_index=list(zip(outside,inside))
hi_index=pd.MultiIndex.from_tuples(hi_index)

df=pd.DataFrame(randn(6,2),hi_index,['A','B'])

df

To grab everything under G1:

df.loc['G1']

Try to explain which value we grab with the following code:

df.loc['G2'].loc[2]['B']
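To check your answer, the sketch below builds the same index but fills the frame with the numbers 0 to 11 instead of random values. It also shows two alternatives to chaining: a tuple passed to .loc, and xs for slicing on the inner level. The name demo is only for illustration.

```python
import numpy as np
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hi_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)))

demo = pd.DataFrame(np.arange(12).reshape(6, 2), hi_index, ['A', 'B'])

chained = demo.loc['G2'].loc[2]['B']   # outer level, then inner level, then column
direct = demo.loc[('G2', 2), 'B']      # the same cell in one step
inner_two = demo.xs(2, level=1)        # every row whose inner label is 2

print(chained, direct)            # 9 9
print(list(inner_two.index))      # ['G1', 'G2']
print(list(inner_two['B']))       # [3, 9]
```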

3. Read CSV file

CSV files contain plain text and are a well-known format that can be read by many tools, including pandas.

df = pd.read_csv('/kaggle/input/pandas/data_set.csv')

print(df.to_string())
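If you don't have the Kaggle file at hand, you can try read_csv on CSV text held in memory. This is only a sketch: the column names and values below are invented for the example and have nothing to do with data_set.csv.

```python
import io
import pandas as pd

# Made-up CSV content, standing in for a file on disk
csv_text = "name,score,hours\nAna,90,5\nBen,75,3\nCara,60,1\n"

demo = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts file-like objects too

print(demo.shape)            # (3, 3)
print(list(demo.columns))    # ['name', 'score', 'hours']
print(demo.head(2))          # first two rows only
```

to_string() prints every row, which is fine for small files; for large ones, head() is usually easier to read.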

4. Correlations

The relationship between the columns in your data set can be calculated with the corr() method. Here are the correlations between the columns of our data:

df.corr()

The correlation value varies from -1 to 1. A negative value indicates a negative relationship: as the values of one variable increase, the values of the other decrease. A positive value means a positive relationship: as the values of one variable increase, the values of the other increase too. A value of 1 indicates a perfect positive relationship, and -1 a perfect negative one.
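To see the two extremes, here is a sketch with made-up columns: one that is exactly double x and one that is exactly minus x. The column names are invented for the example. If your own data has text columns, recent pandas versions may need df.corr(numeric_only=True) so that only numeric columns are used.

```python
import pandas as pd

demo = pd.DataFrame({'x': [1, 2, 3, 4, 5],
                     'double_x': [2, 4, 6, 8, 10],      # rises exactly with x
                     'neg_x': [-1, -2, -3, -4, -5]})    # falls exactly as x rises

corr = demo.corr()   # Pearson correlation by default

print(round(corr.loc['x', 'double_x'], 6))   # 1.0  (perfect positive)
print(round(corr.loc['x', 'neg_x'], 6))      # -1.0 (perfect negative)
```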

You can practice more examples on your own. The notebook link is given below. Go to the link and practice.
Notebook Link: [https://www.kaggle.com/code/azizaafrin/powerful-pandas-part-1]

Happy Learning!❤️

Aziza Afrin
