How to Remember Pandas Index Methods

discdiver

Jeff Hale

Posted on July 19, 2019

When method names are similar, it's hard to keep them apart in your mind, which makes them harder to remember.

Pandas has a slew of methods for creating and adjusting a DataFrame index.
This is a brief guide to help you create a little mental space between methods for easier memorization.

The Jupyter Notebook is on Kaggle here.

import pandas as pd
import numpy as np

Make a DataFrame without specifying an index (you get a default index).

df = pd.DataFrame(dict(a=[1,2,3,4], b=[2,5,6,4]))
df
a b
0 1 2
1 2 5
2 3 6
3 4 4

Make a DataFrame with an index by using the index keyword argument.

df2 = pd.DataFrame(dict(a=[1,2,3,4], b=[2,5,6,4]), index = [1,2,5,6])
df2
a b
1 1 2
2 2 5
5 3 6
6 4 4

Move a column to be the index with .set_index()

df3 = df2.set_index("a")
df3
b
a
1 2
2 5
3 6
4 4
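
Two details are easy to miss here, so here is a minimal sketch on a fresh frame (the names `demo`, `moved`, and `kept` are illustrative, not from the notebook). `.set_index()` returns a new DataFrame and leaves the original alone, and `drop=False` keeps the column in place while also using it as the index.

```python
import pandas as pd

demo = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]))

# set_index() returns a new DataFrame; demo itself is unchanged
moved = demo.set_index("a")
print(list(demo.columns))   # ['a', 'b']
print(list(moved.columns))  # ['b']
print(moved.index.name)     # a

# drop=False keeps "a" as a column and also uses it as the index
kept = demo.set_index("a", drop=False)
print(list(kept.columns))   # ['a', 'b']
```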

Rename the index values from scratch with .index

df3.index = [2,3,4,5]
df3
b
2 2
3 5
4 6
5 4

Note that index is a property of the DataFrame, not a method, so you assign to it instead of calling it.
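
Because it's an assignment, the change happens in place and the new labels have to match the number of rows. A small sketch of both points, using an illustrative frame called `demo`:

```python
import pandas as pd

demo = pd.DataFrame(dict(b=[2, 5, 6, 4]))

# Assigning to the property changes demo in place; nothing is returned
demo.index = [2, 3, 4, 5]
labels = list(demo.index)
print(labels)  # [2, 3, 4, 5]

# Four rows need four labels; two labels raise a ValueError
try:
    demo.index = [0, 1]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```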

Start the index over from 0 with .reset_index(). By default the old index values are kept as a new column.

df4 = df3.reset_index()
df4
index b
0 2 2
1 3 5
2 4 6
3 5 4

If you don't want the index to become a column, pass drop=True to reset_index().

df5 = df3.reset_index(drop=True)
df5
b
0 2
1 5
2 6
3 4
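
In the first reset_index() output above, the new column is called index because the index lost its name when it was overwritten with a plain list. A quick sketch of the difference, using an illustrative frame called `demo`:

```python
import pandas as pd

demo = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4])).set_index("a")

# A named index comes back as a column with that name
named_cols = list(demo.reset_index().columns)
print(named_cols)  # ['a', 'b']

# An unnamed index comes back as a column called "index"
demo.index = [2, 3, 4, 5]
unnamed_cols = list(demo.reset_index().columns)
print(unnamed_cols)  # ['index', 'b']
```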

Reorder the rows with .reindex()

df6 = df5.reindex([2,3,1,0])
df6
b
2 6
3 4
1 5
0 2

Passing a label that isn't in the index adds a row filled with NaN. The column also becomes float, because NaN is a float value.

df7 = df5.reindex([2,3,1,0,6])
df7
b
2 6.0
3 4.0
1 5.0
0 2.0
6 NaN
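
If you'd rather not get NaN, .reindex() accepts a fill_value argument. This is a minimal sketch with an illustrative frame called `demo`; whether 0 is a sensible fill depends on your data.

```python
import pandas as pd

demo = pd.DataFrame(dict(b=[2, 5, 6, 4]))

# The missing label 6 gets NaN, which turns the integer column into float
with_nan = demo.reindex([2, 3, 1, 0, 6])
print(with_nan["b"].dtype)  # float64

# fill_value supplies a value for labels that aren't in the index
filled = demo.reindex([2, 3, 1, 0, 6], fill_value=0)
print(filled["b"].tolist())  # [6, 4, 5, 2, 0]
```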

Advice

Ideally, set the index when you create your DataFrame by passing the index keyword argument.

If you're reading from a .csv file, you can set the index column by passing its zero-based column number (or its name) to index_col.

For example:

df = pd.read_csv(my_csv, index_col=3)

Or pass index_col=False to force pandas not to use the first column as the index.
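
Here is a self-contained sketch of index_col in action. The CSV text and its column names are made up for illustration, and io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = "id,a,b\n10,1,2\n11,2,5\n12,3,6\n"

# index_col takes a zero-based column position ...
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(by_position.index))  # [10, 11, 12]

# ... or a column name
by_name = pd.read_csv(io.StringIO(csv_text), index_col="id")
print(by_name.index.name)  # id

# Without index_col you get the default 0, 1, 2 ... index
default = pd.read_csv(io.StringIO(csv_text))
print(list(default.index))  # [0, 1, 2]
```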

How to set or change the index:

  • df.set_index() - move a column to the index

  • df.index - add an index manually

  • df.reset_index() - reset the index to 0, 1, 2 ...

  • df.reindex() - reorder the rows

Word associations to remember:

  • set_index() - move column

  • index - manual

  • reset_index() - reset

  • reindex() - reorder
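
To see the four word associations side by side, here is a short recap sketch. The variable names are illustrative; the data is the same as in the examples above.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]))

moved = df.set_index("a")                # move column
moved.index = [2, 3, 4, 5]               # manual
fresh = moved.reset_index(drop=True)     # reset
shuffled = fresh.reindex([2, 3, 1, 0])   # reorder

print(list(fresh.index))       # [0, 1, 2, 3]
print(shuffled["b"].tolist())  # [6, 4, 5, 2]
```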

Wrap

I hope this article helped you create a little mental space to keep Pandas index methods straight. If it did, please give it some love so other people can find it, too.

I write about Data Science, DevOps, Python and other stuff. Check out my other articles if any of that sounds interesting.

Follow me and connect:
Medium
Dev.to
Twitter
LinkedIn
Kaggle
GitHub

Happy indexing!
