How to Remember Pandas Index Methods
Jeff Hale
Posted on July 19, 2019
When method names are similar, it's hard to keep them separate in your mind, which makes them harder to remember.
Pandas has a slew of methods for creating and adjusting a DataFrame index.
This is a brief guide to help you create a little mental space between methods for easier memorization.
The Jupyter Notebook is on Kaggle here.
import pandas as pd
import numpy as np
Make a DataFrame without specifying an index (you get a default index).
df = pd.DataFrame(dict(a=[1,2,3,4], b=[2,5,6,4]))
df
| | a | b |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |
| 3 | 4 | 4 |
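To see what that default index is, you can inspect it directly. This is a minimal sketch, assuming a reasonably recent pandas version, in which the default is a RangeIndex counting up from 0:

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]))

# With no index argument, pandas builds a RangeIndex that counts up from 0.
default_index = df.index
print(type(default_index).__name__)
print(list(default_index))
```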
Make a DataFrame with an index by using the index keyword argument.
df2 = pd.DataFrame(dict(a=[1,2,3,4], b=[2,5,6,4]), index=[1,2,5,6])
df2
| | a | b |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 2 | 5 |
| 5 | 3 | 6 |
| 6 | 4 | 4 |
Move a column to be the index with .set_index()
df3 = df2.set_index("a")
df3
| a | b |
|---|---|
| 1 | 2 |
| 2 | 5 |
| 3 | 6 |
| 4 | 4 |
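It may help to remember that .set_index() returns a new DataFrame and leaves the original alone. The sketch below rebuilds df2 from above and also tries the drop=False option, which keeps the column in place as well as using it for the index. The names moved and kept are just for illustration:

```python
import pandas as pd

df2 = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]), index=[1, 2, 5, 6])

# set_index() returns a new DataFrame; df2 keeps its original index and columns.
moved = df2.set_index("a")

# drop=False keeps "a" as a regular column in addition to using it as the index.
kept = df2.set_index("a", drop=False)

print(list(df2.columns))    # a and b are both still columns of df2
print(list(moved.columns))  # only b remains as a column
print(list(kept.columns))   # a and b
```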
Rename the index values from scratch with .index
df3.index = [2,3,4,5]
df3
| | b |
|---|---|
| 2 | 2 |
| 3 | 5 |
| 4 | 6 |
| 5 | 4 |
Note that index is a property of the DataFrame, not a method, so the syntax is different.
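Two things are worth knowing when assigning to .index, shown in the rough sketch below: a plain list carries no index name, so the old name is dropped, and the list has to be the same length as the DataFrame. The variable length_error is only there to capture the exception for the demonstration:

```python
import pandas as pd

df3 = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4])).set_index("a")
print(df3.index.name)  # the index is named after the column it came from

# Assigning a plain list replaces the labels in place and drops the index name.
df3.index = [2, 3, 4, 5]
print(df3.index.name)

# The new labels must match the number of rows, otherwise pandas raises.
try:
    df3.index = [2, 3, 4]
    length_error = None
except ValueError as err:
    length_error = err
print(type(length_error).__name__)
```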
Nuke the index values and start over from 0 with .reset_index()
df4 = df3.reset_index()
df4
| | index | b |
|---|---|---|
| 0 | 2 | 2 |
| 1 | 3 | 5 |
| 2 | 4 | 6 |
| 3 | 5 | 4 |
If you don't want the index to become a column, pass drop=True to reset_index().
df5 = df3.reset_index(drop=True)
df5
| | b |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 6 |
| 3 | 4 |
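When the index has a name, .reset_index() uses that name for the restored column instead of the generic "index". A small sketch of the round trip, reusing df2 from earlier (round_trip is an illustrative name):

```python
import pandas as pd

df2 = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]), index=[1, 2, 5, 6])

# set_index("a") names the index "a", so reset_index() restores a column called "a".
round_trip = df2.set_index("a").reset_index()

print(list(round_trip.columns))
print(list(round_trip.index))
```

So .set_index() followed by .reset_index() puts the column back, but the original index labels (1, 2, 5, 6 here) are gone.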
Reorder the rows with .reindex()
df6 = df5.reindex([2,3,1,0])
df6
| | b |
|---|---|
| 2 | 6 |
| 3 | 4 |
| 1 | 5 |
| 0 | 2 |
Passing a label that isn't in the index produces a row of NaN. Because NaN is a float, the integer column is converted to float.
df7 = df5.reindex([2,3,1,0,6])
df7
| | b |
|---|---|
| 2 | 6.0 |
| 3 | 4.0 |
| 1 | 5.0 |
| 0 | 2.0 |
| 6 | NaN |
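If NaN isn't what you want for missing labels, .reindex() accepts a fill_value argument. Here is a minimal sketch that rebuilds df5 from above; the choice of 0 as the fill value is just an assumption for the example:

```python
import pandas as pd

df5 = pd.DataFrame(dict(b=[2, 5, 6, 4]))

# Without fill_value, the missing label 6 becomes NaN and the column is upcast to float.
with_nan = df5.reindex([2, 3, 1, 0, 6])

# fill_value supplies a value for labels that are not in the original index.
filled = df5.reindex([2, 3, 1, 0, 6], fill_value=0)

print(with_nan["b"].dtype)
print(filled["b"].tolist())
```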
Advice
Ideally, add an index when you create your DataFrame by passing the index keyword argument.
If you are reading from a .csv file, you can set an index column by passing its zero-based position (or its name) to index_col.
For example:
df = pd.read_csv(my_csv, index_col=3)
Or pass index_col=False to stop pandas from using the first column as the index.
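To try index_col without a file on disk, you can feed read_csv an in-memory string. This is a sketch only: the CSV content and the names csv_text, by_position, and by_name are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "id,name,score\n10,ann,3\n20,bob,5\n"

# index_col accepts a zero-based column position...
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

# ...or a column name.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="name")

print(list(by_position.index))
print(list(by_name.index))
```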
How to set or change the index:
- df.set_index(): move a column to the index
- df.index: add an index manually
- df.reset_index(): reset the index to 0, 1, 2 ...
- df.reindex(): reorder the rows
Word associations to remember:
- set_index(): move column
- index: manual
- reset_index(): reset
- reindex(): reorder
Wrap
I hope this article helped you create a little mental space to keep Pandas index methods straight. If it did, please give it some love so other people can find it, too.
I write about Data Science, Dev Ops, Python and other stuff. Check out my other articles if any of that sounds interesting.
Follow me and connect:
Medium
Dev.to
Twitter
LinkedIn
Kaggle
GitHub
Happy indexing!