Plotting ZIP Codes using Kepler.gl
vejendla
Posted on April 3, 2023
Do you have a set of data that contains ZIP Codes, and want to display that data using a free, open-source tool?
Then read on to learn how I did exactly that to help a non-profit in my hometown understand their visitor data over a three year period.
This article contains three sections. First, I'll generate a list of vists by ZIP Codes using Pivot Tables.
Second, I'll describe how I used a free dataset and Python to convert the ZIP Codes into geographic coordinates.
Finally, I'll upload the data to Kepler.gl.
Here's what the final data looked like:
1. Creating a pivot table of Zip Codes
First, I previewed my data using Google Sheets:
Then, I used pivot tables to display the count of visitors by ZIP Code. See the result (left) and settings (right) below. You can access this view by highlighting your data range and clicking "Insert > Pivot Tables".
2. Converting the Zip Codes to latitudes and longitudes
Since Kepler.gl requires coordinates to plot data, I had to find a way to convert ZIP Codes into latitudes and longitudes (As an aside, ZIP Codes aren't always the best datatype for business purposes. See this post for more.)
There are a number of ways to convert ZIP Codes to coordinates. One free way is to download this dataset from Kaggle, and use a VLOOKUP style operation in Pandas. That's exactly what I did. Below is the code that I used. You will need to have pandas and geopy installed prior to using this code.
import pandas as pd
import geopy
import math
# replace this with your input csv
df = pd.read_csv('20-21/20-21pv.csv')
# this is the name of the column in your input CSV that contains the ZIP Codes
df['Zip Code'] = df['Zip Code'].astype('int')
# replace this with the URL to the Kaggle csv
zipdf = pd.read_csv('zip_lat_long.csv')
def geocode_lat(row):
if math.isnan(row['Zip Code']):
return None
searchZip = int(row['Zip Code'])
tempRow = zipdf.loc[zipdf.ZIP == searchZip, "LAT"]
if tempRow.empty:
return None
else:
return tempRow.values[0]
def geocode_long(row):
if math.isnan(row['Zip Code']):
return None
searchZip = int(row['Zip Code'])
tempRow = zipdf.loc[zipdf.ZIP == searchZip, "LNG"]
if tempRow.empty:
return None
else:
return tempRow.values[0]
df['lat'] = df.apply(lambda row: geocode_lat(row), axis=1)
df['long'] = df.apply(lambda row: geocode_long(row), axis=1)
# replace this with the name of your output file
df.to_csv('20-21pv_with_latlong.csv')
and here's the result (see the coordinate columns):
Note 1: I did some mild data cleaning between step 1 and 2 that involved removing invalid ZIP Codes from the dataset. This included foreign ZIP Codes, and improperly formatted US ZIP Codes.
Note 2: The free dataset I used only contains ZIP Codes up to 2010. The US Postal Service has added a small number since then that weren't included in my final visualization.
3. Plotting visits in Kepler.gl
The final step is the easiest. With our .csv, I simply uploaded it into Kepler.gl.
The user interface should be self-explanatory, but there's a helpful guide if you get stuck.
Posted on April 3, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.