From Penguins to the Puget Sound: Rapid Data Exploration using Observable Plot
Mike Freeman
Posted on June 22, 2021
Code is an expressive medium for data exploration, but writing it can be time-consuming and tedious. Even after many years of building visualizations with D3, I still find simple charts laborious to construct. The release of the open-source Observable Plot library has dramatically increased the speed at which I, and anyone else, can visually explore a dataset in a JavaScript environment. Plot is built on top of D3 by the same team that created D3, and it uses smart defaults to make visually encoding your data both expressive and concise.
Because data exploration is iterative by nature, rapid experimentation is key to uncovering important information in data. This tutorial uses Observable, a free JavaScript notebook environment that helps structure and document exploration by combining code and output in a single web document. And with that, let’s dive in!
Putting Observable Plot to Work
The first dataset for this tutorial is the (now) canonical Penguins dataset, described in detail here. We can display the dataset in Observable by creating a Table.
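A minimal sketch of such a table cell follows. It assumes the data is an array of row objects and that Observable Inputs (the `Inputs` object available by default in Observable notebooks) provides the table. `Inputs` is passed in as an argument so that the function also runs outside a notebook; the function name is illustrative.

```javascript
// Render a dataset as an interactive, sortable table.
// `Inputs` is Observable Inputs (built into Observable notebooks).
// `data` is assumed to be an array of row objects, one per penguin.
function showTable(Inputs, data) {
  return Inputs.table(data);
}
```

In a notebook cell, `showTable(Inputs, penguins)` does the same thing as writing `Inputs.table(penguins)` directly.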
After seeing the tabular representation, you can begin exploring the characteristics of the data visually. The syntax of Plot allows you to express how you want to represent, or visually encode, each observation in your data. I often move through the following data exploration steps:
1. Distribution of a single variable
To assess the distribution of a given variable (e.g., flipper length), you can represent each penguin as a dot and map the flipper length column to the x visual channel.
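As a rough sketch, this encoding takes one line of Plot code. The column name `flipper_length_mm` is an assumption about how the dataset is labeled, and `Plot` is passed in as an argument so the function works both in a notebook (where `Plot` is available by default) and with an explicit import.

```javascript
// `Plot` is Observable Plot: built into Observable notebooks, or
// `import * as Plot from "@observablehq/plot"` elsewhere.
// The column name "flipper_length_mm" is an assumption about the data.
function flipperDistribution(Plot, penguins) {
  // One dot per penguin, positioned along x by flipper length.
  return Plot.dot(penguins, { x: "flipper_length_mm" }).plot();
}
```

Because no y channel is specified, the dots all sit on a single horizontal line, which is enough to see where values cluster.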
2. Correlation between variables
Once you see how a single variable is distributed, you can assess its correlation with another continuous variable. For example, is flipper length related to body mass? Again, we choose how to map the variables in the dataset to the available visual channels (e.g., x and y).
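A scatterplot for this question might look like the following sketch. The column names `flipper_length_mm` and `body_mass_g` are assumptions about the dataset, and `Plot` is passed in explicitly so the function can run outside a notebook.

```javascript
// Scatterplot of two continuous variables.
// Column names are assumptions about how the penguins data is labeled.
function flipperVsMass(Plot, penguins) {
  return Plot.dot(penguins, {
    x: "flipper_length_mm", // horizontal position
    y: "body_mass_g"        // vertical position
  }).plot();
}
```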
3. Relationships by groups
After assessing the correlation between two variables, you can dig a bit deeper and ask whether the relationship is consistent within different categories of your data. For example, is the relationship between flipper length and body mass consistent for each species? Because you have already used the x and y channels, color is a natural way to encode a third dimension of your data.
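One way to sketch the color encoding is to add a `stroke` channel to the same dot mark. The `species` column name, like the other column names, is an assumption about the dataset.

```javascript
// Same scatterplot, with species mapped to the stroke color of each dot.
// Column names are assumptions about how the penguins data is labeled.
function flipperVsMassBySpecies(Plot, penguins) {
  return Plot.dot(penguins, {
    x: "flipper_length_mm",
    y: "body_mass_g",
    stroke: "species" // categorical column, so each species gets its own color
  }).plot();
}
```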
Alternatively, you can use the small multiples technique by breaking the plot out into facets, one per category.
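A faceted version can be sketched with Plot's top-level `facet` option, which splits the data into one panel per distinct value. The column names are assumed, as before.

```javascript
// Small multiples: one scatterplot panel per species, laid out horizontally.
// Column names are assumptions about how the penguins data is labeled.
function flipperVsMassFaceted(Plot, penguins) {
  return Plot.plot({
    facet: { data: penguins, x: "species" },
    marks: [
      Plot.dot(penguins, { x: "flipper_length_mm", y: "body_mass_g" })
    ]
  });
}
```

The facet data and the mark data are the same array here, which is how Plot knows which rows belong in which panel.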
Managing Time-Series Data
At this point, let’s shift gears and talk specifically about time-series data. The next dataset records daily precipitation and temperature in Seattle.
Changes over time are commonly displayed as a line, such as the minimum and maximum daily temperatures over a four-year period.
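A sketch of such a chart draws one line mark per series. It assumes the weather rows have a `date` field holding a JavaScript `Date` and numeric `temp_min` and `temp_max` fields; these names, and the colors, are illustrative.

```javascript
// Two line marks sharing the same x (date) and y (temperature) scales.
// Field names "date", "temp_min", and "temp_max" are assumptions.
function temperatureLines(Plot, weather) {
  return Plot.plot({
    y: { label: "Temperature" },
    marks: [
      Plot.line(weather, { x: "date", y: "temp_min", stroke: "steelblue" }),
      Plot.line(weather, { x: "date", y: "temp_max", stroke: "tomato" })
    ]
  });
}
```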
With the daily weather fluctuations, it can be a bit difficult to process temporal trends. Luckily, Plot can compute moving averages within the plotting code! By connecting the time window parameter to an Observable input, we can quickly experiment with visual outputs.
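The moving average can be sketched with Plot's `windowY` transform, which averages the y values over a sliding window of `k` rows. The field names `date` and `temp_max` are assumptions, and `k` is the window size that would come from the notebook input.

```javascript
// Overlay a k-row moving average on the raw daily values.
// Field names "date" and "temp_max" are assumptions about the data.
function smoothedTemperature(Plot, weather, k) {
  return Plot.plot({
    marks: [
      // Raw daily values, faded into the background.
      Plot.line(weather, { x: "date", y: "temp_max", strokeOpacity: 0.3 }),
      // Mean of y over a sliding window of k consecutive rows
      // (k days, when there is exactly one row per day).
      Plot.line(weather, Plot.windowY({ k }, { x: "date", y: "temp_max" }))
    ]
  });
}
```

In a notebook, `k` could be driven by a slider cell such as `viewof k = Inputs.range([1, 60], {step: 1, value: 14})`, so that the chart redraws whenever the slider moves.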
Although Plot is a high-level abstraction for building visualizations, it still allows you to create bespoke charts, such as a calendar view of the weather!
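One possible way to build a calendar view, not necessarily how the original notebook does it, is to draw a cell per day, with the week of the year on x, the day of the week on y, and one facet row per year. The `weekOfYear` helper and the field names `date` and `temp_max` are assumptions introduced for this sketch.

```javascript
// Sunday-based week index within the year; days before the first Sunday
// of the year fall in week 0. Dates are treated as UTC.
function weekOfYear(date) {
  const jan1 = Date.UTC(date.getUTCFullYear(), 0, 1);
  const dayOfYear = Math.floor((date - jan1) / 864e5); // 0-based; 864e5 ms per day
  return Math.floor((dayOfYear + new Date(jan1).getUTCDay()) / 7);
}

// Calendar heatmap: one cell per day, colored by maximum temperature.
// Field names "date" and "temp_max" are assumptions about the data.
function weatherCalendar(Plot, weather) {
  return Plot.plot({
    facet: { data: weather, y: (d) => d.date.getUTCFullYear() },
    y: { tickFormat: (i) => "SMTWTFS"[i] }, // 0 = Sunday
    marks: [
      Plot.cell(weather, {
        x: (d) => weekOfYear(d.date),
        y: (d) => d.date.getUTCDay(),
        fill: "temp_max"
      })
    ]
  });
}
```

The date arithmetic is done in plain JavaScript to keep the sketch self-contained; D3's time utilities could compute the same week index.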
The next step in the journey is yours. Please give Observable Plot a try and let us know your feedback so that we can continue to improve the experience for everyone.