Presenting data

Nechama Borisute

Posted on February 5, 2024

As a data scientist, I not only have to find the data I need, I also have to think about how to present it in a helpful, meaningful way, i.e. data visualization. The two tasks are distinct, but they often depend on each other: if I cannot present or visualize the data I have obtained in a helpful way, that can be a sign that the data is incomplete or even incorrect.

One database company published a whitepaper to promote its product. Here is an excerpt from that paper about time series data:

A line graph is the simplest way to represent time series data. It helps the viewer get a quick sense of how something has changed over time:

lineplot time series

This statement is hard to deny, but the placeholder graph could stand in for a very broad range of data, not time series data in particular. Because the graph is so generic, I feel it actually conveys little sense of how things changed.

Presenting data is a skill quite distinct from isolating and compiling it from raw data sources. Just as I use dedicated Python libraries for building datasets, I feel one should turn to dedicated presentation resources when thinking about how to present data.

Here is a graph modeled after one of the famous charts presented by the statistician Hans Rosling in his TED talks:

Time series image

The format Rosling chose here lets you "get a sense" of a very large dataset very quickly, and in a strikingly dramatic fashion.
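A rough sketch of this kind of chart in Seaborn, assuming the familiar bubble layout: one bubble per country, with position, size, and color each carrying a variable. The values below are randomly generated placeholders, not real statistics, and the column names (`income`, `life_expectancy`, `population`, `region`) are hypothetical.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Randomly generated placeholder values, NOT real statistics
rng = np.random.default_rng(1)
n = 40
bubbles = pd.DataFrame({
    "income": rng.lognormal(mean=9, sigma=1, size=n),
    "life_expectancy": rng.uniform(50, 85, size=n),
    "population": rng.integers(1, 1_000, size=n),  # hypothetical, in millions
    "region": rng.choice(["A", "B", "C", "D"], size=n),
})

# Four variables in one view: x, y, bubble size, and color
ax = sns.scatterplot(
    data=bubbles,
    x="income",
    y="life_expectancy",
    size="population",
    hue="region",
    sizes=(20, 800),  # smallest and largest marker area
    alpha=0.6,        # transparency keeps overlapping bubbles readable
)

# A log scale spreads out the low-income end of the axis
ax.set_xscale("log")
ax.set(xlabel="Income per person (log scale)", ylabel="Life expectancy (years)")
```

This static sketch covers only a single moment in time. Rosling's presentations also animated the bubbles from year to year, which is not attempted here.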

One popular tool for data visualization is Seaborn, a Python library. Dedicated tools let me quickly try out different views and formats for my data, which gives me fast feedback on whether I am on the right track. One advantage of Seaborn is that its code is very succinct: you can create plots and grids in just a few lines of code, and sometimes a single line will suffice.
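To illustrate the "grid in one line" point, here is a small sketch using `sns.relplot` with its `col` parameter, which draws one panel per category. The tiny hand-typed table is only a stand-in with iris-like column names so the snippet runs on its own. Its values are illustrative.

```python
import pandas as pd
import seaborn as sns

# Small hand-typed stand-in with iris-like columns (illustrative values only)
small = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "petal_length": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "petal_width": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
})

# One line builds the whole grid: a scatter panel for each species
grid = sns.relplot(data=small, x="petal_width", y="petal_length", col="species")
```

The call returns a `FacetGrid`, so the panels share axes by default and can be adjusted together afterwards.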

Here are a few plots made with Seaborn:

# import seaborn library into memory
import seaborn as sns
# load in a built-in dataset from seaborn to use for plots
df = sns.load_dataset('iris') 
# check dataset to see column names
df.info()
# plot a histogram of sepal length according to species
sns.histplot(data=df, x="sepal_length", hue="species", element="step");

histogram

# plot a pairplot for all features
sns.pairplot(df);

pairplot

# plot a scatterplot for petal information
sns.scatterplot(data=df, x="petal_width", y="petal_length", hue="species");

scatterplot

# plot a barplot of sepal information on x and y
sns.barplot(data=df, x="sepal_width", y="sepal_length");

barplot
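`sns.barplot` aggregates the y values for each distinct x value (the mean, by default), so it often reads most clearly with a categorical x. Here is a sketch of that variant, assuming the goal is to compare average sepal length per species. That goal is an assumption, not something stated above. The hand-typed rows are a stand-in with iris-like column names so the snippet runs on its own.

```python
import pandas as pd
import seaborn as sns

# Small hand-typed stand-in with iris-like columns (illustrative values only)
sepals = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
})

# One bar per species; bar height is the mean sepal_length for that species
ax = sns.barplot(data=sepals, x="species", y="sepal_length")
```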

These are but a few examples of the awesome things Seaborn can help you accomplish. To learn more about Seaborn, check out its documentation.
