Presenting data
Nechama Borisute
Posted on February 5, 2024
As a data scientist, I find myself not only gathering the data required, but also thinking about how to present it in a helpful, meaningful way, i.e., data visualization. Though these two fields are distinct, they often depend on each other: if I cannot present or visualize the data I have obtained in a helpful way, that can be a sign that the data is incomplete or even incorrect.
One database company published a whitepaper to promote its product. Here is an excerpt from it about time series data:
A line graph is the simplest way to represent time series data. It helps the viewer get a quick sense of how something has changed over time:
This statement is hard to deny, but the placeholder graph could represent a very broad range of data, not time series data in particular. Because the graph is so generic, it actually conveys little sense of how things changed.
Presenting data is a skill very distinct from isolating and compiling it from the raw data sources. Just as I make use of dedicated Python libraries for building data sets, I feel one should go to dedicated presentation resources when thinking about how to present data.
Here is a graph modeled after one of the famous charts presented by the statistician Hans Rosling in his TED talks:
The format Rosling chose lets you "get a sense" of a very large dataset very quickly and in a strikingly dramatic fashion.
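A static approximation of a Rosling-style bubble chart can be sketched with Seaborn's `scatterplot`: position encodes two measures, bubble size a third, and color a grouping. This is a minimal sketch under stated assumptions, not Rosling's own tool (his charts were also animated over time). The function `bubble_chart`, its parameters, and the log-scaled x-axis default are hypothetical choices for illustration, and the column names come from whatever data you pass in.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def bubble_chart(df: pd.DataFrame, x: str, y: str, size: str, hue: str,
                 log_x: bool = True):
    """Hypothetical helper: x/y position, bubble area and color each encode a column."""
    fig, ax = plt.subplots(figsize=(8, 5))
    sns.scatterplot(
        data=df, x=x, y=y, size=size, hue=hue,
        sizes=(20, 1000),  # smallest and largest marker area, in points squared
        alpha=0.6, edgecolor="black", ax=ax,
    )
    if log_x:
        # a log axis keeps values spanning several orders of magnitude readable
        ax.set_xscale("log")
    # move the combined hue/size legend outside the plotting area
    sns.move_legend(ax, "upper left", bbox_to_anchor=(1.01, 1))
    return ax
```

A log scale on the x-axis is a common choice when a measure such as income spans several orders of magnitude; pass `log_x=False` for data that does not.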
One popular tool for data visualization is Seaborn, a Python library. Dedicated tools like this let me quickly try out different views and formats for my data, which gives me fast feedback on whether I am on the right track. One advantage of Seaborn is that its code is very succinct: you can create plots and grids in just a few lines of code, and sometimes a single line will suffice.
Here are a few plots made with Seaborn:
```python
# import seaborn for plotting, and matplotlib to display each figure
import matplotlib.pyplot as plt
import seaborn as sns
# load a built-in example dataset from seaborn to use for the plots
df = sns.load_dataset('iris')
# check the dataset's column names and types
df.info()
# plot a histogram of sepal length, split by species
sns.histplot(data=df, x="sepal_length", hue="species", element="step")
plt.show()
# plot pairwise relationships between all numeric features
sns.pairplot(df)
plt.show()
# plot a scatterplot of petal width against petal length, colored by species
sns.scatterplot(data=df, x='petal_width', y='petal_length', hue='species')
plt.show()
# plot a barplot of the mean sepal length for each distinct sepal width value
sns.barplot(data=df, x='sepal_width', y='sepal_length')
plt.show()
```
These are but a few examples of the awesome things Seaborn can help you accomplish. To learn more about Seaborn, check out the official documentation.