4 numeric distribution metrics to track in Snowflake (and how to track them)
Metaplane
Posted on March 2, 2024
Everything in your business runs smoothly—until it doesn’t. Out of nowhere, sales dip, website traffic plummets, or customer complaints shoot up. And scrambling to figure out what went wrong can feel like a game of business whack-a-mole.
Ultimately, when you react to problems as they pop up, you spend time and resources that would be better allocated elsewhere. If only you knew what issues were headed your way before they actually hit…
This is where tracking numeric data comes into play. By monitoring 4 key metrics in your Snowflake data warehouse, you can uncover hidden patterns, predict upcoming trends, and make decisions based on solid data, not just gut feelings.
In this guide, we’ll walk you through which numeric data metrics you need to be tracking and how to track them in Snowflake.
What is Numeric Data?
Numeric data is simply measurable information. It’s a fundamental component of statistics, often shown by its distribution (i.e. the shape or pattern of the data) in the form of a histogram.
While visual methods are great for human eyes to see and understand, sometimes we need to describe data patterns in terms of numbers. For instance, in the context of data observability, when we use machine learning to alert on anomalies within our numeric data.
For that, we rely on 4 standard metrics to summarize these distributions: minimum, maximum, mean, and standard deviation.
- Minimum (min): This is the smallest number in your data set.
- Maximum (max): This is the largest number in your data set.
- Mean: Often referred to as the average, this is calculated by adding all the numeric values together and dividing by the number of values. It gives a central value for the data.
- Standard deviation: This is the amount of variation or dispersion in a set of values. A low standard deviation means that the values tend to be close to the mean, while a high standard deviation indicates that the values are spread out over a wider range.
Now, we’ll take a deeper look at each metric, and using sample SQL queries, we’ll show you how to track them in Snowflake. For context, the sample queries below follow an e-commerce site. They assume that you have a 'sales' table that records your online store's daily revenue, as shown below.
This ‘sales’ table includes two columns: 'amount', which shows the daily earnings, and 'date', which indicates the day of these earnings.
Minimum
The minimum is the smallest value within a data set. So, if you have a data set representing daily sales, the minimum would be the lowest sales amount recorded on any given day.
While it’s often easy to overlook low sales figures when they’re overshadowed by high-performing days, the minimum can be a major indicator of your business's health during its quietest periods. Understanding the context of the lowest sales you have on any given day can reveal the underlying stability and consistency of your business.
Take this e-commerce example, for instance:
As the owner of an online store, you maintain a Snowflake dashboard with a graph showing the store's daily sales over the past six months. But over the past few months, you’ve noticed a concerning trend in this graph. While your peak sales days, typically around new product launches or holiday seasons, are doing well, there's a noticeable decline in the minimum daily sales.
This trend of decreasing minimum daily sales is a red flag. It could mean several things:
There's an issue with the website's user experience on off-peak days. This could be due to various factors such as poor website navigation, technical glitches, or less compelling content during these periods.
A new competitor has entered the market. A drop in the lowest sales figures could be a sign that customers are choosing to spend their money elsewhere. Tracking the minimum sales can be an early indicator of losing market share to competitors.
Customer preferences are changing. It might mean that your product range is no longer aligning with current market trends or customer interests. This decline could prompt a review of your product offerings and marketing strategies to ensure they align with evolving customer preferences.
There are issues with inventory. Customers unable to find what they need might turn to other sources, leading to reduced sales. You might need to reassess your inventory management to guarantee that popular products are always in stock.
Tracking this minimum daily sales figure gives you an early warning, so you can dig deeper into the root cause of the issue. Here’s how this command looks in Snowflake, assuming you have a sales table similar to the one above:
SELECT MIN(amount) AS MinDailySales
FROM sales;
Maximum
The maximum is the largest value within a data set. Juxtaposing the minimum definition above, if you have a data set representing daily sales, the maximum would be the highest sales amount recorded on any given day.
These high points can often be linked to specific actions you took, like a great marketing campaign or a popular product launch. So, tracking them means you can figure out what's working really well (and try to do it again). It also helps you with planning: making sure you have enough products in stock, enough people working, and that your website can handle a lot of visitors, all without spending too much when it's not necessary.
Let’s revisit our e-commerce example:
On the Snowflake dashboard that you've been using to track your daily sales over the past six months, there's also a graph showing the days when sales peaked. Seeing an upward trend in these peak sales days is usually a great sign. More often than not, it suggests that your recent marketing strategies are hitting the mark and there's a high demand for what you're selling.
But before you celebrate, verify that these numbers are accurate and not skewed by data recording errors, like incorrect transaction logging. You want to be certain that this upward trend is genuine.
Once you've verified the accuracy of your data and you're confident that these peak sales figures are, in fact, real, this success leads to a set of new questions and considerations. For instance, can your website handle the increased traffic on these busy days? You don't want the site crashing or slowing down just when customer interest is at its peak. Also, think about your inventory. Can you keep up with this higher demand without ending up with too much stock afterward?
At the end of the day, the benefit of tracking the maximum metric is twofold—you validate the effectiveness of your marketing and product strategies and ensure the robustness/reliability of your operations. Here’s how this command looks in Snowflake, assuming you have a sales table similar to the one above:
SELECT MAX(amount) AS MaxDailySales
FROM sales;
Mean
The mean is the average of all your data. While you might have days with exceptional sales due to a product launch and days with very low sales, the mean helps you understand your typical performance. It smooths out the extreme highs and lows, giving you a balanced view of your overall performance and business health.
This metric is especially important in planning and setting realistic expectations. If you're only focusing on the high points, you might overestimate your performance. And if you're only looking at the lows, you might underestimate it.
Now, back to our e-commerce example:
Right next to your min and max graphs in your Snowflake dashboard, you also have a graph of average daily sales. Like tracking your maximum, you want to see an upward trend. In this case, though, we’re looking for a steady rise.
A steady sales growth rate is a good indicator that overall business is growing, reflecting not just occasional spikes from big events but consistent growth across regular days as well. It suggests that your strategies—be it marketing, customer engagement, or inventory management—are resonating with your customers.
On the flip side, seeing a steady decline in sales would indicate that your strategies are not resonating with your customers. In this case, you’ll need to revisit all your strategies to find what’s not resonating, and pivot accordingly.
Either way, tracking your mean distribution over time helps identify gradual changes in business performance, ensuring that true growth (or decline) isn’t masked by high-variation days. Here’s how this command looks in Snowflake, assuming you have a sales table similar to the one above:
SELECT AVG(amount) AS AvgDailySales
FROM sales;
Standard deviation
The standard deviation is the amount of variation or dispersion in a set of values. It’s a measure of how consistent or inconsistent your numbers are. A small standard deviation means your sales numbers are pretty close to each other most of the time—which is good because it tends to make your business more predictable.
A large standard deviation, on the other hand, means your data is highly inconsistent, and it’s usually a sign that there are factors affecting your business that you need to understand.
Let’s go back to our e-commerce example one last time:
Next to your min, max, and mean graphs in your Snowflake dashboard is also a graph of standard deviation. If you start noticing fluctuations in your sales patterns. To understand this better, you can focus on the standard deviation of daily sales.
For instance, for a big product launch, you’ll probably notice a huge spike in sales, resulting in a high standard deviation for that period. This might indicate that your marketing campaign for this launch was effective. But if that spike is followed by several days of significantly lower sales, you’ll see this pattern of high sales followed by low sales days will cause your standard deviation to increase.
If your goal is to smooth out these fluctuations, you might need to rethink your marketing approach to not just create a buzz around new launches but also to maintain steady sales afterward.
And by tracking the standard deviation in your Snowflake sales data, you can gauge how well your revised marketing approaches are working towards achieving more consistent sales, and adjust your strategies accordingly for better business stability and predictability. Here’s how this command looks in Snowflake, assuming you have a sales table similar to the one above:
SELECT STDDEV(amount) AS StdDevDailySales
FROM sales;
Useful conditional statements for numeric data functions in Snowflake
Now that we’ve covered the basics, you can also refine your functions using conditional statements. For instance, if you want to focus on a specific time frame, such as a particular month, you can use a WHERE clause. Here’s what that looks like in Snowflake.
SELECT AVG(amount) AS AvgDailySales
FROM sales
WHERE date >= '2024-01-01'
AND date < '2024-02-01';
Snowflake also supports window functions, which can be useful if you want to calculate these metrics over different partitions of your data (i.e. per month or per product category). Here's an example of how you could calculate the average daily sales per month:
SELECT
DATE_TRUNC('MONTH', date) AS Month,
AVG(amount) AS AvgMonthlySales
FROM sales
GROUP BY DATE_TRUNC('MONTH', date);
Putting it all together
For better SQL form, you can condense all these commands into a single SQL query in Snowflake. Here’s how this looks all together:
WITH MonthlySalesStats AS (
SELECT
DATE_TRUNC('MONTH', date) AS Month,
MIN(amount) OVER(PARTITION BY DATE_TRUNC('MONTH', date)) AS MinMonthlySales,
MAX(amount) OVER(PARTITION BY DATE_TRUNC('MONTH', date)) AS MaxMonthlySales,
AVG(amount) OVER(PARTITION BY DATE_TRUNC('MONTH', date)) AS AvgMonthlySales,
STDDEV(amount) OVER(PARTITION BY DATE_TRUNC('MONTH', date)) AS StdDevMonthlySales
FROM sales
WHERE date >= '2024-01-01' AND date < '2024-07-01' -- Specifying the time frame for the first half of 2024
)
SELECT DISTINCT Month,
MinMonthlySales,
MaxMonthlySales,
AvgMonthlySales,
StdDevMonthlySales
FROM MonthlySalesStats
ORDER BY Month;
In this query:
- The
DATE_TRUNC('MONTH', date)
function is used to group sales by month. - The
OVER(PARTITION BY DATE_TRUNC('MONTH', date))
clause with each aggregate function calculates the min, max, average, and standard deviation for each month within the specified date range. - The
WHERE
clause limits the data to the first half of 2024 (from January 1, 2024, to June 30, 2024). - The
SELECT DISTINCT
statement ensures that each month is listed only once along with its corresponding sales statistics.
This gives you a complete monthly breakdown of the minimum, maximum, average, and standard deviation of sales for each month in the specified time frame, so you can track sales performance trends over time.
Wrapping Up
In a perfect world, you could sense when there’s a hint of change in the air—especially when those sudden shifts could balloon into larger business problems. But while that sixth sense might not exist, tracking your numeric distribution metrics gives you a proactive leg up.
When you track the min, max, mean, and standard deviation in Snowflake, you can pick up on business-changing patterns and trends. That way, you can act fast, fix small issues before they become big headaches, and even capitalize on opportunities as they arise—all because you have the full picture of your data.
Want to track your numeric data in Snowflake tables and views within minutes, then be alerted on anomalies with machine learning that accounts for trends and seasonality? Get started with Metaplane for free or book a demo to learn more.
Posted on March 2, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.