The Strength of Vertica DB: A Case Study in Sales Analytics
Dmitry Romanoff
Posted on October 19, 2024
In today’s data-driven world, organizations need robust database solutions that can handle vast amounts of information while delivering quick insights. Vertica DB stands out as an exemplary platform designed for high-performance analytics. This article showcases its capabilities through a case study involving a simulated sales dataset.
Creating the Sales Table
To illustrate the power of Vertica, we begin by creating a sales table that captures essential transaction details. The structure includes fields for transaction ID, customer ID, product ID, quantity, price, transaction date, and location:
CREATE TABLE sales (
transaction_id INT,
customer_id INT,
product_id INT,
quantity INT,
price DECIMAL(10, 2),
transaction_date DATE,
location VARCHAR(50)
);
Generating a Massive Dataset
To fully leverage Vertica’s capabilities, we populate the sales table with a massive dataset of 50 million records. This is done using a series of SQL commands that utilize randomization to simulate realistic sales data:
INSERT INTO sales
SELECT
ROW_NUMBER() OVER () AS transaction_id,
(RANDOM() % 1000000) + 1 AS customer_id,
(RANDOM() % 10000) + 1 AS product_id,
(RANDOM() % 10) + 1 AS quantity,
ROUND(RANDOM() * 100, 2) AS price,
DATE '2023-01-01' + (RANDOM() % 365) AS transaction_date,
(ARRAY['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'])[RANDOM() % 5 + 1] AS location
FROM (
SELECT 1 AS dummy FROM
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 0) AS a,
-- (continued with multiple unions to create a large dataset)
) AS numbers
LIMIT 50000000;
Once executed, we confirmed the insertion of 50 million records:
SELECT count(1) FROM sales;
The result indicates the successful creation of our dataset.
Analyzing Sales Data
Total Sales by Product
Using Vertica’s advanced analytical capabilities, we can quickly aggregate sales data to identify the top-selling products. The query below summarizes total sales for each product, returning results in a matter of milliseconds:
SELECT product_id, SUM(quantity * price) AS total_sales
FROM sales
GROUP BY product_id
ORDER BY total_sales DESC
LIMIT 10;
Result:
The top two products generate significant revenue, with total sales exceeding $1.87 billion for each.
Execution Time:
First fetch: 861.693 ms
Monthly Sales Trends
Understanding trends over time is crucial for businesses. The following query groups total sales by month, allowing us to see how sales fluctuate across the year:
SELECT DATE_TRUNC('month', transaction_date) AS month, SUM(quantity * price) AS total_sales
FROM sales
GROUP BY month
ORDER BY month;
Result:
The total sales for January 2023 amounted to over $3.74 billion.
Execution Time:
First fetch: 1150.676 ms
Running Total of Sales
For deeper insights, we can calculate a running total of sales over a specified range. This analysis helps businesses track performance trends over time:
SELECT transaction_date,
SUM(quantity * price) OVER (ORDER BY transaction_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS running_total
FROM sales;
Result:
This query returns a running total that enables the visualization of cumulative sales, which is invaluable for sales forecasting and performance analysis.
Conclusion
The results from this case study highlight the strengths of Vertica DB in handling large-scale data and performing complex analytics efficiently. With its ability to execute queries on vast datasets in seconds, Vertica emerges as a powerful tool for businesses seeking to harness the potential of their data.
Whether for tracking sales performance, identifying trends, or calculating cumulative metrics, Vertica’s performance is exemplary, making it a top choice for organizations focused on data analytics and business intelligence.
Posted on October 19, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.