[Beginner's guide] Running aggregations with Elasticsearch and Kibana
Lisa Jung
Posted on August 5, 2021
There are two main ways to search in Elasticsearch:
1) Queries retrieve documents that match the specified criteria.
2) Aggregations present a summary of your data as metrics, statistics, and other analytics.
In my previous blog, we learned how to retrieve documents by sending queries.
This blog will cover how you can summarize your data as metrics, statistics, or other analytics by sending aggregations requests!
By the end of this blog, you will be able to run:
- metric aggregations
- bucket aggregations
- combined aggregations
Prerequisite work
Watch this video from time stamp 15:00-21:46. This video will show you how to complete steps 1-3.
- Set up Elasticsearch and Kibana*
- Add e-commerce dataset to Elasticsearch*
- Open the Kibana console (AKA Dev Tools)
- Keep two windows open side by side (this blog and the Kibana console)
We will be sending aggregations requests from Kibana to Elasticsearch to learn how aggregations work!
Notes
1) If you would rather download Elasticsearch and Kibana on your own machine, follow the steps outlined in Downloading Elasticsearch and Kibana (macOS/Linux and Windows).
2) The video will show you how to add the news headline dataset. Follow the same steps but add the e-commerce dataset linked above. Create an index called ecommerce_original and add the data to that index.
Additional Resources
Interested in beginner friendly workshops on Elasticsearch and Kibana? Check out my Beginner's Crash Course to Elastic Stack series!
This blog is a complementary blog to Part 4 of the Beginner's Crash Course to Elastic Stack. If you prefer learning by watching videos instead, check out the recording!
The Beginner's Crash Course to Elastic Stack Table of Contents includes repos of all workshops in the series. Each repo includes resources shared during the workshop, including the video recording, presentation, related blogs, Elasticsearch requests, and more!
Set up data within Elasticsearch
Now that you have completed the prerequisite steps, it is time to set up data within Elasticsearch.
Often, a dataset is not optimal for running requests in its original state.
For example, the type of a field may not be recognized by Elasticsearch, or the dataset may contain a value that was accidentally included in the wrong field, and so on.
These are the exact problems that I ran into while working with this dataset. The following are the requests that I sent to Elasticsearch to yield the results included in the blog.
Copy and paste these requests into the Kibana console (Dev Tools) and run them in the order shown below.
STEP 1: Create a new index (ecommerce_data) with the following mapping.
PUT ecommerce_data
{
"mappings": {
"properties": {
"Country": {
"type": "keyword"
},
"CustomerID": {
"type": "long"
},
"Description": {
"type": "text"
},
"InvoiceDate": {
"type": "date",
"format": "M/d/yyyy H:m"
},
"InvoiceNo": {
"type": "keyword"
},
"Quantity": {
"type": "long"
},
"StockCode": {
"type": "keyword"
},
"UnitPrice": {
"type": "double"
}
}
}
}
STEP 2: Reindex the data from the original index (source) to the one you just created (destination).
POST _reindex
{
"source": {
"index": "name of your original index when you added the data to Elasticsearch"
},
"dest": {
"index": "ecommerce_data"
}
}
STEP 3: Remove the negative values from the field "UnitPrice".
When you explore the minimum unit price in this dataset, you will see that the minimum unit price value is -11062.06.
To keep our data simple, I used the delete_by_query API to remove all unit prices of 0 or less.
POST ecommerce_data/_delete_by_query
{
"query": {
"range": {
"UnitPrice": {
"lte": 0
}
}
}
}
STEP 4: Remove values of 500 or greater from the field "UnitPrice".
When you explore the maximum unit price in this dataset, you will see that the maximum unit price value is 38,970.
When the data is manually examined, the majority of the unit prices are less than 500. The max value of 38,970 would skew the average.
To simplify our demo, I used the delete_by_query API to remove all unit prices of 500 or greater.
POST ecommerce_data/_delete_by_query
{
"query": {
"range": {
"UnitPrice": {
"gte": 500
}
}
}
}
Get information about documents in an index
Before we can run aggregations requests, we need to know what information is included in our dataset.
This will help us figure out what types of questions we could ask and identify the appropriate fields to run aggregations on to get the answers.
The following query will retrieve information about documents in the ecommerce_data index. This query is a great way to explore the structure and content of your documents.
Syntax:
GET Enter_name_of_the_index_here/_search
Example:
GET ecommerce_data/_search
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
Elasticsearch displays the number of hits (line 12) and a sample of 10 search results by default (lines 16+).
The first hit (a document) is shown on lines 17-31. The field "_source" (line 22) lists all the fields (the content) of the document.
The ecommerce_data index contains transaction data from a company that operates in multiple countries.
Each document is a transaction of an item, and it contains the following fields:
- Description
- Quantity
- InvoiceNo
- CustomerID
- UnitPrice
- Country
- InvoiceDate
- StockCode
As you can see, running this query helps us understand what types of questions we could ask about our dataset and which fields we need to aggregate on to get the answers.
Aggregations Requests
The basic syntax of an aggregations request looks like the following.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"aggs": {
"Name your aggregations here": {
"Specify the aggregation type here": {
"field": "Name the field you want to aggregate on here"
}
}
}
}
There are various types of aggregations that you can run with Elasticsearch.
In the prerequisite steps, we added the e-commerce dataset and set up data within Elasticsearch.
To learn how these different types of aggregations work, we will pretend that we own an e-commerce app and that we have added our transaction data to the ecommerce_data index.
We will be sending various aggregations requests to get insights about transactions of items sold on our e-commerce app!
Metric Aggregations
Metric aggregations are used to compute numeric values based on your dataset. They can be used to calculate the values of sum, min, max, avg, unique count (cardinality), and more.
When you are running an e-commerce app, it is important to know how your business is performing. A great way to measure that is to compute these metrics we mentioned above.
Let's calculate these values!
Metric aggregations can only be performed on fields that contain numeric values.
Take a look at the image below. This is an example of a document in our index (lines 22-31).
The fields "Quantity" and "UnitPrice" contain numeric values, so metric aggregations can be performed on these fields.
Compute the sum of all unit prices in the index
Let's say we want to sum up the values of the field "UnitPrice" over all documents in the ecommerce_data index.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"aggs": {
"Name your aggregations here": {
"sum": {
"field": "Name the field you want to aggregate on here"
}
}
}
}
The following example follows the aggregations syntax.
Example:
GET ecommerce_data/_search
{
"aggs": {
"sum_unit_price": {
"sum": {
"field": "UnitPrice"
}
}
}
}
This aggregations request is named "sum_unit_price". It instructs Elasticsearch to perform a sum aggregation on the field "UnitPrice" over all documents in the ecommerce_data index.
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
By default, Elasticsearch returns the top 10 hits (lines 16+).
When you minimize hits (red box, line 10), you will see the results of the aggregations we named "sum_unit_price" (image below). It displays the sum of all unit prices present in our index.
If the sole purpose of running an aggregations request is to get the aggregations results, you can add a size parameter and set it to 0 as shown below.
This parameter prevents Elasticsearch from fetching the top 10 hits so that the aggregations results are shown at the top of the response.
Using a size parameter
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"sum_unit_price": {
"sum": {
"field": "UnitPrice"
}
}
}
}
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
We no longer need to minimize the hits to get access to the aggregations results!
We will be setting the size parameter to 0 in all requests from this point on.
Compute the lowest (min) unit price of an item
What if we wanted to calculate the lowest (min) unit price of an item?
Syntax:
GET Enter_name_of_the_index_here/_search
{
"size": 0,
"aggs": {
"Name your aggregations here": {
"min": {
"field": "Name the field you want to aggregate on here"
}
}
}
}
The following example is very similar to the last aggregations request we sent.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"lowest_unit_price": {
"min": {
"field": "UnitPrice"
}
}
}
}
The differences are that:
- we name the aggregations "lowest_unit_price".
- we set the aggregation type to min, which is short for minimum.
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
The lowest unit price of an item is 1.01.
Compute the highest (max) unit price of an item
What if we were interested in the highest (max) unit price of an item?
Syntax:
GET Enter_name_of_the_index_here/_search
{
"size": 0,
"aggs": {
"Name your aggregations here": {
"max": {
"field": "Name the field you want to aggregate on here"
}
}
}
}
The following example is very similar to the last aggregations request we sent.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"highest_unit_price": {
"max": {
"field": "UnitPrice"
}
}
}
}
The differences are that:
- we name the aggregations "highest_unit_price".
- we set the aggregation type to max, which is short for maximum.
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
The highest unit price of an item is 498.79.
Compute the average unit price of items
What if we wanted to calculate the average unit price of items?
Syntax:
GET Enter_name_of_the_index_here/_search
{
"size": 0,
"aggs": {
"Name your aggregations here": {
"avg": {
"field": "Name the field you want to aggregate on here"
}
}
}
}
The following example is very similar to the last aggregations request we sent.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"average_unit_price": {
"avg": {
"field": "UnitPrice"
}
}
}
}
The differences are that:
- we name the aggregations "average_unit_price".
- we set the aggregation type to avg, which is short for average.
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
The average unit price of an item is ~4.39.
Compute the count, min, max, avg, sum in one go
Calculating the count, min, max, avg, and sum individually can be a tedious task. The stats aggregation can calculate all of these in one go!
Syntax:
GET Enter_name_of_the_index_here/_search
{
"size": 0,
"aggs": {
"Name your aggregations here": {
"stats": {
"field": "Name the field you want to aggregate on here"
}
}
}
}
The following example is very similar to the last aggregations request we sent.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"all_stats_unit_price": {
"stats": {
"field": "UnitPrice"
}
}
}
}
The differences are that:
- we name the aggregations "all_stats_unit_price".
- we set the aggregation type to stats, which is short for statistics.
Let's copy and paste this example into the Kibana console and send the request!
Expected Response from Elasticsearch:
The stats aggregation will yield the values of count (the number of unit prices the aggregation was performed on), min, max, avg, and sum (the sum of all unit prices in the index).
Cardinality Aggregation
What if you want to know the number of unique customers who have bought from the app?
You would run the cardinality aggregation, which computes the count of unique values for a given field!
Syntax:
GET Enter_name_of_the_index_here/_search
{
"size": 0,
"aggs": {
"Name your aggregations here": {
"cardinality": {
"field": "Name the field you want to aggregate on here"
}
}
}
}
The following example is very similar to the last aggregations request we sent.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"number_unique_customers": {
"cardinality": {
"field": "CustomerID"
}
}
}
}
The differences are that:
- we name the aggregations "number_unique_customers".
- we set the aggregation type to cardinality.
We perform the cardinality aggregation on the field "CustomerID", as each customer is given a unique customer ID. By identifying unique customer IDs, we are able to get the number of unique customers in our transaction data.
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
There are approximately 4,325 unique customers in our dataset.
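The aggregations portion of the response looks roughly like this (note that cardinality returns an approximate count):

```json
"aggregations": {
  "number_unique_customers": {
    "value": 4325
  }
}
```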
Limiting the scope of an aggregation
In previous examples, aggregations were performed on all documents in the ecommerce_data index.
What if you want to run aggregations on a subset of the documents?
For example, our index contains e-commerce data from multiple countries. Let's say you want to calculate the average unit price of items sold in Germany.
To limit the scope of the aggregations, you can add a query clause to the aggregations request.
The query clause defines the subset of documents that aggregations should be performed on.
The syntax of a combined query and aggregations request looks like the following.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"size": 0,
"query": {
"Enter match or match_phrase here": {
"Enter the name of the field": "Enter the value you are looking for"
}
},
"aggregations": {
"Name your aggregations here": {
"Specify aggregations type here": {
"field": "Name the field you want to aggregate here"
}
}
}
}
Let's take a look at the following example which calculates the average unit price of items sold in Germany.
This request instructs Elasticsearch to query (1) all documents that "match" (2) the value "Germany" in the field "Country" (3).
Elasticsearch is then instructed to run aggregations (4) on the queried data. We name the aggregations "germany_average_unit_price" (5). Then, we tell Elasticsearch to get the average (6) of the values in the field "UnitPrice" (7) over all queried documents.
This in turn will tell us the average unit price of items sold in Germany.
Let's copy and paste the following example into the Kibana console and send the request!
Example:
GET ecommerce_data/_search
{
"size": 0,
"query": {
"match": {
"Country": "Germany"
}
},
"aggs": {
"germany_average_unit_price": {
"avg": {
"field": "UnitPrice"
}
}
}
}
Expected response from Elasticsearch:
The average unit price of items sold in Germany is ~4.58.
The combination of a query and an aggregations request allowed us to perform aggregations on a subset of documents.
What if we wanted to perform aggregations on several subsets of documents?
This is where bucket aggregations come into play!
Bucket Aggregations
When you want to aggregate on several subsets of documents, bucket aggregations will come in handy.
Bucket aggregations group documents into several subsets of documents called buckets. All documents in a bucket share a common criterion.
The following diagram illustrates a bucket aggregations request that splits documents into monthly buckets.
There are various ways you can group documents into buckets. These are:
- Date_histogram aggregation
- Histogram aggregation
- Range aggregation
- Terms aggregation
Date_histogram Aggregation
When you are looking to group data by time interval, the date_histogram aggregation will prove very useful!
Our ecommerce_data index contains transaction data that has been collected over time (from the year 2010 to 2011).
If we are looking to get insights about transactions over time, our first instinct should be to run the date_histogram aggregation.
There are two ways to define a time interval with the date_histogram aggregation: fixed_interval and calendar_interval.
Fixed_interval
With the fixed_interval, the interval is always constant.
Let's say we wanted to create a bucket for every 8-hour interval.
Syntax:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"Name your aggregations here": {
"date_histogram": {
"field":"Name the field you want to aggregate on here",
"fixed_interval": "Specify the interval here"
}
}
}
}
The following example is similar to the previous aggregations request we have sent.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_by_8_hrs": {
"date_histogram": {
"field": "InvoiceDate",
"fixed_interval": "8h"
}
}
}
}
We name our aggregations "transactions_by_8_hrs". Then, we set the type of aggregation to "date_histogram".
We instruct Elasticsearch to perform this aggregation on the field "InvoiceDate" and to split the documents into buckets at a "fixed_interval" of 8 hours.
Let's copy and paste this example into the Kibana console and send the request!
Expected response from Elasticsearch:
Elasticsearch creates a bucket for every 8 hours ("key_as_string") and shows the number of documents ("doc_count") grouped into each bucket.
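The aggregations portion of the response is an array of buckets of roughly this shape (the doc_count values below are illustrative; "key_as_string" follows the "M/d/yyyy H:m" date format from our mapping):

```json
"aggregations": {
  "transactions_by_8_hrs": {
    "buckets": [
      { "key_as_string": "12/1/2010 8:00", "key": 1291190400000, "doc_count": 2291 },
      { "key_as_string": "12/1/2010 16:00", "key": 1291219200000, "doc_count": 1054 }
    ]
  }
}
```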
Another way we can define the time interval is through the calendar_interval.
Calendar_interval
With the calendar_interval, the interval may vary.
For example, we could choose a time interval of a day, month, or year. But daylight saving time can change the length of specific days, months can have different numbers of days, and leap seconds can be tacked onto a particular year.
So the length of a day, month, or year could vary!
A scenario where you might use the calendar_interval is when you want to calculate the monthly revenue.
Syntax:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"Name your aggregations here": {
"date_histogram": {
"field":"Name the field you want to aggregate on here",
"calendar_interval": "Specify the interval here"
}
}
}
}
The following example is similar to the previous aggregations request we have sent.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_by_month": {
"date_histogram": {
"field": "InvoiceDate",
"calendar_interval": "1M"
}
}
}
}
This time, we name the aggregations "transactions_by_month". Then, we set the type of aggregation to "date_histogram".
We instruct Elasticsearch to perform the date_histogram aggregation on the field "InvoiceDate" and to split the documents into buckets at a "calendar_interval" of 1 month.
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
Elasticsearch creates monthly buckets. The starting date and time of each monthly bucket is included in the field "key_as_string".
The field "key" shows the same date and time represented as a timestamp.
The field "doc_count" shows the number of documents that fall within the time interval.
Bucket sorting for date_histogram aggregation
Take a look at the response above. You will notice that the buckets are sorted in ascending order of dates.
The field "key" shows the date and time represented as timestamps.
By default, the date_histogram aggregation sorts buckets based on the "key" values in ascending order.
To reverse this order, you can add an order parameter to the aggregation as shown below. Then, specify that you want to sort buckets based on the "_key" values in descending (desc) order.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_by_month": {
"date_histogram": {
"field": "InvoiceDate",
"calendar_interval": "1M",
"order": {
"_key": "desc"
}
}
}
}
}
Expected response from Elasticsearch:
You will see that buckets are now sorted to return the most recent interval first.
Histogram Aggregation
With the date_histogram aggregation, we were able to create buckets based on time intervals.
With the histogram aggregation, we can create buckets based on any numerical interval.
For example, let's say we wanted to create buckets based on a price interval that increases in increments of 10.
Syntax:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"Name your aggregations here": {
"histogram": {
"field":"Name the field you want to aggregate on here",
"interval": Specify the interval here
}
}
}
}
The following example is similar to the last request.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_per_price_interval": {
"histogram": {
"field": "UnitPrice",
"interval": 10
}
}
}
}
The differences are that we name the aggregations "transactions_per_price_interval".
We instruct Elasticsearch to run the histogram aggregation on the field "UnitPrice" and configure the price interval to increase in increments of 10.
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
Elasticsearch returns an array of buckets where each bucket represents a price interval ("key").
Each interval increases in increments of 10 in unit price. The response also includes the number of documents placed in each bucket ("doc_count").
In the first price interval, there are more than 400,000 transactions for items priced within this interval.
In the next price interval, there are over 20,000 transactions.
It seems that the higher the price interval, the lower the number of transactions. Could this be a pattern?!
This might be something worth exploring if we are looking to improve our sales strategy.
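The aggregations portion of the response looks roughly like this. Each "key" is the lower bound of a price interval; the doc_count values below are illustrative, matching only the magnitudes described above:

```json
"aggregations": {
  "transactions_per_price_interval": {
    "buckets": [
      { "key": 0.0, "doc_count": 404653 },
      { "key": 10.0, "doc_count": 22671 },
      { "key": 20.0, "doc_count": 1304 }
    ]
  }
}
```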
Bucket sorting for histogram aggregation
Similar to the date_histogram aggregation, the histogram aggregation sorts buckets based on the "key" values as well.
By default, the histogram aggregation sorts buckets based on the "key" values in ascending order.
But what if we wanted to sort in descending order?
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_per_price_interval": {
"histogram": {
"field": "UnitPrice",
"interval": 10,
"order": {
"_key": "desc"
}
}
}
}
}
All you have to do is add the order parameter as shown above. Then, specify that you want to sort by "_key" values in descending ("desc") order.
This way, the highest price interval is listed first!
Copy and paste the example into the console and send the request!
Expected response from Elasticsearch:
You will see that the buckets are now sorted to return the price intervals in descending order.
Range Aggregation
The range aggregation is similar to the histogram aggregation in that it can create buckets based on any numerical interval.
The difference is that the range aggregation allows you to define intervals of varying sizes, so you can customize it to your use case.
For example, what if you wanted to know the number of transactions for items in varying price ranges (between 0 and 50, between 50 and 200, and 200 and up)?
Syntax:
GET Enter_name_of_the_index_here/_search
{
"size": 0,
"aggs": {
"Name your aggregations here": {
"range": {
"field": "Name the field you want to aggregate on here",
"ranges": [
{
"to": x
},
{
"from": x,
"to": y
},
{
"from": z
}
]
}
}
}
}
The following example is similar to the last request.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_per_custom_price_ranges": {
"range": {
"field": "UnitPrice",
"ranges": [
{
"to": 50
},
{
"from": 50,
"to": 200
},
{
"from": 200
}
]
}
}
}
}
The differences are that we are naming the aggregations "transactions_per_custom_price_ranges".
We run the range aggregation on the field "UnitPrice". Then, we provide the custom ranges (0 to 50, 50 to 200, 200 and up).
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
Elasticsearch returns an array of buckets where each bucket represents a customized price interval ("key"). It also includes the number of documents ("doc_count") placed in each bucket.
We see that over 400,000 transactions have occurred for items priced between 0 and 50.
855 transactions have occurred for items priced between 50 and 200.
307 transactions have occurred for items priced at 200 and up.
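The aggregations portion of the response looks roughly like this (the first doc_count is illustrative; note how each custom range appears with explicit "from"/"to" bounds):

```json
"aggregations": {
  "transactions_per_custom_price_ranges": {
    "buckets": [
      { "key": "*-50.0", "to": 50.0, "doc_count": 404260 },
      { "key": "50.0-200.0", "from": 50.0, "to": 200.0, "doc_count": 855 },
      { "key": "200.0-*", "from": 200.0, "doc_count": 307 }
    ]
  }
}
```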
At this point, you might be wondering whether you can sort the range aggregation.
Bucket sorting for range aggregation
The range aggregation is sorted based on the input ranges you specify, and it cannot be sorted any other way!
Terms Aggregation
The terms aggregation creates a new bucket for every unique term it encounters for the specified field. It is often used to find the most frequently occurring terms in a specified field of a document.
Syntax:
GET Enter_name_of_the_index_here/_search
{
"aggs": {
"Name your aggregations here": {
"terms": {
"field": "Name the field you want to aggregate on here",
"size": State how many top results you want returned here
}
}
}
}
For example, let's say you want to identify the 5 customers with the highest number of transactions (documents).
Each document is a transaction and each transaction includes a customer ID.
Therefore, if we find the 5 most frequently occurring customer IDs, we will find our top 5 customers.
To do this, we send the following example request.
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"top_5_customers": {
"terms": {
"field": "CustomerID",
"size": 5
}
}
}
}
We name the aggregations "top_5_customers". We specify that we want to perform the terms aggregation on the field "CustomerID".
Since we only want the top 5 customers, we set the size parameter within the terms aggregation to 5.
Copy and paste the example into the Kibana console and send the request!
Expected response from Elasticsearch:
Elasticsearch returns the 5 customer IDs ("key") with the highest number of transactions ("doc_count").
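A sketch of the aggregations portion of the response (the customer IDs and doc_count values below are illustrative, and only the first two of the five buckets are shown):

```json
"aggregations": {
  "top_5_customers": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 390000,
    "buckets": [
      { "key": 17841, "doc_count": 7812 },
      { "key": 14911, "doc_count": 5898 }
    ]
  }
}
```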
Bucket sorting for terms aggregation
By default, the terms aggregation sorts buckets based on the "doc_count" values in descending order.
But what if you want to sort the results in ascending order?
You would send the following request!
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"5_customers_with_lowest_number_of_transactions": {
"terms": {
"field": "CustomerID",
"size": 5,
"order": {
"_count": "asc"
}
}
}
}
}
When we ask Elasticsearch to sort the buckets in ascending order, it will display the customer IDs with the lowest number of documents; in other words, the customers with the lowest number of transactions.
To account for that, we name our aggregations "5_customers_with_lowest_number_of_transactions". Then, we instruct Elasticsearch to perform the "terms" aggregation on the field "CustomerID" and specify that we want 5 buckets returned ("size": 5).
Then, we add an "order" parameter to the "terms" aggregation and specify that we want to sort buckets based on the "_count" values in ascending ("asc") order!
Copy and paste this example into the Kibana console and send the request!
Expected response from Elasticsearch:
You will see that the buckets are now sorted in ascending order of "doc_count", showing buckets with the lowest "doc_count" first.
Combined Aggregations
So far, we have run metric aggregations or bucket aggregations to answer simple questions.
There will be times when we ask more complex questions that require running combinations of these aggregations.
For example, let's say we wanted to know the sum of revenue per day.
To get the answer, we first need to split our data into daily buckets (date_histogram aggregation).
Within each bucket, we need to perform metric aggregations to calculate the daily revenue.
The combined aggregations request looks like the following.
Calculate the daily revenue
Let's break this down.
We let Elasticsearch know that we are sending an aggregations request(1).
In order to calculate daily revenue, we first need to split documents into daily buckets. So we name this aggregations request "transactions_per_day" (2).
Since we are creating buckets based on time intervals, we run the date_histogram aggregation (3) on the field "InvoiceDate" (4).
Then, we set the "calendar_interval" to "day" (5).
Thus far, we have split the documents into daily buckets.
Within the "transactions_per_day" aggregation, we create a sub-aggregation (6) called "daily_revenue" (7). This will calculate the total revenue generated each day.
You will notice that this aggregation looks a little different.
It uses a "script" (9) to calculate the daily revenue. What is that all about?
Let's take a look at the fields of our transaction data again.
When we look at our transaction document (lines 22-31), you will see that it lists the "Quantity" (line 24) of an item sold and the "UnitPrice" (line 27) of that item.
But we do not see the total revenue from this transaction. To get the total revenue, we need to multiply the "Quantity" of the item sold by its "UnitPrice".
This is where the script (9) comes in.
A script is used to dynamically compute values in Elasticsearch.
In our context, it is used to dynamically calculate the total revenue per transaction.
In the "source" field (10), we instruct Elasticsearch that for each document (doc) in the daily bucket, it should get the value of the field "UnitPrice" and multiply it by the value of the field "Quantity".
That calculates the total revenue of each transaction in our bucket. Then, we tell Elasticsearch to "sum" (8) up the total revenue from all the transactions in our bucket to calculate the daily_revenue (7).
Copy and paste the following example into the Kibana console and send the request!
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_per_day": {
"date_histogram": {
"field": "InvoiceDate",
"calendar_interval": "day"
},
"aggs": {
"daily_revenue": {
"sum": {
"script": {
"source": "doc['UnitPrice'].value * doc['Quantity'].value"
}
}
}
}
}
}
}
Expected Response from Elasticsearch:
Elasticsearch returns an array of daily buckets.
Within each bucket, it shows the number of documents ("doc_count") as well as the revenue generated each day ("daily_revenue").
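The first bucket of the response looks roughly like this (the doc_count and daily_revenue values below are illustrative). Notice how the "daily_revenue" sub-aggregation result is nested inside its parent bucket:

```json
"aggregations": {
  "transactions_per_day": {
    "buckets": [
      {
        "key_as_string": "12/1/2010 0:00",
        "key": 1291161600000,
        "doc_count": 2291,
        "daily_revenue": { "value": 46208.31 }
      }
    ]
  }
}
```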
Calculating multiple metrics per bucket
You can also calculate multiple metrics per bucket.
For example, let's say you wanted to calculate the daily revenue and the number of unique customers per day in one go. To do this, you can add multiple metric aggregations per bucket as shown below!
The request below is almost identical to the last request, except that we added an additional metric aggregation called "number_of_unique_customers_per_day" (2) to the sub-aggregations (1).
To calculate this value, we instruct Elasticsearch to perform the cardinality aggregation (3) on the field "CustomerID" (4).
Copy and paste the following example into the Kibana console and send the request!
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_per_day": {
"date_histogram": {
"field": "InvoiceDate",
"calendar_interval": "day"
},
"aggs": {
"daily_revenue": {
"sum": {
"script": {
"source": "doc['UnitPrice'].value * doc['Quantity'].value"
}
}
},
"number_of_unique_customers_per_day": {
"cardinality": {
"field": "CustomerID"
}
}
}
}
}
}
Expected Response from Elasticsearch:
Elasticsearch returns an array of daily buckets.
Within each bucket, you will see that the "number_of_unique_customers_per_day" and the "daily_revenue" have been calculated for each day!
Sorting by metric value of a sub-aggregation
You do not always need to sort by time interval, numerical interval, or doc_count! You can also sort by the metric value of a sub-aggregation.
Let's take a look at the request below. Within the sub-aggregations, the metric values "daily_revenue" (3) and "number_of_unique_customers_per_day" (4) are calculated.
Let's say you wanted to find which day had the highest daily revenue to date!
All you need to do is add the "order" parameter (1) and sort buckets based on the metric value of "daily_revenue" in descending ("desc") order (2)!
Copy and paste the following example into the console and send the request!
Example:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_per_day": {
"date_histogram": {
"field": "InvoiceDate",
"calendar_interval": "day",
"order": {
"daily_revenue": "desc"
}
},
"aggs": {
"daily_revenue": {
"sum": {
"script": {
"source": "doc['UnitPrice'].value * doc['Quantity'].value"
}
}
},
"number_of_unique_customers_per_day": {
"cardinality": {
"field": "CustomerID"
}
}
}
}
}
}
Expected response from Elasticsearch:
You will see that the response is no longer sorted by date. The buckets are now sorted to return the highest daily revenue first!
Great job! You have mastered the basics of running aggregations requests with Elasticsearch and Kibana.
Try running various metric aggregations, bucket aggregations, and combined aggregations on your own and see what type of insights you can find!