[Beginner's guide] Running aggregations with Elasticsearch and Kibana

lisahjung

Lisa Jung

Posted on August 5, 2021

[Beginner's guide] Running aggregations with Elasticsearch and Kibana

There are two main ways to search in Elasticsearch:
1) Queries retrieve documents that match the specified criteria.
2) Aggregations present the summary of your data as metrics, statistics, and other analytics.

image

image

In my previous blog, we learned how to retrieve documents by sending queries.

This blog will cover how you can summarize your data as metrics, statistics, or other analytics by sending aggregations requests!

By the end of this blog, you will be able to run:

  • metric aggregations
  • bucket aggregations
  • combined aggregations

Prerequisite work

Watch this video from time stamp 15:00-21:46. This video will show you how to complete steps 1-3.

  1. Set up Elasticsearch and Kibana*
  2. Add e-commerce dataset to Elasticsearch*
  3. Open the Kibana console(AKA Dev Tools)
  4. Keep two windows open side by side(this blog and the Kibana console)

We will be sending aggregations requests from Kibana to Elasticsearch to learn how aggregations work!

Notes
1) If you would rather download Elasticsearch and Kibana on your own machine, follow the steps outlined in Downloading Elasticsearch and Kibana(macOS/Linux and Windows).

2) The video will show you how to add the news headline dataset. Follow the same steps but add the e-commerce dataset linked above. Create an index called ecommerce_original and add the data to that index.

Additional Resources

Interested in beginner friendly workshops on Elasticsearch and Kibana? Check out my Beginner's Crash Course to Elastic Stack series!

1) Part 4 Workshop Recording

This blog is a complementary blog to Part 4 of the Beginner's Crash Course to Elastic Stack. If you prefer learning by watching videos instead, check out the recording!

2) Beginner's Crash Course to Elastic Stack Table of Contents

This table of contents includes repos of all workshops in the series. Each repo includes resources shared during the workshop including the video recording, presentation, related blogs, Elasticsearch requests and more!

Set up data within Elasticsearch

Now that you have completed the prerequisite steps, it is time to set up data within Elasticsearch.

Often times, the dataset is not optimal for running requests in its original state.

For example, the type of a field may not be recognized by Elasticsearch or the dataset may contain a value that was accidentally included in the wrong field and etc.

These are exact problems that I ran into while working with this dataset. The following are the requests that I sent to Elasticsearch to yield the results included in the blog.

Copy and paste these requests into the Kibana console(Dev Tools) and run these requests in the order shown below.

STEP 1: Create a new index(ecommerce_data) with the following mapping.



PUT ecommerce_data
{
  "mappings": {
    "properties": {
      "Country": {
        "type": "keyword"
      },
      "CustomerID": {
        "type": "long"
      },
      "Description": {
        "type": "text"
      },
      "InvoiceDate": {
        "type": "date",
        "format": "M/d/yyyy H:m"
      },
      "InvoiceNo": {
        "type": "keyword"
      },
      "Quantity": {
        "type": "long"
      },
      "StockCode": {
        "type": "keyword"
      },
      "UnitPrice": {
        "type": "double"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

STEP 2: Reindex the data from the original index(source) to the one you just created(destination).



POST _reindex
{
  "source": {
    "index": "name of your original index when you added the data to Elasticsearch"
  },
  "dest": {
    "index": "ecommerce_data"
  }
}


Enter fullscreen mode Exit fullscreen mode

STEP 3: Remove the negative values from the field "UnitPrice".
When you explore the minimum unit price in this dataset, you will see that the minimum unit price value is -11062.06.

To keep our data simple, I used the delete_by_query API to remove all unit prices less than 0.



POST ecommerce_data/_delete_by_query
{
  "query": {
    "range": {
      "UnitPrice": {
        "lte": 0
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

STEP 4: Remove values greater than 500 from the field "UnitPrice".
When you explore the maximum unit price in this dataset, you will see that the maximum unit price value is 38,970.

When the data is manually examined, the majority of the unit prices are less than 500. The max value of 38,970 would skew the average.

To simplify our demo, I used the delete_by_query API to remove all unit prices greater than 500.



POST ecommerce_data/_delete_by_query
{
  "query": {
    "range": {
      "UnitPrice": {
        "gte": 500
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

Get information about documents in an index

Before we can run aggregations requests, we need to know what information is included in our dataset.

This will help us figure out what type of questions we could ask and identify the appropriate fields to run aggregations on to get the answers.

The following query will retrieve information about documents in the ecommerce_data index. This query is a great way to explore the structure and content of your document.

Syntax:



GET Enter_name_of_the_index_here/_search


Enter fullscreen mode Exit fullscreen mode

Example:



GET ecommerce_data/_search


Enter fullscreen mode Exit fullscreen mode

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
Elasticsearch displays a number of hits(line 12) and a sample of 10 search results by default(lines 16+).

The first hit(a document) is shown on lines 17-31. The field "_source"(line 22) lists all the fields(the content) of the document.

image

The ecommerce_data index contains transaction data from a company that operates in multiple countries.

Each document is a transaction of an item and it contains the following fields:

  • Description
  • Quantity
  • InvoiceNo
  • CustomerID
  • UnitPrice
  • Country
  • InvoiceDate
  • StockCode

As you can see, running this query helps us understand what types of questions we could ask about our dataset and which fields we need to aggregate on to get the answers.

Aggregations Requests

The basic syntax of an aggregations request looks like the following.

Syntax:



GET Enter_name_of_the_index_here/_search
{
  "aggs": {
    "Name your aggregations here": {
      "Specify the aggregation type here": {
        "field": "Name the field you want to aggregate on here"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

There are various types of aggregations that you can run with Elasticsearch.

In the prerequisite steps, we added the e-commerce dataset and set up data within Elasticsearch.

To learn how these different types of aggregations work, we will pretend that we own an e-commerce app and that we have added our transaction data to the ecommerce_data index.

We will be sending various aggregations requests to get insights about transactions of items sold on our e-commerce app!

Metric Aggregations

Metric aggregations are used to compute numeric values based on your dataset. It can be used to calculate the values of sum,min, max, avg, unique count(cardinality) and etc.

When you are running an e-commerce app, it is important to know how your business is performing. A great way to measure that is to compute these metrics we mentioned above.

Let's calculate these values!

Metric aggregations can only be performed on fields that contain numeric values.

Take a look at the image below. This is an example of a document in our index(lines 22-31).

image

The fields "Quantity" and "UnitPrice" contain numeric values. Metric aggregations can be performed on these fields.

Compute the sum of all unit prices in the index

Let's say we want to sum up the values of the field "UnitPrice" over all documents in the ecommerce_data index.

Syntax:



GET Enter_name_of_the_index_here/_search
{
  "aggs": {
    "Name your aggregations here": {
      "sum": {
        "field": "Name the field you want to aggregate on here"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The following example follows the aggregations syntax.

Example:



GET ecommerce_data/_search
{
  "aggs": {
    "sum_unit_price": {
      "sum": {
        "field": "UnitPrice"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

This aggregations request is named as "sum_unit_price". It instructs Elasticsearch to perform sum aggregations on the field "UnitPrice" over all documents in the ecommerce_data index

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
By default, Elasticsearch returns top 10 hits(lines 16+).

image

When you minimize hits(red box- line 10), you will see the results of aggregations we named "sum_unit_price"(image below). It displays the sum of all unit prices present in our index.

image

If the sole purpose of running aggregations requests is to get the aggregations results, you can add a size parameter and set it to 0 as shown below.

This parameter prevents Elasticsearch from fetching the top 10 hits so that the aggregations results are shown at the top of the response.

Using a size parameter

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "sum_unit_price": {
      "sum": {
        "field": "UnitPrice"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
We no longer need to minimize the hits to get access to the aggregations results!

image

We will be setting the size parameter to 0 in all requests from this point on.

Compute the lowest(min) unit price of an item

What if we wanted to calculate the lowest(min) unit price of an item?

Syntax:



GET Enter_name_of_the_index_here/_search
{
  "size": 0,
  "aggs": {
    "Name your aggregations here": {
      "min": {
        "field": "Name the field you want to aggregate on here"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The following example is very similar to the last aggregations request we sent.

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "lowest_unit_price": {
      "min": {
        "field": "UnitPrice"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The differences are that:

  1. we are naming the aggregations as "lowest_unit_price".
  2. we set the aggregations type to min which is short for minimum.

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
The lowest unit price of an item is 1.01.

image

Compute the highest(max) unit price of an item

What if we were interested in the highest(max) unit price of an item?

Syntax:



GET Enter_name_of_the_index_here/_search
{
  "size": 0,
  "aggs": {
    "Name your aggregations here": {
      "max": {
        "field": "Name the field you want to aggregate on here"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The following example is very similar to the last aggregations request we sent.

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "highest_unit_price": {
      "max": {
        "field": "UnitPrice"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The differences are that:

  1. we are naming the aggregations as "highest_unit_price".
  2. we set the aggregations type to max which is short for maximum.

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
The highest unit price of an item is 498.79.

image

Compute the average unit price of items

What if we wanted to calculate the average unit price of items?

Syntax:



GET Enter_name_of_the_index_here/_search
{
  "size": 0,
  "aggs": {
    "Name your aggregations here": {
      "avg": {
        "field": "Name the field you want to aggregate on here"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The following example is very similar to the last aggregations request we sent.

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "average_unit_price": {
      "avg": {
        "field": "UnitPrice"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The differences are that:

  1. we are naming the aggregations as "average_unit_price".
  2. we set the aggregations type to type to avg which is short for average.

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
The average unit price of an item is ~4.39.

image

Compute the count, min, max, avg, sum in one go

Calculating the count, min, max, avg, and sum individually can be a tedious task. The stats aggregations can calculate all of these in one go!

Syntax:



GET Enter_name_of_the_index_here/_search
{
  "size": 0,
  "aggs": {
    "Name your aggregations here": {
      "stats": {
        "field": "Name the field you want to aggregate on here"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The following example is very similar to the last aggregations request we sent.

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "all_stats_unit_price": {
      "stats": {
        "field": "UnitPrice"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The differences are that:

  1. we are naming the aggregations as "all_stats_unit_price".
  2. we set the aggregations type to stats which is short for statistics.

Let's copy and paste this example into the Kibana console and send the request!

Expected Response from Elasticsearch:
The stats aggregation will yield the values of count(the number of unit prices aggregation was performed on), min, max, avg, and sum(sum of all unit prices in the index).

image

Cardinality Aggregation

What if you want the number of unique customers whom have bought from the app?

You would run the cardinality aggregation, which computes the count of unique values for a given field!

Syntax:



GET Enter_name_of_the_index_here
{
  "size": 0,
  "aggs": {
    "Name your aggregations here": {
      "cardinality": {
        "field": "Name the field you want to aggregate on here"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The following example is very similar to the last aggregations request we sent.

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "number_unique_customers": {
      "cardinality": {
        "field": "CustomerID"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The differences are that:

  1. we are naming the aggregations as "number_unique_customers".
  2. we set the aggregations type to cardinality.

We perform cardinality aggregations on the field "CustomerID" as each customer is given a unique customer ID. By identifying unique customer IDs, we are able to get the number of unique customers in our transaction data.

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
Approximately, there are 4325 unique number of customers in our dataset.

image

Limiting the scope of an aggregation

In previous examples, aggregations were performed on all documents in the ecommerce_data index.

What if you want to run aggregations on a subset of the documents?

For example, our index contains e-commerce data from multiple countries. Let's say you want to calculate the average unit price of items sold in Germany.

To limit the scope of the aggregations, you can add a query clause to the aggregations request.

The query clause defines the subset of documents that aggregations should be performed on.

The syntax of combined query and aggregations request look like the following.

Syntax:



GET Enter_name_of_the_index_here/_search
{
  "size": 0,
  "query": {
    "Enter match or match_phrase here": {
      "Enter the name of the field": "Enter the value you are looking for"
    }
  },
  "aggregations": {
    "Name your aggregations here": {
      "Specify aggregations type here": {
        "field": "Name the field you want to aggregate here"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

Let's take a look at the following example which calculates the average unit price of items sold in Germany.

Example:
image

This request instructs Elasticsearch to query(1) all documents that "match"(2) the value "Germany" in the field "Country"(3).

Elasticsearch is then instructed to run aggregations(4) on the queried data. We name the aggregations "germany_average_unit_price"(5). Then, we tell Elasticsearch to get the average(6) of the values in the field "UnitPrice"(7) over all queried documents.

This in turn will tell us the average unit price of items sold in Germany.

Let's copy and paste the following example into the Kibana console and send the request!

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "query": {
    "match": {
      "Country": "Germany"
    }
  },
  "aggs": {
    "germany_average_unit_price": {
      "avg": {
        "field": "UnitPrice"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

Expected response from Elasticsearch:
The average of unit price of items sold in Germany is ~4.58.

image

The combination of query and aggregations request allowed us to perform aggregations on a subset of documents.

What if we wanted to perform aggregations on several subsets of documents?

This is where bucket aggregations come into play!

Bucket Aggregations

When you want to aggregate on several subsets of documents, bucket aggregations will come in handy.

Bucket aggregations group documents into several subsets of documents called buckets. All documents in a bucket share a common criteria.

The following diagram illustrates a bucket aggregations request that splits documents into monthly buckets.

image

There are various ways you can group documents into buckets. These are:

  1. Date_histogram aggregation
  2. Histogram aggregation
  3. Range aggregation
  4. Terms aggregation

Date_histogram Aggregation

When you are looking to group data by time interval, the date_histogram aggregation will prove very useful!

Our ecommerce_data index contains transaction data that has been collected over time(from the year 2010 to 2011).

If we are looking to get insights about transactions over time, our first instinct should be to run the date_histogram aggregation.

There are two ways to define a time interval with date_histogram aggregation. These are Fixed_interval and Calendar_interval.

Fixed_interval
With the fixed_interval, the interval is always constant.

Let's say we wanted to create a bucket for every 8 hour interval.

Syntax:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "Name your aggregations here": {
      "date_histogram": {
        "field":"Name the field you want to aggregate on here",
        "fixed_interval": "Specify the interval here"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The following example is similar to the previous aggregations request we have sent.

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_by_8_hrs": {
      "date_histogram": {
        "field": "InvoiceDate",
        "fixed_interval": "8h"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

We name our aggregations "transactions_by_8_hrs". Then, we set the type of aggregation to "date_histogram".

We instruct Elasticsearch to perform this aggregation on the field "InvoiceDate" and to split the documents into buckets at a "fixed_ interval" of 8 hours.

Let's copy and paste this example into the Kibana console and send the request!

Expected response from Elasticsearch:
Elasticsearch creates a bucket for every 8 hours("key_as_string") and shows the number of documents("doc_count") grouped into each bucket.

image

Another way we can define the time interval is through the calendar_interval.

Calendar_interval
With the calendar_interval, the interval may vary.

For example, we could choose a time interval of day, month or year. But daylight savings can change the length of specific days, months can have different number of days, and leap seconds can be tacked onto a particular year.

So the time interval of day, month, or leap seconds could vary!

A scenario where you might use the calendar_interval is when you want to calculate the monthly revenue.

Syntax:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "Name your aggregations here": {
      "date_histogram": {
        "field":"Name the field you want to aggregate on here",
        "calendar_interval": "Specify the interval here"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The following example is similar to the previous aggregations request we have sent.

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_by_month": {
      "date_histogram": {
        "field": "InvoiceDate",
        "calendar_interval": "1M"
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

However, we name the aggregations "transactions_by_month". Then, we set the type of aggregation to "date_histogram".

We instruct Elasticsearch to perform date_histogram aggregation on the field "InvoiceDate" and to split the documents into buckets at a "calendar_interval" of 1 month.

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
Elasticsearch creates monthly buckets. Within each bucket, the starting date and time of each monthly bucket is included in the field "key_as_string".

The field "key" shows the date and time represented as a timestamp.

The field "doc_count" shows the number of documents that fall within the time interval.

image

Bucket sorting for date_histogram aggregation
Take a look at the response above. You will notice that these buckets were sorted in ascending order of dates.

The field "key" shows the date and time represented as timestamps.

By default, the date_histogram aggregation sorts buckets based on the "key"
values in ascending order.

To reverse this order, you can add an order parameter to the aggregations as shown below. Then, specify that you want to sort buckets based on the "_key" values in descending(desc) order.

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_by_month": {
      "date_histogram": {
        "field": "InvoiceDate",
        "calendar_interval": "1M",
        "order": {
          "_key": "desc"
        }
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

Expected response from Elasticsearch:
You will see that buckets are now sorted to return the most recent interval first.

image

Histogram Aggregation

With the date_histogram aggregation, we were able to create buckets based on time intervals.

With the histogram aggregation, we can create buckets based on any numerical interval.

For example, let's say we wanted to create buckets based on price interval that increases in increments of 10.

Syntax:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "Name your aggregations here": {
      "histogram": {
        "field":"Name the field you want to aggregate on here",
        "interval": Specify the interval here
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The following example is similar to the last request.

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_per_price_interval": {
      "histogram": {
        "field": "UnitPrice",
        "interval": 10
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The differences are that we are naming the aggregations "transactions_per_price_interval".

We instruct Elasticsearch to run a histogram aggregation on the field "UnitPrice" and configure the price interval to increase in increments of 10.

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
Elasticsearch returns an array of buckets where each bucket represents a price interval("key").

Each interval increases in increments of 10 in unit price. It also includes the number of documents placed in each bucket("doc_count").

image

In the first price interval, there are more than 400,000 transactions for items priced within this interval.

In the next price interval, there are over 20,000 transactions.

It seems like the higher we go up in price interval, the number of transactions decreases. Could this be a pattern?!

This might be something worth exploring if we are looking to improve our sales strategy.

Bucket sorting for histogram aggregation
Similar to the date_histogram aggregation, the histogram aggregation sorts the buckets based on the "key" values as well.

By default, the histogram aggregation sorts buckets based on the "key" values in ascending order.

But what if we wanted to sort by descending order?

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_per_price_interval": {
      "histogram": {
        "field": "UnitPrice",
        "interval": 10,
        "order": {
          "_key": "desc"
        }
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

All you have to do is to add the order parameter as shown above. Then, specify that you want to sort by "_key" values in descending("desc") order.

This way, the highest price interval is listed first!

Copy and paste the example into the console and send the request!

Expected response from Elasticsearch:
You will see that the buckets are now sorted to return the price intervals in descending order.

image

Range Aggregation

The range aggregation is similar to the histogram aggregation in that it can create buckets based on any numerical interval.

The difference is that the range aggregation allows you to define intervals of varying sizes so you can customize it to your use case.

For example, what if you wanted to know the number of transactions for items from varying price ranges(between 0 and $50, between $50-$200, and between $200 and up)?

Syntax:



GET Enter_name_of_the_index_here/_search
{
  "size": 0,
  "aggs": {
   "Name your aggregations here": {
      "range": {
        "field": "Name the field you want to aggregate on here",
        "ranges": [
          {
            "to": x
          },
          {
            "from": x,
            "to": y
          },
          {
            "from": z
          }
        ]
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The following example is similar to the last request.
Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_per_custom_price_ranges": {
      "range": {
        "field": "UnitPrice",
        "ranges": [
          {
            "to": 50
          },
          {
            "from": 50,
            "to": 200
          },
          {
            "from": 200
          }
        ]
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

The differences are that we are naming the aggregations "transactions_per_custom_price_ranges".

We run the range aggregation on the field "UnitPrice". Then, we provide the ranges below(0 to 50, 50 to 200, 200 and up).

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
Elasticsearch returns an array of buckets where each bucket represents a customized price interval("key"). It also includes the number of documents("doc_count") placed in each bucket.

image

We see that over 400,000 transactions have occurred for items priced between 0 to 50.

855 transactions have occurred for items priced between 50-200.

307 transactions have occurred for items priced from 200 and up.

At this point you might be wondering if you can sort the range aggregation.

Bucket sorting for range aggregation
The range aggregation is sorted based on the input ranges you specify and it cannot be sorted any other way!

Terms Aggregation

The terms aggregation creates a new bucket for every unique term it encounters for the specified field. It is often used to find the most frequently found terms in a specified field of a document.

Syntax:



GET Enter_name_of_the_index_here/_search
{
  "aggs": {
    "Name your aggregations here": {
      "terms": {
        "field": "Name the field you want to aggregate on here",
        "size": State how many top results you want returned here
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

For example, let's say you want to identify 5 customers with the highest number of transactions(documents).

Each document is a transaction and each transaction includes a customer ID.

Therefore, if we find the 5 most frequently occurring customer IDs, we will find our top 5 customers.

To do this, we send the following example request.

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "top_5_customers": {
      "terms": {
        "field": "CustomerID",
        "size": 5
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

We name the aggregations as "top_5_customers". We specify that we want to perform terms aggregation on the field "CustomerID".

Since we only want the top 5 customers, we set the size parameter within the terms aggregation to 5.

Copy and paste the example into the Kibana console and send the request!

Expected response from Elasticsearch:
Elasticsearch will return 5 customer IDs("key") with the highest number of transactions("doc_count").

image

Bucket sorting for terms aggregation
By default, the terms aggregation sorts buckets based on the "doc_count"
values in descending order.

But what if you want to sort the results in ascending order?

You would send the following request!
Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "5_customers_with_lowest_number_of_transactions": {
      "terms": {
        "field": "CustomerID",
        "size": 5,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

When we ask Elasticsearch to sort the buckets in ascending order, it will display customers IDs with the lowest number of documents. In other words, customers with the lowest number of transactions.

To account for that, we name our aggregations "5_customers_with_lowest_number_of_transactions". Then, we instruct Elasticsearch to perform "terms" aggregations on the field "CustomerID" and that we want 5 buckets returned("size": 5).

Then, we add an "order" parameter to the "terms" aggregation and specify that we want to sort buckets based on the "_count" values in ascending("asc") order!

Copy and paste this example into the Kibana console and send the request!

Expected response from Elasticsearch:
You will see that the buckets are now sorted in ascending order of "doc_count", showing buckets with the lowest "doc_count" first.

image

Combined Aggregations

So far, we have ran metric aggregations or bucket aggregations to answer simple questions.

There will be times when we will ask more complex questions that require running combinations of these aggregations.

For example, let's say we wanted to know the sum of revenue per day.

To get the answer, we need to first split our data into daily buckets(date_histogram aggregation).

image

Within each bucket, we need to perform metric aggregations to calculate the daily revenue.

image

The combined aggregations request looks like the following.

Calculate the daily revenue

image

Let's break this down.

We let Elasticsearch know that we are sending an aggregations request(1).

In order to calculate daily revenue, we first need to split documents into daily buckets. So we name this aggregations request "transactions_per_day"(2).

Since we are creating buckets based on time intervals, we run a date_histogram aggregation(3) on the field "InvoiceDate"(4).

Then, we set the "calendar_interval" to a "day"(5).

Thus far, we have split the documents into daily buckets.
image
Within the "transactions_per_day" aggregations, we create a sub-aggregations(6) called "daily_revenue"(7). This will calculate the total revenue generated each day.

You will notice this aggregation looks a little different.

It uses the "script"(9) to calculate the daily revenue. What is that all about?

Let’s take a look at the fields of our transaction data again.

image

When we look at our transaction document(lines 22-31), you will see that it lists the "Quantity"(line 24) of an item sold and the "UnitPrice"(line 27) of that item.

But we do not see the total revenue from this transaction. So to get the total revenue, we need to multiply the "Quantity" of item sold by its "UnitPrice".

This is where the script(9) comes in.

image

Script is used to dynamically create something in Elasticsearch.

In our context, it is used to dynamically calculate the total revenue per transaction.

In the "source" field(10), we instruct Elasticsearch that for each document(doc) in the daily bucket, get the value of the field "UnitPrice" and multiply that by the value of the field "Quantity".

That will calculate the total revenue of each transaction in our bucket. Then, we tell Elasticsearch to "sum"(8) up the total revenue from all the transactions in our bucket to calculate the daily_revenue(7).

Copy and paste the following example into the Kibana console and send the request!

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_per_day": {
      "date_histogram": {
        "field": "InvoiceDate",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_revenue": {
          "sum": {
            "script": {
              "source": "doc['UnitPrice'].value * doc['Quantity'].value"
            }
          }
        }
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

Expected Response from Elasticsearch:
Elasticsearch returns an array of daily buckets.

Within each bucket, it shows the number of documents("doc_count") within each bucket as well as the revenue generated from each day("daily_revenue").

image

Calculating multiple metrics per bucket

You can also calculate multiple metrics per bucket.

For example, let's say you wanted to calculate the daily revenue and the number of unique customers per day in one go. To do this, you can add multiple metric aggregations per bucket as shown below!

image

The request above is almost identical to the last request except that we added an additional metric aggregations called "number_of_unique_customers_per_day"(2) to the sub-aggregations(1).

To calculate this value, we instruct Elasticsearch to perform cardinality aggregations(3) on the field "CustomerID"(4).

Copy and paste the following example into the Kibana console and send the request!

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_per_day": {
      "date_histogram": {
        "field": "InvoiceDate",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_revenue": {
          "sum": {
            "script": {
              "source": "doc['UnitPrice'].value * doc['Quantity'].value"
            }
          }
        },
        "number_of_unique_customers_per_day": {
          "cardinality": {
            "field": "CustomerID"
          }
        }
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

Expected Response from Elasticsearch:
Elasticsearch returns an array of daily buckets.

Within each bucket, you will see that the "number_of_unique_customers_per_day" and the "daily_revenue" have been calculated for each day!

image

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_per_day": {
      "date_histogram": {
        "field": "InvoiceDate",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_revenue": {
          "sum": {
            "script": {
              "source": "doc['UnitPrice'].value * doc['Quantity'].value"
            }
          }
        },
        "number_of_unique_customers_per_day": {
          "cardinality": {
            "field": "CustomerID"
          }
        }
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

Sorting by metric value of a sub-aggregation
You do not always need to sort by time interval, numerical interval, or by doc_count! You can also sort by metric value of sub-aggregations.

Let's take a look at the request below. Within the sub-aggregation, metric values "daily_revenue"(3) and "number_of_unique_customers_per_day"(4) are calculated.

image

Let's say you wanted to find which day had the highest daily revenue to date!

All you need to do is to add the "order" parameter(1) and sort buckets based on the metric value of "daily_revenue" in descending("desc") order(2)!

Copy and paste the following example into the console and send the request!

Example:



GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_per_day": {
      "date_histogram": {
        "field": "InvoiceDate",
        "calendar_interval": "day",
        "order": {
          "daily_revenue": "desc"
        }
      },
      "aggs": {
        "daily_revenue": {
          "sum": {
            "script": {
              "source": "doc['UnitPrice'].value * doc['Quantity'].value"
            }
          }
        },
        "number_of_unique_customers_per_day": {
          "cardinality": {
            "field": "CustomerID"
          }
        }
      }
    }
  }
}


Enter fullscreen mode Exit fullscreen mode

Expected response from Elasticsearch:
You will see that the response is no longer sorted by date. The buckets are now sorted to return the highest daily revenue first!

image

Great job! You have mastered the basics of running aggregations requests with Elasticsearch and Kibana.

image

Try to run various metric aggregations, bucket aggregations, and combined aggregations on your own and see what type of insights you can find!

đź’– đź’Ş đź™… đźš©
lisahjung
Lisa Jung

Posted on August 5, 2021

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related