ElasticSearch: Zero to Hero in 12 Commands
Raphael Jambalos
Posted on August 27, 2022
It's relatively easy to get started with ElasticSearch. But as our use cases got more specific, we found the documentation lacking. This guided cheatsheet walks through 12 commands: from setting up your ES index to writing advanced ES queries that support advanced (but common) use cases.
The 12 commands work when run sequentially. I will explain each of them, but trying them yourself is still the best way to learn.
This post is part of a broader series on ElasticSearch that will be released in the coming weeks:
- The Guided ElasticSearch Cheatsheet you need to Get Started with ES - you are here
- Using DynamoDB + ElasticSearch for prod workloads - coming soon
- And how to create DynamoDB Streams to sync data changes from DynamoDB to ES asynchronously - coming soon
0 | Prerequisites
Install Elasticsearch with this official ES Guide. Then start the ES server on localhost:9200.
For easier testing, installing an API platform like Postman is a must.
A | Set up the index
In ElasticSearch, we store our data in indexes (similar to tables in your MySQL database). We populate indexes with documents (similar to rows). We will create and set up your first index in the subsequent commands.
[1] Verify the ES cluster is accessible
GET localhost:9200
First, make sure your local ES server is online and you have Postman open. Create a new GET request headed for localhost:9200. You should see a JSON response with details such as the node name, the cluster name, and the ES version.
[2] Create an index
PUT localhost:9200/mynewindex
Now, let's create our first index. Indexes store our data. Creating one is equivalent to creating a table in a relational database.
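If you would rather script these calls than click through Postman, commands 1 and 2 could be sketched with Python's standard library as below. The helper name `build_request` and the `BASE_URL` constant are illustrative and not part of Elasticsearch. The sketch only builds the requests; the commented-out lines show where they would be sent to a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local ES address from the prerequisites


def build_request(method, path="", body=None):
    """Build (but do not send) an HTTP request for the local ES server."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{BASE_URL}/{path}".rstrip("/"),
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


ping = build_request("GET")                        # command 1: check the cluster
create_index = build_request("PUT", "mynewindex")  # command 2: create the index

# To actually send a request against a running server:
# with urllib.request.urlopen(ping) as response:
#     print(json.load(response))
```

The same helper can carry a JSON body, which is what the later commands need.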
[3] Create the mapping for the index
The index we just created has no mapping. A mapping is similar to a schema in SQL databases. It dictates the form of the documents that our index will ingest. Once defined, the index will refuse documents that cannot fit this mapping (e.g., we define stocks as integer below, so if we try to insert a document with stocks="none", the operation will fail).
One thing you'll notice with ES is that these mappings are permissive by default. If I push a document with a new attribute "perishable" = true, ES will add that attribute to the mapping and infer its data type. In this case, it will add "perishable" with data type "boolean".
There are options you can set on the mapping (such as "dynamic": "strict") so that the index only accepts the attributes defined in its mapping and rejects documents that contain anything else.
In this command, we create the mapping for our newly created index.
PUT localhost:9200/mynewindex/_mapping
{
"properties": {
"product_id": {
"type": "keyword"
},
"price": {
"type": "float"
},
"stocks": {
"type": "integer"
},
"published": {
"type": "boolean"
},
"title": {
"type": "text"
},
"sortable_title": {
"type": "text"
},
"tags": {
"type": "text"
}
}
}
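The stricter behaviour mentioned before the command can be requested through the mapping-level `dynamic` setting. As a minimal sketch, assuming you want documents with unknown fields rejected outright, the same mapping body with `"dynamic": "strict"` added could be built like this (`strict_mapping` is just an illustrative variable name):

```python
import json

# Body for: PUT localhost:9200/mynewindex/_mapping
# "dynamic": "strict" asks ES to reject documents that contain unmapped fields.
strict_mapping = {
    "dynamic": "strict",
    "properties": {
        "product_id": {"type": "keyword"},
        "price": {"type": "float"},
        "stocks": {"type": "integer"},
        "published": {"type": "boolean"},
        "title": {"type": "text"},
        "sortable_title": {"type": "text"},
        "tags": {"type": "text"},
    },
}

print(json.dumps(strict_mapping, indent=2))
```

With this in place, a document carrying an extra attribute such as "perishable" should be refused instead of silently extending the mapping.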
Most of the data types are straightforward, except for Text and Keyword. This article explains the difference clearly.
But TLDR: Text allows you to query words inside the field (e.g., querying "Burger" will show the product "Cheese Burger with Fries"). It does this by treating each word in the text as an individual token that can be searched: "cheese", "burger", "with", "fries".
On the other hand, Keyword treats the content of the field as a single value, so to get the cheeseburger with fries, you'd have to query the exact string "Cheese Burger with Fries". Querying "burger" will return nothing.
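To make the difference concrete, here is a rough local illustration in Python. It does not call ES. The `analyze` function is a deliberately simplified stand-in for a text analyzer (lowercase, then split into word tokens), and all three function names are made up for this sketch.

```python
import re


def analyze(text):
    """Simplified stand-in for a text analyzer: lowercase and split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def text_field_matches(stored, query):
    """A text field matches when any query token is among the stored tokens."""
    stored_tokens = set(analyze(stored))
    return any(token in stored_tokens for token in analyze(query))


def keyword_field_matches(stored, query):
    """A keyword field only matches the exact, unmodified string."""
    return stored == query


title = "Cheese Burger with Fries"
print(analyze(title))                                            # ['cheese', 'burger', 'with', 'fries']
print(text_field_matches(title, "Burger"))                       # True
print(keyword_field_matches(title, "burger"))                    # False
print(keyword_field_matches(title, "Cheese Burger with Fries"))  # True
```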
[4] Show the mapping of the index
Let's verify that we have successfully created the mapping for the index by sending a GET request.
GET localhost:9200/mynewindex
B | Data Operations with our ES Index
With our index already set up, let's add data and chip away at the more exciting bits of ES!
[5] Create data for the index
For this section, let's send three consecutive POST requests with a different request body per request. This adds 3 "rows" (documents) to our Elasticsearch index.
POST localhost:9200/mynewindex/_doc
{
"product_id": "123",
"price": 99.75,
"stocks": 10,
"published": true,
"sortable_title": "Kenny Rogers Chicken Sauce",
"title": "Kenny Rogers Chicken Sauce",
"tags": "chicken sauce poultry cooked party"
}
POST localhost:9200/mynewindex/_doc
{
"product_id": "456",
"price": 200.75,
"stocks": 0,
"published": true,
"sortable_title": "Best Selling Beer Flavor",
"title": "Best Selling Beer Flavor",
"tags": "beer best-seller party"
}
POST localhost:9200/mynewindex/_doc
{
"product_id": "789",
"price": 350.5,
"stocks": 200,
"published": false,
"sortable_title": "Female Lotion",
"title": "Female Lotion",
"tags": "lotion female"
}
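One alternative to sending the three requests one by one, not used in the rest of this post, is the `_bulk` endpoint, which accepts newline-delimited JSON: an action line followed by the document itself. The sketch below only builds that payload in Python; `to_bulk_payload` is an illustrative helper name, and the payload is assumed to be sent as POST localhost:9200/mynewindex/_bulk with the Content-Type header set to application/x-ndjson.

```python
import json

documents = [
    {"product_id": "123", "price": 99.75, "stocks": 10, "published": True,
     "sortable_title": "Kenny Rogers Chicken Sauce",
     "title": "Kenny Rogers Chicken Sauce",
     "tags": "chicken sauce poultry cooked party"},
    {"product_id": "456", "price": 200.75, "stocks": 0, "published": True,
     "sortable_title": "Best Selling Beer Flavor",
     "title": "Best Selling Beer Flavor",
     "tags": "beer best-seller party"},
    {"product_id": "789", "price": 350.5, "stocks": 200, "published": False,
     "sortable_title": "Female Lotion",
     "title": "Female Lotion",
     "tags": "lotion female"},
]


def to_bulk_payload(docs):
    """Serialize documents as NDJSON: one action line, then the document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # empty action: let ES generate the id
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body has to end with a newline


payload = to_bulk_payload(documents)
print(payload)
```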
[6] Display all the data
Now, let's see if the three documents we inserted via command #5 got inside our index. This command shows all the documents inside your index:
POST localhost:9200/mynewindex/_search
{
"query": {
"match_all": {}
}
}
They did!
[7] Exact search with product id
Now, let's start with a simple search. Let's search by product id.
POST localhost:9200/mynewindex/_search
{
"query": {
"term": {
"product_id": "456"
}
}
}
In the command above, we are using a "term query" because we are looking for a product with a "product_id" that exactly matches the string "456". The term query works because the data type of "product_id" is "keyword".
[8] Fuzzy search with titles
Now, onto the more exciting bits.
ES is known for its comprehensive search capability. Let's sample that with our first fuzzy search. Fuzzy searches let us find products by typing just a few words instead of the whole text of the field. Instead of typing the full product name (e.g., Incredible Tuna Mayo Jumbo 250), the customer only has to search for the part they recall (e.g., Tuna Mayo).
POST localhost:9200/mynewindex/_search
{
"query": {
"match": {
"title": "Beer Flavor"
}
}
}
With the default settings, we get the product "Best Selling Beer Flavor" even with our incomplete query "Beer Flavor". Other settings, such as the match query's "fuzziness" parameter, also tolerate misspellings or incomplete words (e.g., Bee Flavo).
Also, notice carefully that we now use a "match query" instead of a "term query" because we want to be able to get results even if we didn't type the full product name. The match query works because the title field is of type "text".
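The misspelling-tolerant variant mentioned above can be sketched with the match query's `fuzziness` parameter. The Python below only builds the request body; the function name `fuzzy_title_query` is illustrative, and "AUTO" is one possible value, chosen here as an assumption rather than a recommendation.

```python
import json


def fuzzy_title_query(text, fuzziness="AUTO"):
    """Build a match query on "title" that tolerates small typos."""
    return {
        "query": {
            "match": {
                "title": {
                    "query": text,
                    "fuzziness": fuzziness,
                }
            }
        }
    }


# Body for: POST localhost:9200/mynewindex/_search
print(json.dumps(fuzzy_title_query("Bee Flavo"), indent=2))
```

Note that the field value becomes an object with a "query" key once extra options such as "fuzziness" are added.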
[9] Sorted by prices
Another thing we usually have to do with an e-commerce website is to sort products by specific criteria like price or rating:
POST localhost:9200/mynewindex/_search
{
"query": {
"match_all": {}
},
"sort": [
{"price": "desc"},
"_score"
]
}
With our query above, we return all the products sorted from most expensive to cheapest. Notice that the sort parameter is a list, which allows us to add multiple sorting criteria. We also added "_score", which is an Elasticsearch keyword for search relevance. We will explore this concept deeper in later examples.
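As a quick sanity check of the order to expect, the same sort can be reproduced locally in Python on the three documents inserted so far. This does not call ES; it only mirrors the {"price": "desc"} criterion, with "_score" left out because it acts only as a tie-breaker here.

```python
# The three documents from command 5, reduced to the fields that matter for sorting
products = [
    {"product_id": "123", "price": 99.75},
    {"product_id": "456", "price": 200.75},
    {"product_id": "789", "price": 350.5},
]

# Local equivalent of "sort": [{"price": "desc"}]
by_price_desc = sorted(products, key=lambda p: p["price"], reverse=True)
order = [p["product_id"] for p in by_price_desc]
print(order)  # ['789', '456', '123']
```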
[10] Search for all "beer" products that are PUBLISHED, and in stock. Sorted by cheapest to most expensive
To make things more interesting, let's add several more beer products. We do this by sending a POST request thrice, with a different request body each time.
POST localhost:9200/mynewindex/_doc
{
"product_id": "111",
"price": 350.55,
"stocks": 10,
"published": true,
"sortable_title": "Tudor Beer Lights",
"title": "Tudor Beer Lights",
"tags": "beer tudor party"
}
POST localhost:9200/mynewindex/_doc
{
"product_id": "222",
"price": 700.50,
"stocks": 500,
"published": false,
"sortable_title": "Stella Beer 6pack",
"title": "Stella Beer 6pack",
"tags": "beer stella party"
}
POST localhost:9200/mynewindex/_doc
{
"product_id": "333",
"price": 340,
"stocks": 500,
"published": true,
"sortable_title": "Kampai Beer 6pack",
"title": "Kampai Beer 6pack",
"tags": "beer kampai party"
}
With more documents in our index, we can now do the query. This is a complex query that has three conditions that must be fulfilled. We analyze the query below.
POST localhost:9200/mynewindex/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "Beer"
}
},
{
"term": {
"published": true
}
},
{
"range": {
"stocks": {
"gt": 0
}
}
}
]
}
},
"sort": [
{"price": "asc"},
"_score"
]
}
With our recent additions, there are four products with the word beer:
- 456: Best Selling Beer Flavor
- 111: Tudor Beer Lights
- 222: Stella Beer 6pack
- 333: Kampai Beer 6pack
Since we filter out items whose inventory is zero (or below), we remove product 456 from the list. Another filter is that the product must be published (published = true). With this filter, product 222 is removed. We are left with the 2 products below. They must be sorted by cheapest to most expensive, as is shown below:
- 333: Kampai Beer 6pack (price = 340)
- 111: Tudor Beer Lights (price = 350.55)
In this example, the key "must" was used, with a list as its value. The list contains conditions that must all be met for a document to match. Here, that is "title must have the word 'beer'" AND "published attribute is equal to true" AND "stocks is greater than zero".
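The walkthrough above can be double-checked with a few lines of Python that apply the same three conditions to the six documents locally. This is a sketch that mimics the query's logic, not how ES evaluates it; the function name `matches_command_10` is made up for illustration.

```python
# (product_id, title, price, stocks, published) for the six documents inserted so far
products = [
    ("123", "Kenny Rogers Chicken Sauce", 99.75, 10, True),
    ("456", "Best Selling Beer Flavor", 200.75, 0, True),
    ("789", "Female Lotion", 350.5, 200, False),
    ("111", "Tudor Beer Lights", 350.55, 10, True),
    ("222", "Stella Beer 6pack", 700.50, 500, False),
    ("333", "Kampai Beer 6pack", 340, 500, True),
]


def matches_command_10(product):
    """Apply the three "must" clauses; all of them have to hold (AND)."""
    _, title, _, stocks, published = product
    has_beer = "beer" in title.lower().split()  # match: title contains the word "beer"
    return has_beer and published and stocks > 0


hits = sorted(filter(matches_command_10, products), key=lambda p: p[2])  # price asc
print([(pid, price) for pid, _, price, _, _ in hits])
# [('333', 340), ('111', 350.55)]
```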
[11] Search for all products that have at least 1 of the following tags ['poultry', 'kampai', 'best-seller'], that are PUBLISHED, and in stock. Sorted by cheapest to most expensive
Our previous query involved three conditions that must ALL be true. That's equivalent to "A and B and C".
In this query, we still have three conditions that all have to be true, but the 1st condition is considered true if the product has either "poultry", "kampai", or "best-seller" as a tag. In this example, we introduce the syntax for "OR":
POST localhost:9200/mynewindex/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"tags": "poultry"
}
},
{
"match": {
"tags": "kampai"
}
},
{
"match": {
"tags": "best-seller"
}
}
],
"minimum_should_match": 1
}
},
{
"term": {
"published": true
}
},
{
"range": {
"stocks": {
"gt": 0
}
}
}
]
}
},
"sort": [
{
"price": "asc"
},
"_score"
]
}
In this query, we still have a "must" keyword, but its first element is a nested "bool" that contains a "should" keyword. The whole query is equivalent to: (A or B or C) AND D AND E. The "should" implies that as long as one condition is met, the (A or B or C) statement returns true.
A tweak we can make is to adjust the "minimum_should_match" (msm) parameter, so we can require that two or three or N conditions be met for the statement to be true. In our example, if msm=2, a product has to have two matching tags to be considered true (e.g., a product has to be both poultry and kampai).
We analyze the query below:
- The product should have at least 1 of these tags: poultry, kampai, best-seller
  - This matches 3 products: poultry (pid: 123), kampai (pid: 333) and best-seller (pid: 456)
- That is published
  - All 3 PIDs from the previous step are already published, so no changes.
- Should have stocks
  - Since pid 456 does not have stocks, we are left with pid 123 and pid 333
- Sorted by price
  - pid 333 is 340 pesos
  - pid 123 is 99.75 pesos
  - therefore, the order should be pid 123 => pid 333
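Since the "should" list grows with every tag, it can be convenient to generate the request body instead of writing it by hand. The sketch below builds the same body as command 11 for any list of tags; `tags_query` is a hypothetical helper name, and the published and stocks conditions are hard-coded to match this example.

```python
import json


def tags_query(tags, minimum_should_match=1):
    """Build the (A or B or C) AND published AND in-stock query for a list of tags."""
    should = [{"match": {"tags": tag}} for tag in tags]
    return {
        "query": {
            "bool": {
                "must": [
                    {"bool": {"should": should,
                              "minimum_should_match": minimum_should_match}},
                    {"term": {"published": True}},
                    {"range": {"stocks": {"gt": 0}}},
                ]
            }
        },
        "sort": [{"price": "asc"}, "_score"],
    }


# Body for: POST localhost:9200/mynewindex/_search
body = tags_query(["poultry", "kampai", "best-seller"])
print(json.dumps(body, indent=2))
```

Passing `minimum_should_match=2` produces the stricter variant described above, where two tags have to match.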
[12] Search for all products that have at least 1 of the following tags ['poultry', 'kampai', 'best-seller'], and in stock. The price should be between 0 to 300 only. Sorted by cheapest to most expensive
This query is similar to #11, but we added another criterion: the price of the products returned must be between 0 and 300.
POST localhost:9200/mynewindex/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"tags": "poultry"
}
},
{
"match": {
"tags": "kampai"
}
},
{
"match": {
"tags": "best-seller"
}
}
],
"minimum_should_match": 1
}
},
{
"term": {
"published": true
}
},
{
"range": {
"stocks": {
"gt": 0
}
}
},
{
"range": {
"price": {
"gt": 0,
"lt": 300
}
}
}
]
}
},
"sort": [
{
"price": "asc"
},
"_score"
]
}
This query uses the "range" keyword twice. It lets us filter for items whose field falls within a specific range of values. For the price, we require a value greater than 0 and less than 300. For the stocks, we only require a value greater than zero.
Let's analyze the query:
- From the results in #11, we have pid 333 (340 pesos) and pid 123 (99.75 pesos)
- With the 0-300 price filter, our only result will be pid 123 (99.75 pesos)
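The same result can be confirmed with a local Python re-implementation of the four conditions. This is only a sketch of the query's logic: tags are split on whitespace, which is simpler than how ES analyzes a text field (it would likely break "best-seller" into separate tokens), and the names `WANTED_TAGS` and `matches_command_12` are illustrative.

```python
# (product_id, tags, price, stocks, published) for the six documents in the index
products = [
    ("123", "chicken sauce poultry cooked party", 99.75, 10, True),
    ("456", "beer best-seller party", 200.75, 0, True),
    ("789", "lotion female", 350.5, 200, False),
    ("111", "beer tudor party", 350.55, 10, True),
    ("222", "beer stella party", 700.50, 500, False),
    ("333", "beer kampai party", 340, 500, True),
]
WANTED_TAGS = {"poultry", "kampai", "best-seller"}


def matches_command_12(product, minimum_should_match=1):
    """Mimic the four "must" clauses of command 12 on one document."""
    _, tags, price, stocks, published = product
    tag_hits = len(WANTED_TAGS & set(tags.split()))  # how many "should" clauses match
    return (tag_hits >= minimum_should_match
            and published
            and stocks > 0
            and 0 < price < 300)  # "gt": 0, "lt": 300


hits = sorted((p for p in products if matches_command_12(p)), key=lambda p: p[2])
print([pid for pid, *_ in hits])  # ['123']
```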
Conclusion
Getting started with ElasticSearch is easy! But your searching needs can become more complex as your business needs grow. This cheatsheet helps you navigate that complexity.
An alternative to learning ES syntax at this level is to use a DSL library for Elasticsearch that "abstracts" the long-form syntax of Elasticsearch. It is a powerful tool for general-purpose usage of ES. However, as your query needs grow, learning the syntax under that DSL will keep you informed on the options you can add to make your searching richer.
How about you? Is there other ElasticSearch syntax you want to learn?
Maybe I can help! Type it in the comments, and I'll try to add it to the article.