Elasticsearch full-text queries explained, for humans 🤓

ajsharp

Alex Sharp 🛠 sharesecret.co

Posted on January 26, 2019

Elasticsearch (ES) has several full-text query types. A quick summary:

  • match - standard full text query.
  • match_phrase - phrase matching, like when you put a term in quotes on Google.
  • match_phrase_prefix - poor man's autocomplete.
  • multi_match - multi-field match.
  • common (not covered here) - takes stopwords (e.g. "the") into account.
  • query_string (not covered here) - expert mode. Can use AND|OR|NOT and multi-field search in a query string.
  • simple_query_string (not covered here) - a simpler, "more robust" variant meant for exposing to end users. (How is simple more robust? Seems counter-intuitive, but it's because it silently discards invalid parts of the query instead of returning an error.)

match - standard full text query

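By default, match analyzes the query text into terms and matches documents containing any of those terms, i.e. near-exact, word-level equality matching.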

When to use: If you're doing full-text search on a large text field, such as the body of this post.

A good default for matching a word or phrase in a block of text, such as a large description field or someone's Twitter bio.

Not good for partial matches of a single word or phrase, such as a partial match of someone's Twitter username (see fuzziness below).

Fuzzy matching
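match allows fuzzy matching (i.e. typo tolerance) via the fuzziness parameter, which sets a maximum Levenshtein edit distance: the number of single-character changes (insertions, deletions, or substitutions) needed to turn one string into another. Basically, it's the number of typos you can make and still match the string. By default Elasticsearch also counts swapping two adjacent characters (a transposition) as a single edit.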

Example: Your target match data is a field with the value "ajsharp". With the fuzziness param set to 1, you can type "jjsharp" and it will match "ajsharp". But if you type "jssharp" it won't match unless you increase fuzziness to 2.
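To sanity-check those numbers, here is a minimal Python sketch of the edit distance that fuzziness is based on. It implements the "optimal string alignment" variant of Damerau-Levenshtein (insert, delete, substitute, plus adjacent swaps); treat it as a simple model, not as Lucene's actual automaton-based implementation. The `match_query` body uses the real `fuzziness` parameter, but the `screen_name` field is a hypothetical example.

```python
import json


def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Optimal string alignment distance between a and b.

    Counts insertions, deletions and substitutions; when transpositions is
    True, swapping two adjacent characters also counts as one edit.
    """
    # dp[i][j] = distance between the first i chars of a and first j chars of b
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i  # delete all i characters
    for j in range(len(b) + 1):
        dp[0][j] = j  # insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)  # adjacent swap
    return dp[len(a)][len(b)]


def fuzzy_matches(target: str, typed: str, fuzziness: int) -> bool:
    """True if `typed` is within `fuzziness` edits of `target`."""
    return edit_distance(target, typed) <= fuzziness


# A match query with typo tolerance; "screen_name" is a hypothetical field.
match_query = {
    "query": {
        "match": {
            "screen_name": {"query": "jjsharp", "fuzziness": 1}
        }
    }
}

if __name__ == "__main__":
    print(edit_distance("ajsharp", "jjsharp"))     # 1
    print(edit_distance("ajsharp", "jssharp"))     # 2
    print(fuzzy_matches("ajsharp", "jssharp", 1))  # False
    print(json.dumps(match_query))
```

"jjsharp" differs from "ajsharp" in one position, while "jssharp" differs in two, and no single adjacent swap fixes both, so it needs fuzziness 2.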

match_phrase - for matching exact phrases

When to use: If you need to be specific about whole phrase searches, or want to enable this functionality for your users.

Let's say we have two Twitter profiles indexed. Document A has the phrase "co-founder" in its bio, and Document B just has the word "founder".

If you search "founder" using match_phrase, both documents will match.
If you search "co-founder" using match_phrase, only Document A will match. However, had you run this query using match, both Document A and Document B would match "co-founder": the default standard analyzer splits it into the terms co and founder, and match returns documents containing either term.
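The difference can be simulated in a few lines of Python. This is a rough sketch under stated assumptions: `analyze` only approximates the standard analyzer (lowercase, split on non-alphanumeric characters), `match` uses the default OR operator, `match_phrase` requires the terms to be adjacent and in order (slop 0), and the two bios are made-up examples.

```python
import re


def analyze(text: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, then split on
    anything that isn't a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match(doc: str, query: str) -> bool:
    """match with the default OR operator: any query term is enough."""
    doc_terms = set(analyze(doc))
    return any(term in doc_terms for term in analyze(query))


def match_phrase(doc: str, query: str) -> bool:
    """match_phrase: all query terms, adjacent and in order (slop 0)."""
    doc_terms, q = analyze(doc), analyze(query)
    return any(doc_terms[i:i + len(q)] == q
               for i in range(len(doc_terms) - len(q) + 1))


# Hypothetical bios for Document A and Document B.
docs = {
    "A": "Co-founder and CTO",
    "B": "Founder and CEO",
}

if __name__ == "__main__":
    for name, bio in docs.items():
        print(name,
              match_phrase(bio, "founder"),     # A: True,  B: True
              match_phrase(bio, "co-founder"),  # A: True,  B: False
              match(bio, "co-founder"))         # A: True,  B: True
```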

match_phrase_prefix - poor man's autocomplete

When to use: Only when you need to match against a single field. For many use cases, multi_match feels far more useful.

From the ES docs:

Consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.
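Here is a simplified Python model of the mechanism the docs describe: the last query term is expanded against a sorted term dictionary (capped by `max_expansions`, which defaults to 50), and the leading terms must appear as an exact phrase right before one of the expansions. The term dictionary, the document, and the `message` field name are hypothetical, and real Elasticsearch does this per shard/segment, so treat this as an illustration only.

```python
import bisect
from itertools import islice


def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Up to max_expansions terms from a sorted term dictionary that start
    with prefix (a simplified model of how the last term is expanded)."""
    start = bisect.bisect_left(sorted_terms, prefix)  # first term >= prefix
    out = []
    for term in islice(sorted_terms, start, None):
        if not term.startswith(prefix) or len(out) == max_expansions:
            break  # left the prefix range, or hit the expansion limit
        out.append(term)
    return out


def phrase_prefix_match(doc_terms, query_terms, sorted_terms, max_expansions=50):
    """True if the doc contains the leading query terms as an exact phrase,
    immediately followed by one of the expanded prefix terms."""
    *phrase, last = query_terms
    candidates = set(expand_prefix(sorted_terms, last, max_expansions))
    n = len(phrase)
    for i in range(len(doc_terms) - n):
        if doc_terms[i:i + n] == phrase and doc_terms[i + n] in candidates:
            return True
    return False


# Hypothetical term dictionary built from a tiny index.
TERMS = sorted({"the", "quick", "brown", "fox", "fast", "dog", "jumps"})
DOC = ["the", "quick", "brown", "fox"]

# The corresponding query body; "message" is a hypothetical field name.
query = {
    "query": {
        "match_phrase_prefix": {
            "message": {"query": "quick brown f", "max_expansions": 50}
        }
    }
}

if __name__ == "__main__":
    print(expand_prefix(TERMS, "f"))  # ['fast', 'fox']
    print(phrase_prefix_match(DOC, ["quick", "brown", "f"], TERMS))  # True
    print(phrase_prefix_match(DOC, ["quick", "brown", "f"], TERMS, max_expansions=1))  # False
```

The `max_expansions=1` case shows why a very short prefix can miss documents: only the first N matching terms in the dictionary are tried, and "fox" never makes the cut.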

multi_match - multi-field match 🎉🎉🎉

Allows you to search for the same string in multiple fields.

When to use: In many cases this is probably what you need if you're doing anything auto-completey (note: there are probably many use cases this is great for, but my primary one when I wrote this post was autocomplete, so 🤷🏻‍♂️).

Multi match queries can have a type:

  • best_fields - (default) Matches any field, but uses the _score from the best field.
  • most_fields - Matches any field and combines the _score from each field.
  • cross_fields - Treats fields with the same analyzer as if they were one big field.
  • phrase - Runs match_phrase on each field and uses the _score from the best field.
  • 🎉🎉🎉 phrase_prefix - Runs match_phrase_prefix on each field and uses the _score from the best field. Allows multi-field autocomplete 💄💄💄
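To make the scoring difference concrete, here is a small Python sketch of how per-field scores are combined. It assumes best_fields behaves like a dis_max query (best score plus `tie_breaker`, default 0.0, times the other fields' scores) and most_fields like a sum, ignoring Lucene's finer normalization details; phrase and phrase_prefix combine scores the best_fields way. The per-field scores and field names are made up.

```python
def best_fields(field_scores, tie_breaker=0.0):
    """best_fields: the best field's score, plus tie_breaker times the rest."""
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    return best + tie_breaker * (sum(field_scores.values()) - best)


def most_fields(field_scores):
    """most_fields: the per-field scores added together."""
    return sum(field_scores.values())


# Hypothetical per-field scores for one document.
scores = {"screen_name": 2.0, "name": 1.5}

# A multi-field autocomplete query; field names are hypothetical.
autocomplete = {
    "query": {
        "multi_match": {
            "query": "ajsh",
            "type": "phrase_prefix",
            "fields": ["screen_name", "name"],
        }
    }
}

if __name__ == "__main__":
    print(best_fields(scores))                   # 2.0
    print(best_fields(scores, tie_breaker=0.5))  # 2.75
    print(most_fields(scores))                   # 3.5
```

With best_fields, a document that matches well in one field beats one that matches weakly in many; most_fields rewards matching in several fields at once.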

Field Boosting
Fields can be boosted using ^ followed by a number. Assume Twitter profiles with screen_name and name fields. With a query like this, screen_name will be three times as important as name in the ranking:

{
  "query": {
    "multi_match" : {
      "query" : "ajsharp",
      "fields" : [ "screen_name^3", "name" ]
    }
  }
}
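If you're sending that query from code, a minimal Python sketch using only the standard library might look like this. The index name `profiles` and the host `http://localhost:9200` are hypothetical placeholders, and `search` needs a running cluster; building the body as a dict and serializing it with `json.dumps` guarantees valid JSON (straight quotes, no stray characters).

```python
import json
import urllib.request


def boosted_search_body(text):
    """multi_match body where screen_name counts 3x as much as name."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["screen_name^3", "name"],  # ^3 boosts screen_name
            }
        }
    }


def search(body, index="profiles", host="http://localhost:9200"):
    """POST the body to the index's _search endpoint and return the parsed
    response. Index and host are hypothetical; requires a live cluster."""
    req = urllib.request.Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(json.dumps(boosted_search_body("ajsharp"), indent=2))
```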

ā˜ļø Be sure to check out Sharesecret, which makes it easy to securely share sensitive data.
