What is a search operation in Typesense, and how do I perform one?
Pramit Marattha
Posted on May 4, 2022
Almost every app or website lets users search for what they want. In search-based apps, the search box is the entry point through which users query products or look up information, so learning to build search that works well is a good return on your time. Building a search engine that is typo-tolerant, effective, and efficient is extremely difficult, though: a single typographical error can cause a search to fail even when the requested item is in the database.
Typesense is a free, open-source, typo-tolerant search engine for developers that aims to reduce the time needed to build effective and efficient search. By removing the need to build a search engine from the ground up, it can save a lot of time and effort, and users get a search tool that works, which makes for a positive user experience. To learn more about Typesense, see "What exactly is Typesense".
Search Operations in Typesense
A search in Typesense consists of a query against one or more text fields, as well as a set of filters against numerical or facet fields. You can sort and facet your results as well.
Using JAVASCRIPT
let searchParameters = {
'q' : 'godfather',
'query_by' : 'movie_name',
'filter_by' : 'average_rating:>8',
'sort_by' : 'average_rating:desc'
}
client.collections('movies').documents().search(searchParameters)
Using PHP
$searchParameters = [
'q' => 'godfather',
'query_by' => 'movie_name',
'filter_by' => 'average_rating:>8',
'sort_by' => 'average_rating:desc'
];
$client->collections['movies']->documents->search($searchParameters);
Using PYTHON
search_parameters = {
'q' : 'godfather',
'query_by' : 'movie_name',
'filter_by' : 'average_rating:>8',
'sort_by' : 'average_rating:desc'
}
client.collections['movies'].documents.search(search_parameters)
Using RUBY
search_parameters = {
'q' => 'godfather',
'query_by' => 'movie_name',
'filter_by' => 'average_rating:>8',
'sort_by' => 'average_rating:desc'
}
client.collections['movies'].documents.search(search_parameters)
Using SHELL
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
"http://localhost:8108/collections/movies/documents/search\
?q=godfather&query_by=movie_name&filter_by=average_rating:>8\
&sort_by=average_rating:desc"
The following is a sample response:
{
"facet_counts": [],
"found": 1,
"out_of": 1,
"page": 1,
"request_params": { "q" : "godfather" },
"search_time_ms": 1,
"hits": [
{
"highlights": [
{
"field": "movie_name",
"snippet": "<mark>Godfather</mark>",
"matched_tokens": ["Godfather"]
}
],
"document": {
"id": "124",
"movie_name": "Godfather",
"average_rating": 9
}
}
]
}
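Once the response comes back, most apps only need a few fields from each hit. The sketch below is a minimal, offline illustration: it hard-codes a trimmed copy of the sample response as a Python dict instead of calling a server, and summarize_hits is a hypothetical helper name, not part of any Typesense client.

```python
# Trimmed copy of the sample response, hard-coded so the sketch runs offline.
response = {
    "found": 1,
    "out_of": 1,
    "page": 1,
    "hits": [
        {
            "highlights": [
                {
                    "field": "movie_name",
                    "snippet": "<mark>Godfather</mark>",
                    "matched_tokens": ["Godfather"],
                }
            ],
            "document": {
                "id": "124",
                "movie_name": "Godfather",
                "average_rating": 9,
            },
        }
    ],
}


def summarize_hits(response):
    """Return one (id, title, rating) tuple per hit (hypothetical helper)."""
    rows = []
    for hit in response.get("hits", []):
        doc = hit["document"]
        # Map each highlighted field to its snippet, if any were returned.
        snippets = {h["field"]: h["snippet"] for h in hit.get("highlights", [])}
        # Prefer the highlighted snippet and fall back to the raw field value.
        title = snippets.get("movie_name", doc["movie_name"])
        rows.append((doc["id"], title, doc["average_rating"]))
    return rows


for row in summarize_hits(response):
    print(row)
```

With a real client, the same function could be applied to whatever dict the search call returns, assuming the response has the shape shown in the sample.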
Group By
By specifying one or more group_by fields, you can aggregate search results into groups. Grouping hits this way is useful in the following situations:
Deduplication: You can combine items and remove duplicates in the search results by using one or more group_by fields. If there are multiple movies of the same rating, for example, using group_by=rating&group_limit=1 ensures that only one movie of each individual rating appears in the search results.
Correcting skew: If a particular type of document dominates your results, you can use group_by and group_limit to correct the skew. If your search results for a query contain a lot of documents from the same company, for example, you can use group_by=company&group_limit=3 to ensure that only the top three results from each company are returned in the search results.
Unlike the flat response format we saw earlier, grouping returns the hits in a nested structure. Let's run the same query as before, this time with a group_by parameter:
Using JAVASCRIPT
let searchParameters = {
'q' : 'godfather',
'query_by' : 'movie_name',
'filter_by' : 'average_rating:>8',
'sort_by' : 'average_rating:desc',
'group_by' : 'genre',
'group_limit' : '1'
}
client.collections('movies').documents().search(searchParameters)
Using PHP
$searchParameters = [
'q' => 'godfather',
'query_by' => 'movie_name',
'filter_by' => 'average_rating:>8',
'sort_by' => 'average_rating:desc',
'group_by' => 'genre',
'group_limit' => '1'
];
$client->collections['movies']->documents->search($searchParameters);
Using PYTHON
search_parameters = {
'q' : 'godfather',
'query_by' : 'movie_name',
'filter_by' : 'average_rating:>8',
'sort_by' : 'average_rating:desc',
'group_by' : 'genre',
'group_limit' : '1'
}
client.collections['movies'].documents.search(search_parameters)
Using RUBY
search_parameters = {
'q' => 'godfather',
'query_by' => 'movie_name',
'filter_by' => 'average_rating:>8',
'sort_by' => 'average_rating:desc',
'group_by' => 'genre',
'group_limit' => '1'
}
client.collections['movies'].documents.search(search_parameters)
Using SHELL
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
"http://localhost:8108/collections/movies/documents/search\
?q=godfather&query_by=movie_name&filter_by=average_rating:>8\
&sort_by=average_rating:desc&group_by=genre&group_limit=1"
The following is a sample response:
{
"facet_counts": [],
"found": 1,
"out_of": 1,
"page": 1,
"request_params": { "q" : "godfather" },
"search_time_ms": 1,
"grouped_hits": [
{
"group_key": ["DRAMA"],
"hits": [
{
"highlights": [
{
"field": "movie_name",
"snippet": "<mark>Godfather</mark>"
}
],
"document": {
"id": "124",
"movie_name": "Godfather",
"average_rating": 9,
"genre": "DRAMA"
}
}
]
}
]
}
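Because grouped results are nested one level deeper, code that reads them has to walk grouped_hits first and then the hits inside each group. The following is a minimal offline sketch that uses a trimmed copy of the sample grouped response; documents_by_group is a hypothetical helper name, not a Typesense API.

```python
# Trimmed copy of the grouped sample response, hard-coded for illustration.
grouped_response = {
    "found": 1,
    "grouped_hits": [
        {
            "group_key": ["DRAMA"],
            "hits": [
                {
                    "document": {
                        "id": "124",
                        "movie_name": "Godfather",
                        "average_rating": 9,
                        "genre": "DRAMA",
                    }
                }
            ],
        }
    ],
}


def documents_by_group(response):
    """Map each group key (as a tuple) to the documents in that group."""
    groups = {}
    for group in response.get("grouped_hits", []):
        # group_key is a list, so convert it to a tuple to use it as a dict key.
        key = tuple(group["group_key"])
        groups[key] = [hit["document"] for hit in group["hits"]]
    return groups


for key, docs in documents_by_group(grouped_response).items():
    print(key, [doc["movie_name"] for doc in docs])
```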
Search Arguments
q: The search terms to look for in the collection. To return all documents, use * as the search string. This is most useful when combined with filter_by. For example, q=*&filter_by=average_rating:10 will return all documents that match the filter.
query_by: One or more string / string[] fields to query against. Use a comma to separate multiple fields, for example movie_name,genre. A record that matches on a field earlier in the list is considered more relevant than a record that matches on a field later in the list, so in this example documents that match on the movie_name field are ranked higher than documents that match on the genre field.
query_by_weights: The relative weight to give each query_by field when ranking results. This can be used to boost the priority of certain fields when looking for matches. Separate the weights with commas, in the same order as the query_by fields. For example, query_by_weights: 1,1,2 with query_by: field_a,field_b,field_c gives field_a and field_b equal weight and field_c twice that weight.
sort_by: A list of numerical fields and sort orders used to order your results. Use a comma to separate multiple fields; you can specify up to three sort fields. For example, average_rating:desc,release_date:asc.
prefix: A Boolean that indicates whether the last word in the query should be treated as a prefix rather than a whole word. This is needed for building autocomplete and instant search interfaces.
facet_by: A list of fields to facet your results by. Use a comma to separate multiple fields.
filter_by: Filter conditions used to refine your search results. A field can be matched against one or more values, for example genre: CRIME or genre: [ROMANTIC, COMEDY]. To match a string field exactly, mark the field as a facet and use the := operator. For example, genre:=ROMANTIC will return documents with the genre ROMANTIC but not a genre like ROMANTIC COMEDY. You can also filter on multiple exact values: genre:= [ROMANTIC, COMEDY]. To get numeric values between a minimum and a maximum, use the range operator [min..max]. Separate multiple conditions with the && operator, for example average_rating:>9 && genre: [ROMANTIC,DRAMA].
max_facet_values: The maximum number of facet values to return.
num_typos: The number of typographical errors (1 or 2) to tolerate.
facet_query: This parameter can be used to filter the facets that are returned.
page: The page of results to fetch.
per_page: The number of results to fetch per page.
group_by: By specifying one or more group_by fields, this argument aids in the grouping of search results into groups or buckets.
group_limit: This argument aids in limiting the total number of hits returned for each group. If the group_limit is set to A, the response will only include the top A hits in each group.
include_fields: A comma-separated list of document fields to include in the search results.
exclude_fields: A comma-separated list of document fields to exclude from the search results.
highlight_full_fields: A comma-separated list of document fields to highlight in full, without snippeting.
highlight_affix_num_tokens: This argument determines how many tokens should be used to surround the highlighted text on each side.
highlight_start_tag: The start tag used for the highlighted snippets. (Default: <mark>)
highlight_end_tag: The end tag used for the highlighted snippets. (Default: </mark>)
snippet_threshold: Field values under this length will be fully highlighted, instead of showing a snippet of the relevant portion.
drop_tokens_threshold: If the number of results returned for a query is less than the number specified in this argument, Typesense will attempt to drop tokens from the query until enough results are returned. Tokens with the fewest individual hits are dropped first. To disable dropping of tokens, set drop_tokens_threshold to 0.
typo_tokens_threshold: If the number of results returned for a query is less than the number specified in this argument, Typesense will attempt to look for tokens with more typos until enough results are found. (Default: 100)
pinned_hits: A list of records to include in the search results at specific positions. A typical use is to feature or promote certain items at the top of the search results.
hidden_hits: A list of records to hide from the search results.
limit_hits: The maximum number of hits that can be fetched from the collection.
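Several of these arguments are usually combined in one request. The sketch below only builds the parameter dictionary and does not contact a server: it joins filter conditions with the && operator described above and works out how many pages a result set needs from the found count and per_page. build_filter_by and total_pages are hypothetical helper names, and the values are illustrative.

```python
import math


def build_filter_by(conditions):
    """Join individual filter conditions with the && operator."""
    return " && ".join(conditions)


def total_pages(found, per_page):
    """Pages needed to show `found` results, `per_page` at a time."""
    return math.ceil(found / per_page)


search_parameters = {
    "q": "*",  # * matches every document, so only the filter narrows results
    "query_by": "movie_name",
    "filter_by": build_filter_by(
        ["average_rating:>8", "genre: [ROMANTIC,DRAMA]"]
    ),
    "sort_by": "average_rating:desc",
    "page": 1,
    "per_page": 10,
}

print(search_parameters["filter_by"])
# prints: average_rating:>8 && genre: [ROMANTIC,DRAMA]

# If a response reported "found": 25, three pages of 10 would cover it.
print(total_pages(found=25, per_page=10))
# prints: 3
```

The resulting dictionary has the same shape as the search_parameters used in the earlier Python example, so it could be passed to the same search call.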
Federated or Multi-Search operations
Using this Federated Multi-Search feature, you can send multiple search requests in a single HTTP request. This is particularly useful for avoiding the round-trip network latencies that would otherwise be incurred if each of these requests were sent as separate HTTP requests.
This feature can also be used to perform a federated search across multiple collections in just one HTTP request.
Multi search using Javascript
let searchRequests = {
'searches': [
{
'collection': 'movies',
'q': 'fiction',
'filter_by': 'average_rating:=[8..9]'
},
{
'collection': 'director',
'q': 'quentin tarantino'
}
]
}
let commonSearchParams = {
'query_by': 'name',
}
client.multiSearch.perform(searchRequests, commonSearchParams)
Multi search using PHP
$searchRequests = [
'searches' => [
[
'collection' => 'movies',
'q' => 'fiction',
'filter_by' => 'average_rating:=[8..9]'
],
[
'collection' => 'director',
'q' => 'quentin tarantino'
]
]
];
$commonSearchParams = [
'query_by' => 'name',
];
$client->multiSearch->perform($searchRequests, $commonSearchParams);
Multi search using PYTHON
search_requests = {
'searches': [
{
'collection': 'movies',
'q': 'fiction',
'filter_by': 'average_rating:=[8..9]'
},
{
'collection': 'director',
'q': 'quentin tarantino'
}
]
}
common_search_params = {
'query_by': 'name',
}
client.multi_search.perform(search_requests, common_search_params)
Multi search using RUBY
search_requests = {
'searches': [
{
'collection': 'movies',
'q': 'fiction',
'filter_by': 'average_rating:=[8..9]'
},
{
'collection': 'director',
'q': 'quentin tarantino'
}
]
}
common_search_params = {
'query_by': 'name',
}
client.multi_search.perform(search_requests, common_search_params)
Using SHELL
curl "http://localhost:8108/multi_search?query_by=name" \
-X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"searches": [
{
"collection": "movies",
"q": "fiction",
"filter_by": "average_rating:=[8..9]"
},
{
"collection": "director",
"q": "quentin tarantino"
}
]
}'
The following is a sample response:
{
"results": [
{
"facet_counts": [],
"found": 1,
"hits": [
{
"document": {
"name": "Pulp fiction",
"director": "quentin tarantino",
"id": "126",
"average_rating": 9
},
"highlights": [
{
"field": "name",
"matched_tokens": [
"fiction"
],
"snippet": "Pulp <mark>fiction</mark>"
}
],
"text_match": 130816
}
],
"out_of": 10,
"page": 1,
"request_params": {
"per_page": 10,
"q": "fiction"
},
"search_time_ms": 1
},
{
"facet_counts": [],
"found": 1,
"hits": [
{
"document": {
"name": "Real fiction",
"director": "Kim Ki-duk",
"id": "391",
"average_rating": 8
},
"highlights": [
{
"field": "name",
"matched_tokens": [
"fiction"
],
"snippet": "Real <mark>fiction</mark>"
}
],
"text_match": 144112
}
],
"out_of": 5,
"page": 1,
"request_params": {
"per_page": 10,
"q": "fiction"
},
"search_time_ms": 1
}
]
}
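A multi-search response wraps one ordinary search result per request inside a results array. The sketch below is a minimal offline illustration that assumes the results arrive in the same order as the searches that were sent; it uses a trimmed copy of the sample response, and names_per_search is a hypothetical helper name.

```python
# Trimmed copy of the multi-search sample response, hard-coded for illustration.
multi_response = {
    "results": [
        {
            "found": 1,
            "hits": [{"document": {"id": "126", "name": "Pulp fiction"}}],
            "request_params": {"per_page": 10, "q": "fiction"},
        },
        {
            "found": 1,
            "hits": [{"document": {"id": "391", "name": "Real fiction"}}],
            "request_params": {"per_page": 10, "q": "fiction"},
        },
    ]
}


def names_per_search(response):
    """Return, for each entry in `results`, the names of its matched documents."""
    return [
        [hit["document"]["name"] for hit in result.get("hits", [])]
        for result in response["results"]
    ]


for index, names in enumerate(names_per_search(multi_response)):
    print(f"search {index}: {names}")
```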
Typesense was built with several distinctive features aimed primarily at making the developer’s job easier while giving customers and users the best possible search experience. Join Aviyel’s community to learn more about the open source project, get tips on how to contribute, and join active dev groups.
Aviyel is a collaborative platform that helps open source project communities with monetization and long-term sustainability. To learn more, visit Aviyel.com, where you will find great blogs and events just like this one. Sign up now for early access, and don’t forget to follow us on our socials.