Sorting Multilanguage Text Properly on OpenSearch
Furkan Kalkan
Posted on November 15, 2024
If you have multi-language or non-English content and use OpenSearch, the default sort method will not sort that content alphabetically. The default sort compares the Unicode values of characters; this works for English but fails for most other languages. The OpenSearch documentation does not address this problem. The Elasticsearch documentation mentions a plugin named analysis-icu that solves this issue [1]. This plugin is supported by OpenSearch too [2]. There is not much information about the OpenSearch-specific version of the plugin, but it is used the same way as the Elasticsearch one.
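To see why comparing Unicode values is not enough, here is a small sketch in plain Python, whose default string comparison also goes by code point. The word list is made up for illustration and uses Turkish letters; it does not involve OpenSearch at all.

```python
# Illustrative words only; "Çay" and "İzmir" start with Turkish letters.
words = ["Zebra", "Çay", "Apple", "İzmir", "Igloo"]

# Python compares strings by Unicode code point, the same idea as the
# default sort described above.
by_code_point = sorted(words)

for word in by_code_point:
    # Show the code point of the first letter to make the ordering visible.
    print(f"{word:6} U+{ord(word[0]):04X}")
```

The result is Apple, Igloo, Zebra, Çay, İzmir, because Ç (U+00C7) and İ (U+0130) have larger code points than Z (U+005A). A Turkish reader would expect Apple, Çay, Igloo, İzmir, Zebra.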
Install the plugin on every node:
/usr/share/opensearch/bin/opensearch-plugin install analysis-icu --batch
If you run on Kubernetes, you can use an init container to do this. Don't forget to mount the plugin directory /usr/share/opensearch/plugins/ in both containers. After installing the plugin, restart your nodes.
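The init container idea can be sketched as a pod spec fragment, built here as a Python dict and printed as JSON, which Kubernetes also accepts. This is only a rough sketch under assumptions: the image tag, the container names and the volume name are placeholders that are not from the plugin documentation.

```python
import json

PLUGIN_DIR = "/usr/share/opensearch/plugins/"  # plugin directory named above
IMAGE = "opensearchproject/opensearch:2.18.0"  # assumed tag, replace with yours


def build_pod_spec(image: str = IMAGE) -> dict:
    """Build a pod spec fragment whose init container installs analysis-icu."""
    # The same volume is mounted at the same path in both containers,
    # so the plugin installed by the init container is visible to OpenSearch.
    plugin_mount = {"name": "plugins", "mountPath": PLUGIN_DIR}
    return {
        "volumes": [{"name": "plugins", "emptyDir": {}}],
        "initContainers": [{
            "name": "install-plugins",  # hypothetical name
            "image": image,
            "command": [
                "/usr/share/opensearch/bin/opensearch-plugin",
                "install", "analysis-icu", "--batch",
            ],
            "volumeMounts": [plugin_mount],
        }],
        "containers": [{
            "name": "opensearch",  # hypothetical name
            "image": image,
            "volumeMounts": [plugin_mount],
        }],
    }


spec = build_pod_spec()
print(json.dumps(spec, indent=2))
```

One caveat with this sketch: an empty volume mounted over the plugin directory hides any plugins already bundled in the image, so a real setup may need to copy those into the volume before installing.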
Add a sort subfield to your fields [3]:
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "sort": {
            "type": "icu_collation_keyword",
            "index": false
          }
        }
      }
    }
  }
}
You can add the language and country parameters after type if your content is in a single language. You can also add the numeric: true parameter to sort numbers inside text in the correct order.
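As a sketch of how these optional parameters fit into the mapping, the helper below builds the same subfield with language, country and numeric added. The helper name and the Turkish values "tr" and "TR" are assumptions chosen for illustration; use the codes that match your content.

```python
import json


def sort_subfield(language=None, country=None, numeric=False) -> dict:
    """Build an icu_collation_keyword subfield used only for sorting."""
    field = {"type": "icu_collation_keyword", "index": False}
    if language:
        field["language"] = language  # e.g. "tr" (assumed example value)
    if country:
        field["country"] = country    # e.g. "TR" (assumed example value)
    if numeric:
        field["numeric"] = True       # sort digits inside text by numeric value
    return field


mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {"sort": sort_subfield("tr", "TR", numeric=True)},
            }
        }
    }
}

# json.dumps turns Python's False/True into JSON false/true.
print(json.dumps(mapping, indent=2))
```

The printed JSON can be used as the body of the index creation request in place of the mapping shown earlier.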
Since our subfield is used only for sorting, set "index": false to turn off indexing of the field.
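Once the mapping is in place, sorting is a matter of pointing the sort clause of a search request at the subfield, title.sort in the mapping above. The snippet below is a minimal sketch that only builds the request body; the function name and the default page size are assumptions, and sending the body to your index's _search endpoint is left to whichever client you use.

```python
import json


def sorted_search_body(field="title.sort", order="asc", size=10) -> dict:
    """Build a search body that sorts results by the collation subfield."""
    return {
        "query": {"match_all": {}},          # match everything, only the order matters here
        "sort": [{field: {"order": order}}],  # sort on the subfield, not on "title" itself
        "size": size,
    }


body = sorted_search_body()
print(json.dumps(body, indent=2))
```

Sorting on title.sort instead of title is the important part: the text field itself is analyzed for search, while the subfield holds the collation key used for ordering.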