Implementing keyword search with field-level boosting in Sitecore
tackme
Posted on November 15, 2019
I had the opportunity to implement keyword search with field-level boosting at work. It was my first time building such a feature and I struggled with it, so I hope this post helps anyone implementing something similar.
NOTE:
Sitecore has a built-in feature for field-level boosting, but it is not supported until Sitecore 9.4, so in this post boosting is implemented manually in code.
UPDATE (2020/2/5):
I made a library that generates efficient keyword-search queries with field-level boosting. If you are interested, see the following link.
Problem
My first attempt looked like this:
public SearchResults<SearchResultItem> Search(string[] keywords)
{
    var index = ContentSearchManager.GetIndex("sitecore_master_index");
    using (var context = index.CreateSearchContext())
    {
        // the "title" field contains all keywords. (boost: 10)
        var titlePred = PredicateBuilder.True<SearchResultItem>();
        foreach (var keyword in keywords)
        {
            titlePred = titlePred.And(item => item["title"].Contains(keyword).Boost(10));
        }

        // OR the "body" field contains all keywords. (boost: 5)
        var bodyPred = PredicateBuilder.True<SearchResultItem>();
        foreach (var keyword in keywords)
        {
            bodyPred = bodyPred.And(item => item["body"].Contains(keyword).Boost(5));
        }

        var keywordSearchPred = PredicateBuilder
            .False<SearchResultItem>()
            .Or(titlePred)
            .Or(bodyPred);

        return context.GetQueryable<SearchResultItem>().Where(keywordSearchPred).GetResults();
    }
}
This seemed to work at first, but I noticed that it fails when the keywords are spread across multiple fields.
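Here is an example of an item that should match but doesn't:
- Keywords: Sitecore, Experience, Platform
Field | Value |
---|---|
title | What does Sitecore "XP" mean? |
body | XP stands for eXperience Platform. |
No single field contains all three keywords: "Sitecore" appears only in the title, while "Experience" and "Platform" appear only in the body. Both the title branch and the body branch of the query are therefore false, and the item is not returned even though it contains every keyword.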
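The failure can be reproduced outside Sitecore. The following is a minimal in-memory sketch, assuming a case-insensitive substring check as a rough stand-in for Contains (Solr actually matches analyzed tokens). `FirstAttempt` and its `Matches` method are hypothetical names, not Sitecore APIs; they apply the same rule as the first attempt: an item matches only if one field contains every keyword.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of the first attempt (not a Sitecore API).
public static class FirstAttempt
{
    // Case-insensitive substring check: a rough stand-in for Solr's Contains.
    static bool Has(string text, string keyword) =>
        text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;

    // True only if at least one field contains ALL keywords on its own,
    // mirroring "title has all keywords OR body has all keywords".
    public static bool Matches(
        IDictionary<string, string> item,
        IEnumerable<string> keywords,
        IEnumerable<string> fields) =>
        fields.Any(field => keywords.All(keyword => Has(item[field], keyword)));
}
```

For the example item, `Matches(item, new[] { "Sitecore", "Experience", "Platform" }, new[] { "title", "body" })` returns false, because neither field holds all three keywords.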
A simple fix is to enumerate every way of assigning a field to each keyword (permutations with repetition of the fields) and combine the resulting branches with OR. For the two fields and three keywords above, this generates the following condition:
(item["title"].Contains("Sitecore").Boost(10) && item["title"].Contains("Experience").Boost(10) && item["title"].Contains("Platform").Boost(10))
|| (item["title"].Contains("Sitecore").Boost(10) && item["title"].Contains("Experience").Boost(10) && item["body"].Contains("Platform").Boost(5))
|| (item["title"].Contains("Sitecore").Boost(10) && item["body"].Contains("Experience").Boost(5) && item["title"].Contains("Platform").Boost(10))
|| (item["title"].Contains("Sitecore").Boost(10) && item["body"].Contains("Experience").Boost(5) && item["body"].Contains("Platform").Boost(5))
|| (item["body"].Contains("Sitecore").Boost(5) && item["title"].Contains("Experience").Boost(10) && item["title"].Contains("Platform").Boost(10))
|| (item["body"].Contains("Sitecore").Boost(5) && item["title"].Contains("Experience").Boost(10) && item["body"].Contains("Platform").Boost(5))
|| (item["body"].Contains("Sitecore").Boost(5) && item["body"].Contains("Experience").Boost(5) && item["title"].Contains("Platform").Boost(10))
|| (item["body"].Contains("Sitecore").Boost(5) && item["body"].Contains("Experience").Boost(5) && item["body"].Contains("Platform").Boost(5))
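Too long! With F target fields and K keywords, there are F^K ways to assign a field to each keyword, and each branch contains K Contains conditions, so the total number of Contains conditions is:
$$N = K \cdot F^{K}$$
For the example above (F = 2, K = 3) that is 24 conditions. With 5 target fields and 3 keywords, it is 3 × 5^3 = 375 conditions, so in many cases the query ends up exceeding the request size limit.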
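To see where the blow-up comes from, here is a minimal sketch that generates the naive query text and counts its conditions. With F fields and K keywords, each keyword can be paired with any of the F fields, giving F^K branches of K conditions each. `NaiveQuery`, `Branches`, and `CountConditions` are hypothetical helpers that only build strings; they do not call Sitecore.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical string builder for the naive "permutations with repetition" query.
public static class NaiveQuery
{
    public static List<string> Branches(
        IReadOnlyList<string> keywords,
        IReadOnlyDictionary<string, int> boosts)
    {
        // Start with one empty assignment and extend it keyword by keyword:
        // every existing prefix is combined with every field.
        IEnumerable<List<string>> assignments = new[] { new List<string>() };
        foreach (var keyword in keywords)
        {
            assignments = assignments.SelectMany(
                prefix => boosts.Keys,
                (prefix, field) => new List<string>(prefix)
                {
                    $"item[\"{field}\"].Contains(\"{keyword}\").Boost({boosts[field]})"
                });
        }
        // Each assignment becomes one AND-ed branch of the OR query.
        return assignments.Select(conds => "(" + string.Join(" && ", conds) + ")").ToList();
    }

    // F^K branches with K conditions each.
    public static int CountConditions(int fields, int keywords) =>
        keywords * (int)Math.Pow(fields, keywords);
}
```

With title/body boosts of 10 and 5 and the three keywords, `Branches` yields 8 branches, the first being the all-title branch, and `CountConditions(5, 3)` returns 375.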
Solution
To solve this, split the query into two parts: ① checking that every keyword is contained somewhere in the item, and ② applying boost values to the results.
For part ①, create a "contents" field that holds the concatenated values of all the target fields. With this field, ① can be written as follows:
item["contents"].Contains("Sitecore") && item["contents"].Contains("Experience") && item["contents"].Contains("Platform")
It's very simple.
Part ② consists of every combination of field and keyword: each condition boosts a field when it contains a keyword, and all these boosting conditions are combined with OR.
item["title"].Contains("Sitecore").Boost(10)
|| item["title"].Contains("Experience").Boost(10)
|| item["title"].Contains("Platform").Boost(10)
|| item["body"].Contains("Sitecore").Boost(5)
|| item["body"].Contains("Experience").Boost(5)
|| item["body"].Contains("Platform").Boost(5)
Finally, combine ① and ② with AND to get the whole query. It needs one condition per keyword for ① plus one per field/keyword pair for ②: for 5 fields and 3 keywords that is 3 + 5 × 3 = 18 conditions instead of 375.
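A minimal sketch of how the split query text grows. `SplitQuery.Build` and `CountConditions` are hypothetical string-building helpers, not Sitecore APIs: part ① contributes one condition per keyword on "contents", and part ② one boosted condition per field/keyword pair.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical string builder for the split query: (①) && (②).
public static class SplitQuery
{
    public static string Build(
        IReadOnlyList<string> keywords,
        IReadOnlyDictionary<string, int> boosts)
    {
        // ① every keyword must appear in the concatenated "contents" field (no boost)
        var match = string.Join(" && ",
            keywords.Select(k => $"item[\"contents\"].Contains(\"{k}\")"));

        // ② every field/keyword pair, boosted by its field's value, OR-ed together
        var boost = string.Join(" || ",
            boosts.SelectMany(
                f => keywords,
                (f, k) => $"item[\"{f.Key}\"].Contains(\"{k}\").Boost({f.Value})"));

        return $"({match}) && ({boost})";
    }

    // K conditions for ① plus F * K conditions for ②.
    public static int CountConditions(int fields, int keywords) =>
        keywords + fields * keywords;
}
```

`CountConditions(5, 3)` returns 18, and for two fields and three keywords `Build` emits 9 Contains conditions, so the query now grows linearly in the number of fields instead of exponentially in the number of keywords.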
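This query actually works well. When ① evaluates to true, every keyword appears in at least one target field, so at least one condition in ② is also true and the whole query returns true. When ① is false, the whole query is false no matter what ② evaluates to. In other words, ① decides which items match, and ② only adds boost values to the items that do.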
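The reasoning above can be checked in memory with a minimal sketch. It assumes a case-insensitive substring check as a rough stand-in for Solr's Contains, and the score it returns is a toy sum of the boosts of matching pairs, not Solr's actual relevance score. `SplitQueryEval`, `PartOne`, and `PartTwo` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory evaluation of ① and ② for a single item.
public static class SplitQueryEval
{
    // Case-insensitive substring check: a rough stand-in for Solr's Contains.
    static bool Has(string text, string keyword) =>
        text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;

    // ①: every keyword occurs in the space-joined values of the target fields.
    public static bool PartOne(
        IDictionary<string, string> item,
        IEnumerable<string> keywords,
        IEnumerable<string> fields)
    {
        var contents = string.Join(" ", fields.Select(f => item[f]));
        return keywords.All(k => Has(contents, k));
    }

    // ②: true if any field/keyword pair matches; the toy score is the sum
    // of the boosts of all matching pairs.
    public static (bool matched, int score) PartTwo(
        IDictionary<string, string> item,
        IEnumerable<string> keywords,
        IReadOnlyDictionary<string, int> boosts)
    {
        var hits = boosts
            .SelectMany(f => keywords, (f, k) => (f, k))
            .Where(p => Has(item[p.f.Key], p.k))
            .Select(p => p.f.Value)
            .ToList();
        return (hits.Count > 0, hits.Sum());
    }
}
```

For the title/body example with boosts 10 and 5, `PartOne` is true and `PartTwo` returns `(true, 20)`: 10 for "Sitecore" in the title plus 5 each for "Experience" and "Platform" in the body. The item the first attempt missed now matches.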
Implementation
First, create the "contents" field used in part ①. This can be done with a computed index field in Sitecore.
Here is a sample code:
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;

public class ContentsField : IComputedIndexField
{
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    public object ComputeFieldValue(IIndexable indexable)
    {
        if (!(indexable is SitecoreIndexableItem item))
        {
            return null;
        }
        // The fields for keyword search
        var targetFields = new[] { "Title", "Body", "Summary", "Category", "Author" };
        // Concatenate the values of all target fields, separated by spaces
        return string.Join(" ", targetFields.Select(fieldName => item.Item[fieldName]));
    }
}
This class must be registered in the configuration. Here is a patch file that registers it:
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:search="http://www.sitecore.net/xmlconfig/search/">
  <sitecore role:require="Standalone or ContentManagement or ContentDelivery" search:require="solr">
    <contentSearch>
      <indexConfigurations>
        <defaultSolrIndexConfiguration type="Sitecore.ContentSearch.SolrProvider.SolrIndexConfiguration, Sitecore.ContentSearch.SolrProvider">
          <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
            <fields hint="raw:AddComputedIndexField">
              <!-- Add contents field -->
              <field fieldName="contents" returnType="string" type="NamespaceTo.ContentsField, Assembly"/>
            </fields>
          </documentOptions>
        </defaultSolrIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
Then run "Populate Solr Managed Schema" and rebuild the indexes in Sitecore. The "contents" field will be generated in sitecore_master_index (as well as in the web and core indexes).
The keyword search itself can then be written as follows:
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;
using Sitecore.ContentSearch.SearchTypes;

public class KeywordSearchApi
{
    // The target fields and their boost values for keyword search (ideally, load these from an item or configuration)
    protected static readonly IReadOnlyDictionary<string, int> TargetFields = new Dictionary<string, int>()
    {
        ["title"] = 10,
        ["body"] = 8,
        ["summary"] = 6,
        ["category"] = 2,
        ["author"] = 1
    };

    public static SearchResults<SearchResultItem> Search(ICollection<string> keywords)
    {
        var index = ContentSearchManager.GetIndex("sitecore_master_index");
        using (var context = index.CreateSearchContext())
        {
            // The predicate for ①: every keyword must appear in "contents"
            var matchPred = keywords
                .Aggregate(
                    PredicateBuilder.True<SearchResultItem>(),
                    (acc, keyword) => acc.And(item => item["contents"].Contains(keyword))); // without boosting

            // The predicate for ②
            var boostPred = TargetFields.Keys
                // Make all pairs of field/keyword with boosting value
                .SelectMany(_ => keywords, (field, keyword) => (field, keyword, boost: TargetFields[field]))
                .Aggregate(
                    PredicateBuilder.Create<SearchResultItem>(item => item.Name.MatchWildcard("*").Boost(0)), // always true
                    (acc, pair) => acc.Or(item => item[pair.field].Contains(pair.keyword).Boost(pair.boost))); // with boosting

            return context.GetQueryable<SearchResultItem>()
                .Filter(matchPred)
                .Where(boostPred) // Use 'Where' instead because 'Filter' ignores the boosting values.
                .OrderByDescending(item => item["score"])
                .GetResults();
        }
    }
}
If you use Sitecore PowerShell Extensions, you can quickly try this method with the following script:
$keywords = "Sitecore","Experience","Platform"
[NamespaceTo.KeywordSearchApi]::Search($keywords)
Conclusion
This solution is just one of many possible approaches. If you have a smarter idea, let me know in the comments or in your own post.
Happy searching!