Navigating Hybrid Search with MongoDB: Ugly Approach
Shannon Lal
Posted on February 19, 2024
In my last two blogs [https://dev.to/shannonlal/unlocking-the-power-of-hybrid-search-5bej, https://dev.to/shannonlal/building-blocks-for-hybrid-search-combining-keyword-and-semantic-search-236k] I focused on giving an overview of MongoDB's Vector Search with the goal of demonstrating hybrid search in Mongo. In this blog I am going to present the solution I used to get hybrid search working with Mongo. It took several attempts, so I will also walk through the strategies that did not work.
Attempting MongoDB Aggregation for Hybrid Search
My initial strategy was to use a single MongoDB aggregation to perform a dual search: one on the text and another on the vectors. The idea was to use MongoDB's $search stage to execute a text search followed by a vector search within the same pipeline. Here is the aggregation query that I put together:
[
  // Stage 1: Text-based search on 'description' field
  {
    $search: {
      index: 'text_index',
      text: {
        query: 'searchTerm',
        path: 'description',
        score: { boost: { value: 2 } }
      }
    }
  },
  // Stage 2: Incorporate the vector search based on the embedding
  {
    $search: {
      index: 'vector_index',
      compound: {
        should: [
          {
            vector: {
              path: 'embedding',
              query: [/* your vector embedding here */],
              score: { boost: { value: 1 } }
            }
          }
        ]
      }
    }
  },
  {
    $sort: {
      'score': { $meta: 'textScore' }
    }
  },
  {
    $project: {
      _id: 0, // excluding the id field
      name: 1,
      description: 1,
      textScore: { $meta: 'textScore' },
      vectorScore: { $meta: 'searchScore' }
    }
  }
];
However, MongoDB requires $search to be the first stage of any pipeline in which it appears, so a pipeline cannot contain a second $search stage after the first. As a result, this aggregation pipeline does not work.
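To make the restriction concrete, here is a minimal sketch of a check that flags search stages appearing anywhere other than the first position of a pipeline. The helper name `findMisplacedSearchStages` and the shortened `attempted` pipeline are hypothetical and only illustrate the rule; this is not something the MongoDB driver provides.

```javascript
// Stages that must be the first stage of the pipeline they appear in.
const FIRST_ONLY_STAGES = ['$search', '$vectorSearch'];

// Hypothetical helper: returns the position and name of every search stage
// that is not the first stage of the given (top-level) pipeline.
function findMisplacedSearchStages(pipeline) {
  const misplaced = [];
  pipeline.forEach((stage, index) => {
    const name = Object.keys(stage)[0];
    if (index > 0 && FIRST_ONLY_STAGES.includes(name)) {
      misplaced.push({ index, stage: name });
    }
  });
  return misplaced;
}

// Shortened version of the pipeline attempted above: two $search stages in a row.
const attempted = [
  { $search: { index: 'text_index', text: { query: 'searchTerm', path: 'description' } } },
  { $search: { index: 'vector_index' } },
  { $sort: { score: -1 } },
];

// Reports the second $search stage, which sits at index 1.
console.log(findMisplacedSearchStages(attempted));
```

The check only looks at top-level stages; a search stage nested inside a sub-pipeline is the first stage of its own pipeline and would need to be checked separately.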
Crafting a Combined Search Index
The second strategy I tried involved creating a unified search index that could potentially handle both text and vector searches. Below is the index that I tried to create:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "description": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "embedding": {
        "type": "vector",
        "similarity": "cosine",
        "numDimensions": 512
      }
    }
  }
}
Unfortunately, this approach hit a roadblock: MongoDB rejected the definition because 'vector' is not a valid field type within these index mappings.
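For comparison, here is a sketch of how the same two fields could be split across two separate index definitions, one for $search and one for $vectorSearch. It assumes the Atlas Vector Search index format, in which vector fields are listed in a top-level `fields` array; the constant names are hypothetical, and the 512 dimensions and cosine similarity are carried over from the attempt above.

```javascript
// Atlas Search index definition (queried with $search): uses the "mappings" format.
const textIndexDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      description: { type: 'string', analyzer: 'lucene.standard' },
    },
  },
};

// Atlas Vector Search index definition (queried with $vectorSearch).
// Assumption: vector fields are declared in a "fields" array, not under "mappings".
const vectorIndexDefinition = {
  fields: [
    {
      type: 'vector',
      path: 'embedding',
      numDimensions: 512, // must match the length of the stored embeddings
      similarity: 'cosine',
    },
  ],
};
```

With two indexes in place, each search type can be queried through its own stage, which is what the final approach in this post relies on.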
Mongo Union with Aggregation
The final approach was to use $unionWith in a MongoDB aggregation: run the vector search first, then use the $unionWith stage to append the results of a text search.
The following code is based on my previous blog on hybrid search. Here is the aggregation pipeline for hybrid search:
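// `embedding` (the query vector) and `searchTerm` (the raw query text)
// are assumed to be defined before this pipeline is built.
const pipeline = [
  // Vector search; it must be the first stage of the pipeline.
  {
    $vectorSearch: {
      index: 'vector_index',
      path: 'embedding',
      queryVector: embedding,
      numCandidates: 10,
      limit: 10,
    },
  },
  // Expose the vector similarity score as a regular field.
  { $addFields: { vs_score: { $meta: 'vectorSearchScore' } } },
  {
    $project: {
      vs_score: 1,
      _id: 1,
      description: 1,
      name: 1,
    },
  },
  // Append the full-text results; $search is the first stage of the sub-pipeline.
  {
    $unionWith: {
      coll: 'vector_test',
      pipeline: [
        {
          $search: {
            index: 'default',
            text: { query: searchTerm, path: 'description' },
          },
        },
        { $limit: 10 },
        { $addFields: { fts_score: { $meta: 'searchScore' } } },
        {
          $project: {
            fts_score: 1,
            _id: 1,
            description: 1,
            name: 1,
          },
        },
      ],
    },
  },
  // Merge documents returned by both searches into one row per _id.
  {
    $group: {
      _id: '$_id',
      vs_score: { $max: '$vs_score' },
      fts_score: { $max: '$fts_score' },
      description: { $first: '$description' },
      name: { $first: '$name' },
    },
  },
  // A document found by only one search gets 0 for the missing score.
  {
    $project: {
      description: 1,
      name: 1,
      vs_score: { $ifNull: ['$vs_score', 0] },
      fts_score: { $ifNull: ['$fts_score', 0] },
    },
  },
  // Combined score is the plain sum of the two scores.
  {
    $project: {
      description: 1,
      name: 1,
      score: { $add: ['$fts_score', '$vs_score'] },
      _id: 1,
      vs_score: 1,
      fts_score: 1,
    },
  },
  { $sort: { score: -1 } },
  { $limit: 10 },
];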
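The same pipeline can also be wrapped in a small builder so that the result limit and the relative weight of each score become parameters. This is only a sketch: `buildHybridPipeline`, `vectorWeight`, `textWeight` and the default `numCandidates` of ten times the limit are assumptions introduced for illustration (the pipeline above uses `numCandidates: 10` and an unweighted sum), while the index and collection names are the ones used above.

```javascript
// Hypothetical builder for the $vectorSearch + $unionWith hybrid pipeline.
function buildHybridPipeline({
  embedding,
  searchTerm,
  limit = 10,
  numCandidates = limit * 10, // assumption: a candidate pool larger than `limit`
  vectorWeight = 1,
  textWeight = 1,
}) {
  const fields = { _id: 1, name: 1, description: 1 };
  return [
    {
      $vectorSearch: {
        index: 'vector_index',
        path: 'embedding',
        queryVector: embedding,
        numCandidates,
        limit,
      },
    },
    { $project: { ...fields, vs_score: { $meta: 'vectorSearchScore' } } },
    {
      $unionWith: {
        coll: 'vector_test',
        pipeline: [
          { $search: { index: 'default', text: { query: searchTerm, path: 'description' } } },
          { $limit: limit },
          { $project: { ...fields, fts_score: { $meta: 'searchScore' } } },
        ],
      },
    },
    {
      $group: {
        _id: '$_id',
        vs_score: { $max: '$vs_score' },
        fts_score: { $max: '$fts_score' },
        description: { $first: '$description' },
        name: { $first: '$name' },
      },
    },
    // Default missing scores to 0 before they are used in the weighted sum.
    {
      $addFields: {
        vs_score: { $ifNull: ['$vs_score', 0] },
        fts_score: { $ifNull: ['$fts_score', 0] },
      },
    },
    {
      $addFields: {
        score: {
          $add: [
            { $multiply: ['$vs_score', vectorWeight] },
            { $multiply: ['$fts_score', textWeight] },
          ],
        },
      },
    },
    { $sort: { score: -1 } },
    { $limit: limit },
  ];
}
```

With the default weights of 1 the combined score is the same plain sum as before; something like `buildHybridPipeline({ embedding, searchTerm, vectorWeight: 0.7, textWeight: 0.3 })` would favour the vector score instead.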
The aggregation is a little more complex than I would like, but it seems to do the job. The one thing I would recommend is paying attention to how the combined score is determined. In this approach we simply add the two scores (vs_score and fts_score) together; however, the two scores are computed differently and are not necessarily on the same scale, so this may not be the best solution for your use case. The scores from my test search are shown below.
Query Results:
| Search Term | Combined Score | Text Score | Vector Score |
|---|---|---|---|
| Car for hire | 1.373 | 0.653 | 0.720 |
| Limo Hires | 0.775 | 0.037 | 0.737 |
| Electric Scooter | 0.733 | 0.044 | 0.689 |
| Bike Share | 0.731 | 0.042 | 0.689 |
| Car Dealership | 0.651 | 0.036 | 0.615 |
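One alternative to adding raw scores is reciprocal rank fusion (RRF), which combines the position of each document in the two result lists instead of the score values. The sketch below is a different technique from the one used in the pipeline above, shown only for comparison. It reuses the scores from the table as sample input, on the assumption that each row is one returned document; the function name and the constant `k = 60` (a commonly used default) are choices made for this example.

```javascript
// Sample rows copied from the results table above.
const results = [
  { name: 'Car for hire', fts_score: 0.653, vs_score: 0.720 },
  { name: 'Limo Hires', fts_score: 0.037, vs_score: 0.737 },
  { name: 'Electric Scooter', fts_score: 0.044, vs_score: 0.689 },
  { name: 'Bike Share', fts_score: 0.042, vs_score: 0.689 },
  { name: 'Car Dealership', fts_score: 0.036, vs_score: 0.615 },
];

// Reciprocal rank fusion: for every score field, rank the documents by that
// score and add 1 / (k + rank) to each document's fused score.
function reciprocalRankFusion(docs, scoreFields, k = 60) {
  const fused = new Map(docs.map((doc) => [doc.name, 0]));
  for (const field of scoreFields) {
    const ranked = docs
      .filter((doc) => doc[field] > 0) // a score of 0 means "not in this list"
      .sort((a, b) => b[field] - a[field]);
    ranked.forEach((doc, i) => {
      fused.set(doc.name, fused.get(doc.name) + 1 / (k + i + 1));
    });
  }
  return [...fused.entries()]
    .map(([name, score]) => ({ name, score }))
    .sort((a, b) => b.score - a.score);
}

const ranked = reciprocalRankFusion(results, ['fts_score', 'vs_score']);
console.log(ranked.map((doc) => doc.name));
```

Because RRF only uses ranks, a large text score cannot dominate the result the way it can in a plain sum; whether that is preferable depends on the data and is worth testing against real queries.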
The Road Ahead
Over the next couple of weeks I am going to load test this to see how the query performs when searching a large number of documents. I would definitely welcome any feedback or comments on how I can improve the query, or on better strategies for getting hybrid search working.
Thanks