Navigating Hybrid Search with MongoDB: Ugly Approach

shannonlal

Shannon Lal

Posted on February 19, 2024


In my last two blogs [https://dev.to/shannonlal/unlocking-the-power-of-hybrid-search-5bej, https://dev.to/shannonlal/building-blocks-for-hybrid-search-combining-keyword-and-semantic-search-236k] I gave an overview of MongoDB's Vector Search, with the goal of demonstrating hybrid search in Mongo. In this blog I am going to present how I got hybrid search working with Mongo. It took several attempts, and I will walk through each of the strategies I tried.

Attempting MongoDB Aggregation for Hybrid Search

My initial strategy was to use a MongoDB aggregation to perform a dual search: one on the text and another on the vectors. The idea was to use MongoDB's $search stage to run a text search followed by a vector search within the same pipeline. Here is the aggregation query I put together:

 [
  // Stage 1: Text-based search on 'description' field
  {
    $search: {
      index: 'text_index', 
      text: {
        query: 'searchTerm',
        path: 'description',
        score: { boost: { value: 2 } } 
      }
    }
  },
  // Stage 2: Incorporate the vector search based on the embedding
  {
    $search: {
      index: 'vector_index', 
      compound: {
        should: [
          {
            vector: {
              path: 'embedding',
              query: [/* your vector embedding here */],
              score: { boost: { value: 1 } } 
            }
          }
        ]
      }
    }
  },
  {
    $sort: {
      'score': { $meta: 'textScore' } 
    }
  },
  {
    $project: {
      _id: 0, // excluding the id field
      name: 1,
      description: 1,
      textScore: { $meta: 'textScore' },
      vectorScore: { $meta: 'searchScore' }
    }
  }
];

However, MongoDB allows only one $search stage per pipeline, and it must be the first stage. As a result, this aggregation pipeline won't work.
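This placement rule can be checked before a pipeline ever reaches the server. The helper below is hypothetical and is not part of any MongoDB driver. It models only one rule: $search, and likewise $vectorSearch, must be the first stage of whatever pipeline it appears in, including a $unionWith sub-pipeline. It ignores every other server-side restriction, so treat it as a sketch rather than a validator.

```javascript
// Hypothetical pre-flight check (not a MongoDB driver API). It models only the
// rule that $search and $vectorSearch must be the first stage of the pipeline
// they appear in; it does not know about any other server-side restriction.
const FIRST_STAGE_ONLY = ['$search', '$vectorSearch'];

// Every aggregation stage is an object with a single operator key, e.g. { $limit: 10 }
function stageName(stage) {
  return Object.keys(stage)[0];
}

function findSearchStageProblems(pipeline, path = 'pipeline') {
  const problems = [];
  pipeline.forEach((stage, index) => {
    const name = stageName(stage);
    if (FIRST_STAGE_ONLY.includes(name) && index !== 0) {
      problems.push(`${path}[${index}]: ${name} must be the first stage`);
    }
    // A $unionWith sub-pipeline is a pipeline of its own, so the same rule applies inside it
    const sub = name === '$unionWith' ? stage.$unionWith.pipeline : undefined;
    if (Array.isArray(sub)) {
      problems.push(...findSearchStageProblems(sub, `${path}[${index}].$unionWith.pipeline`));
    }
  });
  return problems;
}

// The first attempt above, reduced to its stage names: the second $search is flagged
console.log(findSearchStageProblems([{ $search: {} }, { $search: {} }, { $sort: {} }, { $project: {} }]));
// → [ 'pipeline[1]: $search must be the first stage' ]
```

A $search placed first inside a $unionWith sub-pipeline passes this check, which is the shape the final approach below relies on.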

Crafting a Combined Search Index

The second strategy I tried involved creating a unified search index that could potentially handle both text and vector searches. Below is the index definition I tried to create:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "description": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "embedding": {
        "type": "vector",
        "similarity": "cosine",
        "numDimensions": 512
      }
    }
  }
}

Unfortunately, this approach hit a roadblock: Atlas Search index mappings do not accept 'vector' as a field type.
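The 'vector' type does exist, but only in an Atlas Vector Search index, which has a different shape from an Atlas Search index. For that reason, the working pipeline later in this post queries two separate indexes: 'vector_index' and 'default'. Below is a sketch of the two definitions as plain JavaScript objects. It assumes the field names and the 512-dimension cosine settings from the combined attempt above, and it does not show how the indexes are created.

```javascript
// Sketch only: field names and vector settings are assumed from the combined
// attempt above. Each object is the JSON definition for one separate index.

// Atlas Search (full-text) index definition, e.g. for the index named 'default'
const textIndexDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      description: { type: 'string', analyzer: 'lucene.standard' },
    },
  },
};

// Atlas Vector Search index definition, e.g. for the index named 'vector_index'.
// Note the different shape: a top-level "fields" array instead of "mappings".
const vectorIndexDefinition = {
  fields: [
    {
      type: 'vector',
      path: 'embedding',
      numDimensions: 512,
      similarity: 'cosine',
    },
  ],
};

console.log(JSON.stringify({ textIndexDefinition, vectorIndexDefinition }, null, 2));
```

Because the text and vector fields live in different indexes, no single search stage can query both. The final approach therefore runs two searches and merges their results.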

Mongo Union with Aggregation

The final approach was to use the $unionWith stage in a MongoDB aggregation: run the vector search first with $vectorSearch, then use $unionWith to run a text search and append its results to the vector search results.

The following code builds on my previous blog post on hybrid search. Here is the aggregation pipeline:

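    // Assumes `embedding` holds the query vector (an array of numbers with the same
    // number of dimensions as the vector index) and `searchTerm` holds the query string.
    const pipeline = [
      // Stage 1: approximate nearest-neighbour search on the 'embedding' field.
      // numCandidates must be >= limit; MongoDB's docs suggest 10 to 20 times
      // limit for better recall, at some cost in latency.
      {
        $vectorSearch: {
          index: 'vector_index',
          path: 'embedding',
          queryVector: embedding,
          numCandidates: 10,
          limit: 10,
        },
      },
      // Keep the vector similarity score on each document
      { $addFields: { vs_score: { $meta: 'vectorSearchScore' } } },
      {
        $project: {
          vs_score: 1,
          _id: 1,
          description: 1,
          name: 1,
        },
      },
      // Stage 2: append full-text search hits from the 'vector_test' collection
      {
        $unionWith: {
          coll: 'vector_test',
          pipeline: [
            {
              $search: {
                index: 'default',
                text: { query: searchTerm, path: 'description' },
              },
            },
            { $limit: 10 },
            // Keep the Atlas Search relevance score on each document
            { $addFields: { fts_score: { $meta: 'searchScore' } } },
            {
              $project: {
                fts_score: 1,
                _id: 1,
                description: 1,
                name: 1,
              },
            },
          ],
        },
      },
      // Stage 3: documents returned by both searches appear twice; merge them by _id
      {
        $group: {
          _id: '$_id',
          vs_score: { $max: '$vs_score' },
          fts_score: { $max: '$fts_score' },
          description: { $first: '$description' },
          name: { $first: '$name' },
        },
      },
      // A document found by only one search has no score for the other; default it to 0
      {
        $project: {
          description: 1,
          name: 1,
          vs_score: { $ifNull: ['$vs_score', 0] },
          fts_score: { $ifNull: ['$fts_score', 0] },
        },
      },
      // Combined score: a plain sum of the two scores
      {
        $project: {
          description: 1,
          name: 1,
          score: { $add: ['$fts_score', '$vs_score'] },
          _id: 1,
          vs_score: 1,
          fts_score: 1,
        },
      },
      // Highest combined score first; keep the top 10
      { $sort: { score: -1 } },
      { $limit: 10 },
    ];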
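To make the merging logic easier to follow, here is a rough in-memory JavaScript equivalent of what the pipeline does after the two searches return. It is an illustration only, not a replacement for running the aggregation on the server. The function name and input shape are hypothetical, and the sample scores in the usage line are made up.

```javascript
// Hypothetical in-memory equivalent of the pipeline's merge stages.
// Inputs: arrays of hits shaped like the documents each search branch projects,
// e.g. { _id, name, description, vs_score } and { _id, name, description, fts_score }.
function mergeHybridResults(vectorHits, textHits, limit = 10) {
  const byId = new Map();

  // $unionWith: vector hits first, then text hits appended after them
  for (const hit of [...vectorHits, ...textHits]) {
    const key = String(hit._id);
    // $group on _id; like $first, name/description come from the first hit seen
    const merged = byId.get(key) ?? {
      _id: hit._id,
      name: hit.name,
      description: hit.description,
    };
    // $max: keep the highest score seen for each field
    if (hit.vs_score !== undefined) {
      merged.vs_score = Math.max(merged.vs_score ?? -Infinity, hit.vs_score);
    }
    if (hit.fts_score !== undefined) {
      merged.fts_score = Math.max(merged.fts_score ?? -Infinity, hit.fts_score);
    }
    byId.set(key, merged);
  }

  return [...byId.values()]
    .map((doc) => {
      // $ifNull: a document found by only one search gets 0 for the other score
      const vs_score = doc.vs_score ?? 0;
      const fts_score = doc.fts_score ?? 0;
      // $add: the combined score is a plain sum
      return { ...doc, vs_score, fts_score, score: vs_score + fts_score };
    })
    .sort((a, b) => b.score - a.score) // $sort: { score: -1 }
    .slice(0, limit); // $limit: 10
}

// Illustrative inputs: 'a' is found by both searches, 'b' and 'c' by only one
const ranked = mergeHybridResults(
  [{ _id: 'a', vs_score: 0.5 }, { _id: 'b', vs_score: 0.75 }],
  [{ _id: 'a', fts_score: 0.5 }, { _id: 'c', fts_score: 0.25 }],
);
console.log(ranked.map((d) => `${d._id}: ${d.score}`)); // → [ 'a: 1', 'b: 0.75', 'c: 0.25' ]
```

A document that matches both searches collects both scores. This is why it tends to outrank documents that match only one of them.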

The aggregation is a little more complex than I would like, but it seems to do the job. The one thing I would recommend is paying close attention to how the combined score is calculated. This approach simply adds the two scores (vs_score and fts_score) together, which may not be the best solution for your use case. For cosine similarity, vectorSearchScore is normalized to the range 0 to 1, while the Atlas Search searchScore is a relevance score with no fixed upper bound, so the two are not directly comparable. Below are the scores from a test search I ran:

Query Results:

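| Result (name)    | Combined Score | Text Score (fts_score) | Vector Score (vs_score) |
|------------------|----------------|------------------------|-------------------------|
| Car for hire     | 1.373          | 0.653                  | 0.720                   |
| Limo Hires       | 0.775          | 0.037                  | 0.737                   |
| Electric Scooter | 0.733          | 0.044                  | 0.689                   |
| Bike Share       | 0.731          | 0.042                  | 0.689                   |
| Car Dealership   | 0.651          | 0.036                  | 0.615                   |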
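If a plain sum does not suit your data, two common alternatives are a weighted sum and Reciprocal Rank Fusion (RRF). Neither is what the pipeline above does; both are swapped-in techniques, sketched here in JavaScript. The weighted sum is applied to the scores from the table, and its weights are hypothetical tuning knobs. RRF ignores raw scores and combines only each document's rank in the vector and text result lists. The ids in its usage test and the constant k = 60 are assumptions.

```javascript
// Per-result scores copied from the table above.
const tableRows = [
  { name: 'Car for hire', text: 0.653, vector: 0.72 },
  { name: 'Limo Hires', text: 0.037, vector: 0.737 },
  { name: 'Electric Scooter', text: 0.044, vector: 0.689 },
  { name: 'Bike Share', text: 0.042, vector: 0.689 },
  { name: 'Car Dealership', text: 0.036, vector: 0.615 },
];

// Weighted sum: `weights` are hypothetical tuning knobs.
// { text: 1, vector: 1 } reproduces the plain sum used in the pipeline.
function weightedSum(rows, weights) {
  return rows
    .map((row) => ({
      name: row.name,
      score: weights.text * row.text + weights.vector * row.vector,
    }))
    .sort((a, b) => b.score - a.score);
}

// Reciprocal Rank Fusion: each document earns 1 / (k + rank) from every
// ranked list it appears in (rank is 1-based). Raw scores are ignored, so the
// two score scales no longer matter. k = 60 is the value commonly used for
// RRF; treat it as an assumption to tune.
function reciprocalRankFusion(rankedIdLists, k = 60) {
  const scores = new Map();
  for (const ids of rankedIdLists) {
    ids.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Equal weights give the table's ordering (scores match up to rounding)
console.log(weightedSum(tableRows, { text: 1, vector: 1 }).map((r) => r.name));
// Vector score alone puts 'Limo Hires' first
console.log(weightedSum(tableRows, { text: 0, vector: 1 })[0].name);
```

In the table, "Car for hire" ranks first under the plain sum mostly because of its much larger text score. A weighted sum lets you tune that balance. RRF sidesteps it by using ranks instead of raw scores.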

The Road Ahead

Over the next couple of weeks I am going to load test this approach to see how the query performs when searching a large number of documents. I would definitely welcome any feedback or comments on how I can improve the query, or on better strategies for getting hybrid search working.

Thanks
