Understanding Search Scores in MongoDB Hybrid Search

Shannon Lal

Posted on November 19, 2024

Over the past few weeks, I've been diving deep into MongoDB's hybrid search capabilities, specifically focusing on understanding how to improve search result relevancy. I discovered that understanding and optimizing search scores was crucial for delivering better results to our users. This led me to explore how MongoDB handles scoring in both traditional text search and vector search, and how these scores can be effectively combined.

If you're working with hybrid search in MongoDB, you might be interested in my previous posts about implementing semantic search (https://dev.to/shannonlal/implementing-complex-semantic-search-with-mongodb-51ib) and optimizing search with boost and bury (https://dev.to/shannonlal/understanding-mongodb-atlas-search-scoring-for-better-search-results-1in4). Today, I'll share insights about accessing and interpreting search scores in MongoDB's hybrid search implementation.

A Simple Hybrid Search Implementation

Here's a simplified MongoDB aggregation pipeline that demonstrates how to capture both vector and text search scores:


[
    {
      $vectorSearch: {
        index: 'ai_image_description_vector_index',
        path: 'descriptionValues',
        queryVector: embedding,
        numCandidates: limit, // must be >= limit; MongoDB recommends a larger value (a multiple of limit) for better recall
        limit: limit,
        filter: {
          userId: userId,
          deleted: false
        }
      }
    },
    {
      $project: {
        description: 1,
        name: 1,
        searchType: 'vector',
        vectorScore: { $meta: 'vectorSearchScore' }
      }
    },
    {
      $unionWith: {
        coll: 'ai_generated_image',
        pipeline: [
          {
            $search: {
              index: 'ai_image_description',
              compound: {
                must: [
                  {
                    autocomplete: {
                      query: query,
                      path: 'description'
                    }
                  }
                ],
                filter: [
                  {
                    equals: {
                      path: 'deleted',
                      value: false
                    }
                  },
                  {
                    text: {
                      path: 'userId',
                      query: userId
                    }
                  }
                ]
              },
              scoreDetails: true
            }
          },
          {
            $addFields: {
              searchType: 'text',
              textScore: { $meta: 'searchScore' },
              textScoreDetails: { $meta: 'searchScoreDetails' }
            }
          }
        ]
      }
    },
    {
      $group: {
        _id: null,
        docs: { $push: '$$ROOT' }
      }
    },
    {
      $unwind: {
        path: '$docs',
        includeArrayIndex: 'rank'
      }
    },
    {
      $group: {
        _id: '$docs._id',
        description: { $first: '$docs.description' },
        name: { $first: '$docs.name' },
        vector_score: { $max: '$docs.vectorScore' },
        text_score: { $max: '$docs.textScore' },
        text_score_details: { $max: '$docs.textScoreDetails' },
        searchType: { $first: '$docs.searchType' }
      }
    },
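    {
      // No $sort precedes this stage and $group does not guarantee output order,
      // so paging with $skip/$limit is not guaranteed to be stable between calls.
      $skip: cursor ? parseInt(cursor, 10) : 0
    },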
    {
      $limit: limit
    }
]

Understanding $unionWith in Hybrid Search

The $unionWith stage is what makes this a hybrid search: it runs the text search as a completely independent pipeline and appends its results to the vector search results. It does not deduplicate. During my testing, the initial vector search returned 8 documents, and after $unionWith the combined output held 12. Any document that matched both searches appeared twice in that output, once carrying a vectorScore and once carrying a textScore.

The subsequent grouping stages handle these duplicates. The final $group merges rows with the same _id, and because $max ignores missing values, each merged document keeps both its vector score and its text score when it has them. The result is a deduplicated list that draws on the strengths of both search methods.
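
To make the merge step concrete, here's a rough plain-JavaScript sketch of what the final $group stage does to the combined output. It runs outside MongoDB, the sample documents are invented, and mergeById is a name I made up for illustration, so treat it as a mental model rather than how the server implements $group.

```javascript
// Sketch only: mimics { $group: { _id, vector_score: { $max }, text_score: { $max } } }
// on an in-memory array. mergeById and the sample data are hypothetical.
function mergeById(docs) {
  const merged = new Map();
  for (const doc of docs) {
    const key = String(doc._id);
    const entry = merged.get(key) ?? {
      _id: doc._id,
      description: doc.description, // like $first: keep the first value seen
      vector_score: null,
      text_score: null,
    };
    // Like $max: keep the highest score seen and ignore rows missing the field.
    if (doc.vectorScore !== undefined) {
      entry.vector_score = Math.max(entry.vector_score ?? -Infinity, doc.vectorScore);
    }
    if (doc.textScore !== undefined) {
      entry.text_score = Math.max(entry.text_score ?? -Infinity, doc.textScore);
    }
    merged.set(key, entry);
  }
  return [...merged.values()];
}

const vectorHits = [
  { _id: 'a', description: 'red car', vectorScore: 0.91 },
  { _id: 'b', description: 'blue car', vectorScore: 0.84 },
];
const textHits = [
  { _id: 'b', description: 'blue car', textScore: 2.3 },
  { _id: 'c', description: 'car park', textScore: 1.1 },
];

// $unionWith appends the second result set as-is, so 'b' shows up twice.
const combined = [...vectorHits, ...textHits];
const results = mergeById(combined);
console.log(combined.length, results.length);
```

Document 'b' ends up with both scores, while 'a' and 'c' each keep one score and a null for the other, which is the same shape the pipeline produces.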

Accessing Search Scores

Vector Search Scores

To capture vector similarity scores, add a field using the vectorSearchScore metadata:

vectorScore: { $meta: 'vectorSearchScore' }

This score measures how close the document's vector is to your query vector. The similarity function (cosine, dotProduct, or euclidean) is set in the vector index definition, and a higher score means a closer match.
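
As far as I understand MongoDB's documentation, the raw similarity is rescaled before it is returned, so vectorSearchScore always falls between 0 and 1: (1 + similarity) / 2 for cosine and dotProduct, and 1 / (1 + distance) for euclidean. The sketch below reproduces that arithmetic in plain JavaScript. The function names are mine, not part of any MongoDB API, and the server's real computation may differ in detail.

```javascript
// Sketch of the score normalization described in the MongoDB docs (my reading).
// All function names here are hypothetical helpers, not MongoDB APIs.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosineSimilarity(a, b) {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// cosine: raw similarity in [-1, 1] is mapped onto [0, 1].
function normalizedCosineScore(a, b) {
  return (1 + cosineSimilarity(a, b)) / 2;
}

// euclidean: distance in [0, inf) is mapped onto (0, 1].
function normalizedEuclideanScore(a, b) {
  const distance = Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
  return 1 / (1 + distance);
}

console.log(normalizedCosineScore([1, 0], [1, 0]));  // same direction
console.log(normalizedCosineScore([1, 0], [0, 1]));  // orthogonal
console.log(normalizedCosineScore([1, 0], [-1, 0])); // opposite direction
console.log(normalizedEuclideanScore([0, 0], [3, 4]));
```

One practical consequence: with cosine similarity, a score of 0.5 means the vectors are unrelated (orthogonal), not "half relevant", which is worth keeping in mind when picking score thresholds.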

Text Search Scores

Accessing text search scores in MongoDB takes up to two steps. The basic relevance score is always available through the searchScore metadata. To get the detailed scoring breakdown as well, you first need to set scoreDetails: true in your $search stage, and then read it through the searchScoreDetails metadata. Both can be captured in a single stage:

{
  $addFields: {
    searchType: 'text',
    textScore: { $meta: 'searchScore' },
    textScoreDetails: { $meta: 'searchScoreDetails' }
  }
}

The basic score gives a quick read on document relevance, while scoreDetails shows how that score was calculated. The breakdown comes back as a nested document in which each node has a value, a description, and a details array of child nodes. It covers factors like term frequency (how often the search term appears), field weights (the importance of different fields), and any applied boost factors.
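
Because the breakdown is a tree, it can be hard to read in raw form. Here's a small sketch that flattens it into indented rows for logging. It assumes each node has value, description, and details fields; the sample object and its description strings are invented for illustration and are not real Atlas Search output, and flattenScoreDetails is a hypothetical helper.

```javascript
// Sketch: walk a scoreDetails tree depth-first and collect one row per node.
// flattenScoreDetails and the sample data below are hypothetical.
function flattenScoreDetails(node, depth = 0, out = []) {
  out.push({ depth, value: node.value, description: node.description });
  for (const child of node.details ?? []) {
    flattenScoreDetails(child, depth + 1, out);
  }
  return out;
}

// Invented sample shaped like a textScoreDetails value.
const sampleDetails = {
  value: 1.5,
  description: 'sum of (made-up example)',
  details: [
    {
      value: 1.5,
      description: 'weight of the matched term (made-up example)',
      details: [
        { value: 2.2, description: 'inverse document frequency (made-up example)', details: [] },
        { value: 0.68, description: 'term frequency (made-up example)', details: [] },
      ],
    },
  ],
};

const rows = flattenScoreDetails(sampleDetails);
for (const row of rows) {
  console.log(`${'  '.repeat(row.depth)}${row.value}  ${row.description}`);
}
```

Logging this for a handful of documents that rank higher or lower than expected is usually the quickest way to see which factor is driving the score.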

Working with search scores in MongoDB presents some interesting challenges, particularly because the two score types live on different scales: vector search scores are normalized to a range of 0 to 1, while text search scores are unbounded and depend on the query and the data. Comparing or adding them directly is therefore misleading. However, MongoDB's detailed scoring information, combined with the $unionWith operation, gives you what you need to implement more sophisticated ranking strategies. By understanding both the final score and its components, you can make more informed decisions about balancing search results in your hybrid implementation.
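
To illustrate the scale problem, here's one simple way to put both scores on a common footing before combining them: min-max normalize each score across the result set, then take a weighted sum. This is only a sketch of a generic technique, not the approach used in the pipeline above and not Reciprocal Rank Fusion. The function names, the vectorWeight parameter, and the sample data are all assumptions of mine.

```javascript
// Sketch: min-max normalization plus a weighted sum, applied client-side.
// minMax, combineScores, and vectorWeight are hypothetical names.
function minMax(values) {
  const present = values.filter((v) => v != null);
  const min = Math.min(...present);
  const max = Math.max(...present);
  // A missing score contributes 0; if all present scores are equal, each maps to 1.
  return values.map((v) => (v == null ? 0 : max === min ? 1 : (v - min) / (max - min)));
}

function combineScores(docs, vectorWeight = 0.5) {
  const v = minMax(docs.map((d) => d.vector_score));
  const t = minMax(docs.map((d) => d.text_score));
  return docs
    .map((d, i) => ({ ...d, combined: vectorWeight * v[i] + (1 - vectorWeight) * t[i] }))
    .sort((a, b) => b.combined - a.combined);
}

// Invented sample shaped like the output of the final $group stage.
const grouped = [
  { _id: 'a', vector_score: 0.95, text_score: null },
  { _id: 'b', vector_score: 0.9, text_score: 2.3 },
  { _id: 'c', vector_score: 0.8, text_score: 1.1 },
  { _id: 'd', vector_score: null, text_score: 0.5 },
];

const ranked = combineScores(grouped);
console.log(ranked.map((d) => `${d._id}: ${d.combined.toFixed(3)}`));
```

Note that min-max scaling is sensitive to outliers and always maps the lowest score in a set to 0, so it is a starting point for experimentation rather than a recommendation.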

Later this week, I'll be sharing a detailed look at implementing Reciprocal Rank Fusion with MongoDB hybrid search, which offers an elegant solution for combining and ranking results from different search methods. If you're working with MongoDB search and have questions about search scores or hybrid search implementation, feel free to reach out in the comments or connect with me directly.

Stay tuned for more insights into optimizing MongoDB search functionality!
