MongoDB: You're doing it wrong!

paulallies

Paul Allies

Posted on December 22, 2020

The main reason we use NoSQL, and MongoDB in particular, is to store and query large volumes of data in a scalable way.

Document Reference Pattern

When we model NoSQL data the way we would in an RDBMS, we end up referencing documents in other collections to link, or join, two pieces of related data.

// document in organization collection
{
   _id: "google",
   name: "Google"
}

// document in user collection
{
   _id: "john",
   name: "John Smith",
   organization_id: "google"
}

{
   _id: "jeff",
   name: "Jeff Brown",
   organization_id: "google"
}


So, to find an organization and all of its users in one query, we need to use the aggregation framework:

db.getCollection('organization')
.aggregate([
  {
    $match: { _id: "google"}
  },
  {
    $lookup: {
        from: "user",
        localField: "_id",
        foreignField: "organization_id",
        as : "users"
    }
  }
])

Result

{
    "_id" : "google",
    "name" : "Google",
    "users" : [ 
        {
            "_id" : "john",
            "name" : "John Smith",
            "organization_id" : "google"
        },
        {
            "_id" : "jeff",
            "name" : "Jeff Brown",
            "organization_id" : "google"
        }
    ]
}

Queries built on joins like this scale poorly: the computation cost rises as the data footprint grows.
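
To make that cost concrete, here is a minimal in-memory sketch, assuming the joined field has no index. It is plain JavaScript rather than MongoDB code: the `naiveLookup` helper is hypothetical and only counts how many user documents are examined to assemble one organization. The third user is added purely to show a non-matching document.

```javascript
// In-memory stand-ins for the two collections (illustration only, not MongoDB code).
const organizations = [{ _id: "google", name: "Google" }];
const users = [
  { _id: "john", name: "John Smith", organization_id: "google" },
  { _id: "jeff", name: "Jeff Brown", organization_id: "google" },
  { _id: "tim", name: "Tim Cook", organization_id: "apple" }, // hypothetical extra user
];

// Hypothetical helper: a naive join that scans every user for each organization,
// which is roughly the work a join performs when the joined field is not indexed.
function naiveLookup(orgs, allUsers) {
  let comparisons = 0;
  const joined = orgs.map((org) => {
    const matched = allUsers.filter((user) => {
      comparisons += 1;
      return user.organization_id === org._id;
    });
    return { ...org, users: matched };
  });
  return { joined, comparisons };
}

const { joined, comparisons } = naiveLookup(organizations, users);
console.log(JSON.stringify(joined, null, 2));
console.log("users examined:", comparisons); // 3 examined to return 2
```

In this sketch the number of documents examined grows with the size of the user collection, not with the number of users that actually belong to the organization.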

Adjacency List Pattern

Let's try the adjacency list pattern for storing data. We use one collection for all data; let's call it "DATA".

//organization document in DATA collection
{
    "_id": "org#google",
    "name": "Google",
}
{
    "_id": "org#microsoft",
    "name": "Microsoft",
}
{
    "_id": "org#apple",
    "name": "Apple",
}

//user document in DATA collection
{
   _id: "org#google#user#john",
   name: "John Smith"
}
{
   _id: "org#google#user#jeff",
   name: "Jeff Brown"
}
{
   _id: "org#apple#user#tim",
   name: "Tim Cook"
}
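
The composite `_id` values above follow a regular `type#id#type#id` shape, so they can be built and parsed with small helpers. The sketch below is one way to do that; `orgKey`, `userKey` and `parseKey` are hypothetical names, not part of MongoDB, and the code assumes that individual ids never contain the "#" separator.

```javascript
// Hypothetical helpers (not part of MongoDB) for the composite _id format.
// Assumption: individual ids never contain the "#" separator.
const SEP = "#";

// Build the _id of an organization document, e.g. "org#google".
function orgKey(orgId) {
  return ["org", orgId].join(SEP);
}

// Build the _id of a user document nested under its organization.
function userKey(orgId, userId) {
  return [orgKey(orgId), "user", userId].join(SEP);
}

// Split a composite _id back into its type/id pairs.
function parseKey(key) {
  const parts = key.split(SEP);
  const result = {};
  for (let i = 0; i + 1 < parts.length; i += 2) {
    result[parts[i]] = parts[i + 1];
  }
  return result;
}

console.log(orgKey("google")); // org#google
console.log(userKey("google", "john")); // org#google#user#john
console.log(parseKey("org#apple#user#tim")); // { org: 'apple', user: 'tim' }
```

Keeping key construction in one place makes it harder for different parts of an application to disagree about the separator or the order of the segments.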

Let's try to find an organization and all of its users in one query.

db.getCollection('DATA').find({_id: {$regex: /^org#google/}})

The query finds all documents in the DATA collection whose _id starts with "org#google".

Result


{
    "_id" : "org#google",
    "name" : "Google"
}

{
    "_id" : "org#google#user#jeff",
    "name" : "Jeff Brown"
}

{
    "_id" : "org#google#user#john",
    "name" : "John Smith"
}

We can retrieve the same data without a join, without the aggregation framework, and without adding indexes: _id is indexed by default, and an anchored prefix regex such as /^org#google/ can use that index.
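
Why a prefix match on a sorted key is cheap can be sketched in memory. The code below is plain JavaScript, not MongoDB code: a sorted array stands in for the index on `_id`, and the hypothetical `lowerBound` and `findByPrefix` helpers locate the first matching key by binary search and then read forward while the prefix still matches. It assumes ASCII keys, and "org#googleplex" is a made-up extra organization included to show a pitfall.

```javascript
// In-memory stand-in for an index on _id: the keys kept in sorted order.
// "org#googleplex" is a hypothetical extra organization, added to show a pitfall.
const ids = [
  "org#apple",
  "org#apple#user#tim",
  "org#google",
  "org#google#user#jeff",
  "org#google#user#john",
  "org#googleplex",
  "org#microsoft",
].sort();

// Binary search: index of the first key that is >= target.
function lowerBound(sortedIds, target) {
  let lo = 0;
  let hi = sortedIds.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedIds[mid] < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Keys sharing a prefix are contiguous in sorted order, so after the binary
// search we only read forward until the prefix stops matching.
function findByPrefix(sortedIds, prefix) {
  const matches = [];
  for (
    let i = lowerBound(sortedIds, prefix);
    i < sortedIds.length && sortedIds[i].startsWith(prefix);
    i += 1
  ) {
    matches.push(sortedIds[i]);
  }
  return matches;
}

console.log(findByPrefix(ids, "org#google"));
console.log(findByPrefix(ids, "org#google#"));
```

The first call also returns "org#googleplex", because that key starts with "org#google" as well. If organization ids can be prefixes of one another, matching the exact `_id` together with the "org#google#" prefix avoids picking up the wrong organization.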
