MongoDB Schema Design, Indexes & More Best Practices

tarikul01

Md.Tarikul

Posted on July 16, 2024

MongoDB Schema Design, Indexes & More Best Practices

Schema Design Approaches – Relational vs. MongoDB:
Relational Databases: Best for applications requiring complex queries, transactions, and data integrity. Ideal for structured data and systems where schema stability is important (e.g., financial applications, ERP systems).
MongoDB: Best for applications requiring scalability, flexibility, and rapid development. Ideal for handling large volumes of unstructured or semi-structured data, real-time analytics, and applications where data structures evolve frequently (e.g., content management systems, IoT applications).

Embedding vs. Referencing:

Embedding
Example:
Imagine a blog application where each post has comments.

{
"_id": 1,
"title": "First Post",
"content": "This is the content of the first post",
"comments": [
{
"user": "Alice",
"comment": "Great post!",
"date": "2024-07-16"
},
{
"user": "Bob",
"comment": "Thanks for sharing!",
"date": "2024-07-17"
}
]
}

Advantages:

Performance: Faster read operations since related data is stored together.
Atomicity: Updates to a document are atomic, ensuring consistency.
Simplicity: Easy to manage and query related data within a single document.
Limitations:

Document Size: Limited to 16 MB, so embedding large amounts of related data can exceed this limit.
Data Redundancy: Duplicating data in multiple places can lead to inconsistencies and increased storage usage.

Referencing
Example:
Using the same blog application, comments are stored in a separate collection.

Posts Collection:
{
"_id": 1,
"title": "First Post",
"content": "This is the content of the first post",
"comments": [
101,
102
]
}

Comments Collection:
{
"_id": 101,
"user": "Alice",
"comment": "Great post!",
"date": "2024-07-16",
"postId": 1
}
{
"_id": 102,
"user": "Bob",
"comment": "Thanks for sharing!",
"date": "2024-07-17",
"postId": 1
}

Advantages:

Flexibility: No size limit issues as comments grow.
Data Normalization: Reduces data redundancy and potential inconsistencies.
Scalability: Easier to manage large volumes of related data.
Limitations:

Complexity: More complex queries to fetch related data, requiring joins.
Performance: Slower read operations since data is spread across multiple documents.
Atomicity: Updates to related data across multiple documents are not atomic, which can lead to inconsistencies.

Recap:

  • One-to-One - Prefer key value pairs within the document
  • One-to-Few - Prefer embedding
  • One-to-Many - Prefer embedding
  • One-to-Squillions - Prefer Referencing
  • Many-to-Many - Prefer Referencing

General Rules for MongoDB Schema Design:

  • Rule 1:
    Favor embedding unless there is a compelling reason not to.

  • Rule 2:
    Needing to access an object on its own is a compelling reason not to embed it.

  • Rule 3:
    Avoid joins and lookups if possible, but don't be afraid if they can provide a better schema design.

  • Rule 4:
    Arrays should not grow without bound. If there are more than a couple of hundred documents on the many side, don't embed them; if there are more than a few thousand documents on the many side, don't use an array of ObjectID references. High-cardinality arrays are a compelling reason not to embed.

  • Rule 5:
    As always, with MongoDB, how you model your data depends entirely on your particular application's data access patterns. You want to structure your data to match the ways that your application queries and updates it.

Relationships

One-to-One:

Let's take a look at our User document. This example has some great one-to-one data in it. For example, in our system, one user can only have one name. So, this would be an example of a one-to-one relationship. We can model all one-to-one data as key-value pairs in our database.
{
"_id": "ObjectId('AAA')",
"name": "Joe Karlsson",
"company": "MongoDB",
"twitter": "@JoeKarlsson1",
"twitch": "joe_karlsson",
"tiktok": "joekarlsson",
"website": "joekarlsson.com"
}

One-to-Few:

Okay, now let's say that we are dealing a small sequence of data that's associated with our users. For example, we might need to store several addresses associated with a given user. It's unlikely that a user for our application would have more than a couple of different addresses. For relationships like this, we would define this as a one-to-few relationship.
{
"_id": "ObjectId('AAA')",
"name": "Joe Karlsson",
"company": "MongoDB",
"twitter": "@JoeKarlsson1",
"twitch": "joe_karlsson",
"tiktok": "joekarlsson",
"website": "joekarlsson.com",
"addresses": [
{ "street": "123 Sesame St", "city": "Anytown", "cc": "USA" },
{ "street": "123 Avenue Q", "city": "New York", "cc": "USA" }
]
}

One-to-Many:

Alright, let's say that you are building a product page for an e-commerce website, and you are going to have to design a schema that will be able to show product information. In our system, we save information about all the many parts that make up each product for repair services. How would you design a schema to save all this data, but still make your product page performant? You might want to consider a one-to-many schema since your one product is made up of many parts.

Product
{
"name": "left-handed smoke shifter",
"manufacturer": "Acme Corp",
"catalog_number": "1234",
"parts": ["ObjectID('AAAA')", "ObjectID('BBBB')", "ObjectID('CCCC')"]
}

Parts
{
"_id" : "ObjectID('AAAA')",
"partno" : "123-aff-456",
"name" : "#4 grommet",
"qty": "94",
"cost": "0.94",
"price":" 3.99"
}

One-to-Squillions:

What if we have a schema where there could be potentially millions of subdocuments, or more? That's when we get to the one-to-squillions schema. And, I know what you're thinking: Is squillions a real word?
And the answer is yes, it is a real word.
Let's imagine that you have been asked to create a server logging application. Each server could potentially save a massive amount of data, depending on how verbose you're logging and how long you store server logs for.

hosts
{
"_id": ObjectID("AAAB"),
"name": "goofy.example.com",
"ipaddr": "127.66.66.66"
}

Log messages
{
"time": ISODate("2014-03-28T09:42:41.382Z"),
"message": "cpu is on fire!",
"host": ObjectID("AAAB")
}

Many-to-Many:

The last schema design pattern we are going to be covering in this post is the many-to-many relationship. This is another very common schema pattern that we see all the time in relational and MongoDB schema designs. For this pattern, let's imagine that we are building a to-do application. In our app, a user may have many tasks and a task may have many users assigned to it.
todo_9ddb687d61
In order to preserve these relationships between users and tasks, there will need to be references from the one user to the many tasks and references from the one task to the many users. Let's look at how this could work for a to-do list application.

users:
{
"_id": ObjectID("AAF1"),
"name": "Kate Monster",
"tasks": [ObjectID("ADF9"), ObjectID("AE02"), ObjectID("AE73")]
}

tasks:
{
"_id": ObjectID("ADF9"),
"description": "Write blog post about MongoDB schema design",
"due_date": ISODate("2014-04-01"),
"owners": [ObjectID("AAF1"), ObjectID("BB3G")]
}

Optimising query patterns:

Optimizing your query patterns is crucial for reducing execution time and resource usage:

  1. Projection:
    Use projection to limit the fields returned by your queries, minimizing data transfer and processing load. Also, it’s better to exclude _id with 0 (false) if it’s not a field pertaining to the application — i.e., an auto-generated field by MongoDB. db.collection.find({ field: value }, { field1: 1, field2: 1 })

  2. Aggregation framework:
    Leverage MongoDB's
    aggregation framework for complex data processing. Ensure aggregations utilize indexed fields where possible.
    db.collection.aggregate([ { $match: { field: value } }, { $group: { _id: "$field", total: { $sum: "$amount" } } } ])

  3. Avoid $where:
    The $where operator can be slow and resource-intensive. Use it sparingly and only when necessary. Instead, the use of $expr with aggregation operators that do not use JavaScript (i.e., non-$function and non-$accumulator operators) is faster than $where because it does not execute JavaScript and is preferable, when possible. However, if you must create custom expressions, $function is preferred over $where.

Summary:

As you can see, there are a ton of different ways to express your schema design, by going beyond normalizing your data like you might be used to doing in SQL. By taking advantage of embedding data within a document or referencing documents using the $lookup operator, you can make some truly powerful, scalable, and efficient database queries that are completely unique to your application. In fact, we are only barely able to scratch the surface of all the ways that you could model your data in MongoDB.

💖 💪 🙅 🚩
tarikul01
Md.Tarikul

Posted on July 16, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related