Daniele
Posted on April 8, 2024
Introduction
From 2000 to 2010, I built backend solutions using SQL databases.
After nearly a decade with SQL, I transitioned to using MongoDB in 2010, and since then, I've designed many backend architectures with MongoDB as the primary database.
During this period, I consistently read books and engaged in discussions with software engineers to stay updated on MongoDB best practices.
However, the more expertise one gains in a technology, the more challenging it becomes to identify areas for improvement.
In an effort to improve my MongoDB knowledge, I attended MongoDB University in July 2022 and obtained my MongoDB Developer certification later that same year, in September. In this post, I'll share the benefits I've experienced from that certification over the past year.
Was it worth it?
It's been a year since I obtained my MongoDB Developer certification, and I've often been asked, "Was it worth it?"
Allow me to share what I've learned and how it has benefited me in my role as Tech Lead at GenieAI.
Product usage: Covered query
Problem:
GenieAI operates in the legal tech field, and our backend prioritizes security. This means that our security layers interact with MongoDB multiple times during a single transaction.
First solution:
Initially, we created an index on userID to speed up database queries. However, after finding a match in the index, MongoDB still had to fetch the full document to return the requested fields.
Upgraded solution:
Through the MongoDB optimization course, I learned about "covered queries": queries that MongoDB can answer entirely from an index, without fetching the documents themselves. Consequently, we replaced the generic index with a "covered index" that contains every field the query filters on and returns.
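As a minimal sketch of the idea (not our production code), the snippet below describes an index, a filter, and a projection, and checks the basic covered-query conditions. The role field and the sample values are hypothetical; only userID comes from the setup described above. The check is simplified: it handles plain equality filters only and ignores cases such as array fields.

```python
# Hypothetical names and values: only userID appears in this post.
INDEX_KEYS = [("userID", 1), ("role", 1)]
QUERY_FILTER = {"userID": "u-123"}
# _id must be excluded explicitly, because it is not part of the index.
PROJECTION = {"_id": 0, "userID": 1, "role": 1}


def is_covered(index_keys, query_filter, projection):
    """Simplified check: are all filtered and returned fields in the index?"""
    indexed = {field for field, _direction in index_keys}
    filter_fields = set(query_filter)
    returned = {field for field, include in projection.items() if include}
    # _id is returned by default unless the projection sets it to 0.
    if projection.get("_id", 1) and "_id" not in indexed:
        return False
    return filter_fields <= indexed and returned <= indexed


def run_covered_query(collection):
    """Apply the sketch to a pymongo collection (needs a running server)."""
    collection.create_index(INDEX_KEYS)
    # For a covered query, the explain output should report that
    # no documents were examined.
    return collection.find(QUERY_FILTER, PROJECTION).explain()
```

Leaving _id out of the projection is the easiest way to lose coverage by accident: is_covered(INDEX_KEYS, QUERY_FILTER, {"userID": 1}) returns False because _id would still be returned.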
Benefits:
This improvement resulted in queries being approximately 40% faster, and our entire cluster experienced a 20% reduction in CPU and disk usage. Consequently, we could execute our test suite faster.
Link:
https://www.mongodb.com/docs/manual/core/query-optimization/#std-label-indexes-covered-queries
Product usage: concise syntax for subqueries
Problem:
There may come a time when a $lookup operation is necessary. Even though $lookup is often considered an anti-pattern, it can be acceptable if the query is rarely executed. We strive to minimize the use of $lookup in our database, but there are situations where it simplifies the database design.
First solution:
Our $lookup contained many $expr filters. While $lookup generally performs well with indexes, we observed that $lookup queries containing the $expr operator did not perform well.
Upgraded solution:
We replaced the $expr operator with what is known as "concise syntax", which relies on localField and foreignField instead.
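To make the difference concrete, here is a rough sketch of the two stage shapes, built as plain Python dictionaries that could be passed to an aggregation pipeline. The collection and field names (users, ownerId, owner) are hypothetical and only serve as an illustration.

```python
def lookup_with_expr(from_coll, local_field, foreign_field, as_field):
    """$lookup that joins through a sub-pipeline with an $expr match."""
    return {
        "$lookup": {
            "from": from_coll,
            "let": {"local_value": f"${local_field}"},
            "pipeline": [
                {"$match": {"$expr": {"$eq": [f"${foreign_field}", "$$local_value"]}}}
            ],
            "as": as_field,
        }
    }


def lookup_concise(from_coll, local_field, foreign_field, as_field, extra_stages=()):
    """$lookup that expresses the equality join with localField/foreignField."""
    stage = {
        "from": from_coll,
        "localField": local_field,
        "foreignField": foreign_field,
        "as": as_field,
    }
    # Combining localField/foreignField with a pipeline needs MongoDB 5.0+.
    if extra_stages:
        stage["pipeline"] = list(extra_stages)
    return {"$lookup": stage}


# Hypothetical usage: join each document to its owner in a "users" collection.
concise_stage = lookup_concise("users", "ownerId", "_id", "owner")
```

Any additional filtering can still go in extra_stages, so the equality join itself no longer needs $expr.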
Benefits:
Queries utilizing the concise syntax demonstrated slightly improved performance, particularly when combined with the covered indexes described earlier.
Link:
https://www.mongodb.com/docs/manual/reference/operator/aggregation/lookup/#correlated-subqueries-using-concise-syntax
Product usage: Capped collections
Problem:
Our backend architecture includes a built-in monitoring system that captures "I/O" messages. Why do we store both IN and OUT messages? In short, for analytics, debugging, and monitoring purposes. However, due to our system's high throughput of 500 messages per second, and sometimes even more, this approach consumes significant database resources.
Solution:
Initially, we stored the I/O messages in a regular MongoDB collection, utilizing a TTL index to purge these messages after a certain number of days.
Upgraded solution:
We migrated our regular collection into a capped collection. This type of collection maintains a fixed size and supports high-throughput operations.
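A capped collection needs a maximum size in bytes when it is created, so the retention window has to be translated into a size. The sketch below shows one way to estimate it. The average message size, the retention window, and the io_messages collection name are assumptions for illustration, not our real figures; only the 500 messages per second comes from the problem above.

```python
def capped_collection_options(messages_per_second, avg_message_bytes, retention_seconds):
    """Estimate the byte size needed to hold roughly one retention window."""
    size_bytes = messages_per_second * avg_message_bytes * retention_seconds
    return {"capped": True, "size": size_bytes}


def create_io_collection(db, options):
    """Create the capped collection on a pymongo database (needs a server)."""
    # "io_messages" is a hypothetical collection name.
    return db.create_collection("io_messages", **options)


# Assumed figures: 1 KiB per message, one hour of retention.
options = capped_collection_options(500, 1024, 3600)
```

Once the size limit is reached, a capped collection overwrites its oldest documents, so retention is governed by size rather than by age, and no TTL index is involved.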
Benefits:
The switch to a capped collection resulted in higher and more consistent performance:
- Higher: The collection's performance significantly increased due to its optimized structure.
- Consistent: Capped collections deliver stable performance regardless of workload.
Link:
https://www.mongodb.com/docs/manual/core/capped-collections
Database design patterns
The MongoDB design patterns course is one of my favorites. Why?
Because when it comes to database design, it's often easy to create a MongoDB collection, but realizing later on that we've made incorrect design decisions can be a costly mistake.
Have you ever encountered one of these situations?
- You were uncertain if the database schema was the right fit.
- Your team members held radically different views on schema design.
- You discovered that a prematurely optimized database schema led to premature scaling issues.
If you answered yes to at least one of these scenarios, you'll find MongoDB design patterns courses invaluable regardless of your level of experience with MongoDB.
Database design patterns: many-to-many
Problem:
Our system manages legal contract clauses, and each clause may have one or more associated problems. Conversely, the same problem might be linked to one or more clauses.
Solution:
Initially, we designed a collection named legal_contracts, containing an array of Legal Problems. Each problem was uniquely identified by an ID, which facilitated data navigation through Aggregation based on either the problem or the legal contract itself. We opted for this approach because it aligns more closely with business terminology, leading us to believe it would be easier to maintain. However, we found that running Aggregation pipelines was too CPU-intensive. Given that our system experienced more reads than writes, this approach turned out to be inefficient.
Upgraded solution:
To address this, we introduced two separate collections: legal_contract and legal_problems. Depending on the user action, we traverse the data in one direction (from Legal Contract to Problems) or the other (from Problems to Legal Contract). By eliminating the need for Aggregation pipelines, we significantly improved system responsiveness.
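One possible shape for this model, sketched under assumptions rather than copied from our schema: each legal_contract document holds an array of problem IDs, and each direction becomes a plain find filter. The problem_ids field and the sample document are hypothetical.

```python
# Hypothetical document in the legal_contract collection.
CONTRACT = {
    "_id": "contract-1",
    "title": "NDA",
    "problem_ids": ["problem-1", "problem-2"],
}


def problems_for_contract(contract):
    """Filter for legal_problems: all problems referenced by one contract."""
    return {"_id": {"$in": contract["problem_ids"]}}


def contracts_for_problem(problem_id):
    """Filter for legal_contract: all contracts that reference one problem."""
    # Matching a scalar against an array field matches any element,
    # so an index on problem_ids can serve this query.
    return {"problem_ids": problem_id}


def load_problems(db, contract):
    """Run the first direction with pymongo (needs a running server)."""
    return list(db["legal_problems"].find(problems_for_contract(contract)))
```

Both directions are simple find queries on indexed fields, which is what removes the need for an Aggregation pipeline in this sketch.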
We leveraged the collection of design patterns and anti-patterns from MongoDB University to implement these enhancements. The results surpassed our initial expectations!
Link:
https://learn.mongodb.com/courses/schema-design-patterns
Database schema design: the Agile way
Problem:
One common pitfall in MongoDB schema design is delaying consideration of the workload until it's too late. Often, teams focus excessively on achieving 'the cleanest solution' or 'the easiest to maintain.' This has been one of the most difficult problems to manage among engineers, and a common response I've received is, 'I just prefer to focus on ....'.
Problem:
Different teams within your company may adopt different schema design approaches. You might have one team prioritizing performance while another emphasizes schema readability. This diversity can result in numerous database schema styles, making it challenging for TeamMemberA to access data produced by TeamMemberB. TeamMemberA may request documentation from TeamMemberB, but doesn't Agile promote working software over documentation? Indeed, it does. So how can MongoDB be used in an Agile manner across our organisation?
Solution:
Historically, the team I've worked with prioritized high-performance database schemas as the preferred solution. However, the interpretation of "performance" can vary (faster reads or writes? Ease of querying? Ease of upgrading?).
Upgraded solution:
MongoDB University offers an excellent schema design approach summarized in this image:
[Image: MongoDB University schema design approach]
By adopting this approach, GenieAI achieved predictable and consistent schema design decisions across teams, with minimal or no documentation required, and an easily upgradable model.
Conclusions
We launched our product in October 2023, and our app was featured as the Product of the Day on ProductHunt.
Thankfully, all the improvements had already been implemented and deployed online, alongside many others not covered in this post.
Despite facing a massive workload for days, GenieAI's backend performed exceptionally well, maintaining consistent performance even during this critical phase.
The MongoDB Certification immediately provided benefits for all the features we were developing. More importantly, it has also generated long-term value in our codebase, which is now more consistent and delivers more predictable results.
What an incredible journey!
Was this post helpful to you?
Let me know in the comments.