Optimizing complex relational query in MongoDB using aggregation pipeline
Zhalok Rahman
Posted on April 28, 2023
we all know that MongoDB is a nosql database and is mostly used for non relational database.
But you can also design a database with strong relationship between tables (or models for mongo) using MongoDB as well. But what will be the problem then? The problem will be if we use the basic APIs of MongoDB like FindOne(), FindById() etc, then we may end up writing a lot of database calls on a single API endpoint.
For example, let's assume that we have a simple classroom management system where there will be a classroom and under one classroom there can be multiple teachers and students.
So, now if we want to design a database schema for this design what we need to do? Let's first understand the relation between classroom teachers and students. Hence the relation between classroom-teacher is many to many and so as the relation between classroom-student.
Now let's design a classroomSchema using mongoose. How we will do that?
A naive approach can be like this:
but whats the problem? this database is not normalized and may lead to performance issue while scaling up.
So what can we do?
We can come up with a bit efficient solution by storing the object id of the teachers and students documents instead of keeping the whole documents.
Great, but now if we need to build a rest api endpoint which will return all the classrooms along with all the teachers and students associated with them
we can write the following code for that:
but what's the problem with this solution? this solution will lead to a larger response time because there are 3 database calls here and the complexity of the logic being applied is O(n^2) (quadratic) which both are going to make latency in the response. To get out of this problem, we need to make only one database call and get all the necessary information that we need directly from the database query instead of writing any additional logic.
But how can we do that? we are not storing the full document in our array of the classroom document, then how can we do combine classroom, teacher and student document and then make query?
We can use aggregation pipeline. Aggregation pipeline allows you to make all kinds of query that you can perform in SQL with just some different syntax. I don't know why the mongoDB tutorials I found online did not include the concept of aggregation pipeline yet it is necessary to understand.
MongoDB aggregation pipeline allows you to write multiple queries on a single database call.
Right now our goal is to get all the classrooms along with their enrolled students and teachers. Which means we need to combine classroom, teacher and student schemas together and then run the find() query.
we can write the following code for that
we can see that the model.aggregate() function receives an array of objects. In each object we can write individual query. We used first $lookup query to join classroom with teachers schema and second $lookup query to join classroom with students schema.
Let's see these in action and observe the difference.
Let's first create three endpoints
one for creating teachers
one for creating students
one for creating classrooms with 3 teachers and 3 students
Each of these endpoints will create 1000 teacher and students and 100 classroom entries in the database
now we will build two endpoints to fetch the classrooms.
GET /classrooms/unoptimized
GET /classrooms/optimized
now lets make the unoptimized query
see the response time. its around 850 ms
Now lets see the optimized endpoint response time
Just see the magic
we have managed to reduce the response time from 850 ms to just 15 ms.
HOW COOL IS THAT
Posted on April 28, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.