How Database Indexes Affect MongoDB and Application Performance

dbazhenov

Daniil Bazhenov

Posted on May 4, 2023

How Database Indexes Affect MongoDB and Application Performance

I will show with real examples and graphs how indexes can affect the performance of the database and your application.

This article was inspired by a lot of feedback from my friends saying that MongoDB is slow or crashing. Indeed, when I started working with MongoDB, I noticed very quickly that it starts working slowly, although there were few documents in the database. MySQL was faster in the same situation during development with no load. I just didn't add any indexes.

As an experiment, I will run an application that will make lots of queries to different collections in the database. In the experiment, I will show that it can run fast and slow and even crash entirely because of our mistakes, just like any tool. At the beginning of the experiment, all queries are executed using indexes. Then I'll remove the index for a while, show how it affects it. And I'll create it again to restore the performance.

About the application and the database

The application is developed in PHP 8 and uses the default MongoDB driver for database queries.

The application is deployed on a t2.micro (Free Tier) instance in AWS, with 1 vCPU and 1 GiB of Memory.

The application makes queries to several collections in a loop:

  • Pages - 3.2 million documents
  • Users - 130k documents
  • Docs - 100k documents

Percona Server for MongoDB is used as the database. MongoDB ReplicaSet consists of three nodes also deployed in AWS, each node on a separate t2.micro instance.

I use the free, open-source tool Percona Monitoring and Management (PMM) for monitoring and graphing.

These limited resources are available to everyone for free, and you will see how much you can get out of them.

Instances in AWS

Let's start the experiment

I ran the application to do the load on the database. All queries in this experiment were performed to the Primary node to simplify the demonstration. And also in the final section of the article I will make an experiment with the load on all three nodes.

The loop performed several FindOne queries

  1. Get a document from a collection sorted by timestamp
  2. Get an random document by id.

Queries were run against two different collections (with 3 million and 100k documents)

Example code

   $last_user = $app['db']->users->findOne([],
   [
       'sort' => [
           'timestamp' => -1
       ]
   ]);

   $user_id = rand(1, $last_user['user_id']);

   $user_data = $app['db']->users->findOne(
   [
       'user_id' => $user_id
   ]);
Enter fullscreen mode Exit fullscreen mode

When I ran one process, I got about 1,000 queries per second and 30% of the CPU load of the database instance.

Then I started the second similar process. I got about 1.8k queries per second and just over 50% CPU load.

Running processes with database queries

CPU load with two processes

I was shocked by these results because these are very big numbers. For example, if you develop a website, you can get hundreds of requests per second (RPS) from online users on these resources and store millions of rows (documents) in a database with a disk of less than 10GB.

Queries to the database were very fast. About 2-100 ms to a collection with 100k documents and 3-300 ms to a table with 3 million documents. Not enough resources for 3 million documents, but it kept running at a speed acceptable to a live user.

Queries in progress and execution time

Let's delete the index in the collection with 100k documents

I opened MongoDB Compass and removed the index on the timestamp field. One of the queries was sorting the collection by this field.

Performance has degraded dramatically:

  1. The number of queries has dropped from 1.8k to 120 per second.
  2. The load on the CPU has increased to 90%, to the limit.
  3. The query time went from 100ms to 20+ seconds on average.

Yes, the app continued to work. If you don't have many users while the app is in development, you probably won't even notice it. But the app was very bad.

Deleting and restoring MongoDB index - Operations

Deleting and restoring MongoDB index - CPU

Deleting and restoring MongoDB index - QAN

Always create indexes for all fields that are used in search and sorting.

Let's create the index

I just created an index that I deleted earlier.

Performance, CPU utilization, and execution time were immediately restored.

MongoDB Compass - Crating an index

Now delete the index in the collection of 3 million documents

Performance has dropped to less than one operation per second.

The Primary node died after about 30 seconds. Then I lost the connection to the second Secondary node, after which the PHP application ended with an error.
It took me about 10 minutes to recover the instances, and I didn't repeat the experiment.

It is gratifying that simple Reboot instances through AWS control panel automatically started all database nodes; they connected to the monitoring and continued to work.

Conclusion one

Always add indexes.

To learn more about them, I recommend taking the excellent MongoDB performance course from MongoDB:
M201: MongoDB Performance

Let's try to use all the ReplicaSet nodes

In the experiment, I only queried the Primary node, but I have a ReplicaSet with three nodes.

I set Read Preference for read operations from Secondary Nodes.

I just used the parameter when initializing the database client in my application

 'readPreference' => 'secondaryPreferred'
Enter fullscreen mode Exit fullscreen mode

As a result, the initialization looks like this

$app['db_client'] = new \MongoDB\Client(MONGODB_URLS,
   [
       'username' => MONGODB_USER,
       'password' => MONGODB_PASSWORD,
       'replicaSet' => 'MongoDB-RS',
       'authSource' => 'admin',
       'readPreference' => 'secondaryPreferred'
   ],
);
Enter fullscreen mode Exit fullscreen mode

As a result, all write operations are automatically executed in Primary, and read operations in Secondary nodes.

So, without increasing the number of resources, using t2.micro instances I got:

  1. 1k ops/sec read operations on each node;
  2. over 500 ops/s of writes and updates (insert, update, delete).

The CPUs of each instance were less than 50% loaded, and I just didn't need more performance for my application.

MongoDB Read Preference on three nodes

I repeated the experiment with deleting the index in the collection where the write was done. I had another Docs collection with 100k+ documents. Then I restored the index.

And also got a lot of performance degradation.

Then I loaded the Primary node a little bit more.

Load on Primary node

Conclusion two

Think about performance, experiment, and keep an eye on monitoring.

It's not complicated or time-consuming, and various free tools are available now.

💖 💪 🙅 🚩
dbazhenov
Daniil Bazhenov

Posted on May 4, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related