Mastering MongoDB and Elasticsearch Integration: A Practical Guide for Node.js Developers

gleidsonleite

Gleidson Leite da Silva

Posted on August 17, 2024


Introduction

In the modern web development landscape, the ability to search and access data quickly can be the key differentiator between a standard application and one that truly stands out. Imagine an online store where users can find products in milliseconds, receiving precise suggestions as they type. This enhanced user experience is made possible by technologies like MongoDB and Elasticsearch.

In this article, we’ll explore the importance of these technologies and how to integrate them effectively. For developers already familiar with Node.js, understanding how Elasticsearch can accelerate data searches and provide a more responsive experience is a significant advantage.

Why MongoDB and Elasticsearch?

MongoDB is a popular choice among developers who need a flexible and scalable NoSQL database. However, when it comes to complex, high-performance searches, Elasticsearch becomes the ideal partner. With its ability to index and search large volumes of data in near real time, Elasticsearch offers a powerful solution for improving the end-user experience.

By integrating MongoDB with Elasticsearch, you essentially combine the best of both worlds: MongoDB’s flexibility and scalability with Elasticsearch’s speed and search efficiency.

Common Integration Challenges

Before we dive into the technical implementation, it’s important to highlight the challenges you may face when integrating MongoDB and Elasticsearch. Two of the biggest hurdles are:

  1. Data Mapping: Since MongoDB and Elasticsearch have different data structures, ensuring that MongoDB data is correctly mapped to Elasticsearch is crucial.
  2. Data Synchronization: Keeping data synchronized between MongoDB and Elasticsearch can be tricky, especially when dealing with large volumes of real-time data.
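
To make the mapping challenge concrete, consider what happens to MongoDB's `_id`: Elasticsearch carries a document's ID in the index metadata rather than in the source body, so each document needs a small transformation on the way in. Here is a minimal sketch (the `toEsBulkPair` helper is a hypothetical name for illustration, not part of either library):

```javascript
// Hypothetical helper: converts one MongoDB document into the pair of
// entries expected by the Elasticsearch bulk API (action line + source line).
function toEsBulkPair(doc, indexName) {
  const { _id, ...rest } = doc; // strip MongoDB's ObjectId field
  return [
    { index: { _index: indexName, _id: _id.toString() } }, // action metadata
    { id: _id.toString(), ...rest } // source body keeps a plain-string id
  ];
}

// Example: a document as MongoDB would return it (ObjectId stubbed as a string here)
const [action, source] = toEsBulkPair(
  { _id: '66c0a1f2e4b0', name: 'Smartphone X', price: 999.99 },
  'products'
);
```

This is exactly the shape the synchronization code later in this article produces for every document in the collection.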

Setting Up the Environment with Docker

To ensure that your application runs seamlessly, we need to set up the environment using Docker. This setup includes custom Dockerfile configurations for each service to address specific requirements, such as installing plugins or configuring the environment.

Docker Compose Configuration

Here is the docker-compose.yml file that defines the required services: MongoDB, Elasticsearch, Logstash, Kibana, and the Node.js application.

version: '3.8'

services:
  mongodb:
    build: ./mongodb
    container_name: mongodb
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_DATABASE: ${MONGO_DATABASE}
    volumes:
      - mongodb_data:/data/db

  elasticsearch:
    build: ./elasticsearch
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data

  logstash:
    build: ./logstash
    container_name: logstash
    ports:
      - "5000:5000"
    environment:
      LOGSTASH_JAVA_OPTS: "-Xmx256m -Xms256m"
    volumes:
      - ./logstash/logstash.conf:/usr/share/logstash/pipeline/logstash.conf

  kibana:
    build: ./kibana
    container_name: kibana
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  app:
    build: ./app
    container_name: node_app
    ports:
      - "3333:3333"
    depends_on:
      - mongodb
      - elasticsearch
      - logstash
    environment:
      - MONGO_URI=${MONGO_URI}
      - ELASTIC_URI=${ELASTIC_URI}
    volumes:
      - .:/usr/src/app

volumes:
  mongodb_data:
  esdata:

Dockerfile Configurations

Each service requires a specific Dockerfile to ensure proper configuration.

  1. MongoDB (mongodb/Dockerfile):
   FROM mongo:4.4
   COPY init-mongo.js /docker-entrypoint-initdb.d/

The init-mongo.js script initializes the MongoDB database with some sample data.

  2. Elasticsearch (elasticsearch/Dockerfile):
   FROM docker.elastic.co/elasticsearch/elasticsearch:7.14.0
   COPY elasticsearch.yml /usr/share/elasticsearch/config/

The elasticsearch.yml file contains specific Elasticsearch configurations.

  3. Logstash (logstash/Dockerfile):
   FROM docker.elastic.co/logstash/logstash:7.14.0
   RUN logstash-plugin install logstash-input-mongodb
   COPY logstash.conf /usr/share/logstash/pipeline/

This Dockerfile installs the MongoDB input plugin for Logstash and copies the Logstash configuration file.

  4. Kibana (kibana/Dockerfile):
   FROM docker.elastic.co/kibana/kibana:7.14.0
   COPY kibana.yml /usr/share/kibana/config/

The kibana.yml file configures Kibana settings.

  5. Node.js Application (app/Dockerfile):
   FROM node:20
   WORKDIR /usr/src/app
   COPY package*.json ./
   RUN npm install
   COPY . .
   EXPOSE 3333
   CMD ["node", "index.js"]

This Dockerfile sets up the Node.js application, installs dependencies, and runs the app.

Supporting Configuration Files

  1. Elasticsearch Configuration (elasticsearch/elasticsearch.yml):
   cluster.name: "docker-cluster"
   network.host: 0.0.0.0
   http.port: 9200
  2. Logstash Configuration (logstash/logstash.conf):
   input {
     mongodb {
       uri => "mongodb://mongodb:27017/productdb"
       placeholder_db_dir => "/usr/share/logstash/pipeline/"
       placeholder_db_name => "logstash_sqlite.db"
       collection => "products"
       batch_size => 5000
     }
   }

   output {
     elasticsearch {
       hosts => ["http://elasticsearch:9200"]
       index => "products"
     }
   }

This configuration file connects Logstash to MongoDB, extracts data, and outputs it to Elasticsearch.

  3. Kibana Configuration (kibana/kibana.yml):
   server.name: kibana
   server.host: "0"
   elasticsearch.hosts: ["http://elasticsearch:9200"]
  4. MongoDB Initialization Script (mongodb/init-mongo.js):
   db = db.getSiblingDB("productdb");

   db.products.insertMany([
     { name: "Smartphone X", price: 999.99 },
     { name: "Laptop Pro", price: 1499.99 },
     { name: "Wireless Earbuds", price: 129.99 }
   ]);

This script initializes the MongoDB database with a sample collection of products.

.env File

The .env file stores environment variables to simplify configuration and maintenance.

MONGO_DATABASE=productdb
MONGO_URI=mongodb://mongodb:27017/productdb
ELASTIC_URI=http://elasticsearch:9200

Technical Implementation: How to Integrate MongoDB and Elasticsearch

Now that we understand the importance and challenges, let’s get our hands dirty. Below, you’ll find a detailed example of how to perform this integration.

Step 1: Initial Setup

First, we need to set up our environment with the necessary dependencies. We’ll use Node.js to mediate communication between MongoDB and Elasticsearch.

npm install mongodb @elastic/elasticsearch express cors

Step 2: Connecting to MongoDB and Elasticsearch

Next, we’ll configure our application to connect to MongoDB and Elasticsearch:

const express = require('express');
const { Client } = require("@elastic/elasticsearch");
const { MongoClient } = require("mongodb");
const cors = require('cors');

const app = express();
const port = 3333;

const esClient = new Client({ node: process.env.ELASTIC_URI });
const mongoClient = new MongoClient(process.env.MONGO_URI);

app.use(express.json());
app.use(cors());
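
One caveat: inside Docker Compose, `MONGO_URI` and `ELASTIC_URI` are injected by the `environment:` section of the `app` service, but when you run the app directly on your machine they will be undefined. A defensive sketch with local fallbacks (the localhost values are assumptions matching the port mappings in `docker-compose.yml`):

```javascript
// Fall back to localhost endpoints when the Compose-provided variables are absent.
const MONGO_URI = process.env.MONGO_URI || 'mongodb://localhost:27017/productdb';
const ELASTIC_URI = process.env.ELASTIC_URI || 'http://localhost:9200';
```

With this in place, the same code runs unchanged inside and outside the Compose network.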

Step 3: Data Mapping and Synchronization

Now, for the most critical part: ensuring that MongoDB data is correctly mapped and synchronized with Elasticsearch. Here’s the code that handles this task:

async function syncData() {
  try {
    await mongoClient.connect();
    const db = mongoClient.db();
    const collection = db.collection('products');

    const products = await collection.find({}).toArray();

    // Rename _id to id and remove the original _id
    const body = products.flatMap(product => {
      const { _id, ...rest } = product;
      return [
        {
          index: {
            _index: 'products',
            _id: _id.toString(), // Use the original MongoDB _id as the document ID in Elasticsearch
          }
        },
        {
          id: _id.toString(), // Add _id as id in the document body
          ...rest // Include the rest of the document fields
        }
      ];
    });

    const bulkResponse = await esClient.bulk({ refresh: true, body });

    if (bulkResponse.errors) {
      const erroredDocuments = [];
      bulkResponse.items.forEach((action, i) => {
        const operation = Object.keys(action)[0];
        if (action[operation].error) {
          erroredDocuments.push({
            status: action[operation].status,
            error: action[operation].error,
            operation: body[i * 2],
            document: body[i * 2 + 1]
          });
        }
      });
      console.error('Failed to index the following documents:', erroredDocuments);
    } else {
      console.log(`Successfully indexed ${products.length} documents`);
    }

  } catch (error) {
    console.error('Error syncing data:', error);
  }
}

Step 4: Fast Data Retrieval with Elasticsearch

With the data synchronized, we can now leverage Elasticsearch’s speed to perform efficient searches:

app.get('/search', async (req, res) => {
  const { query } = req.query;

  if (!query || query.trim() === '') {
    return res.status(400).json({ error: 'The query parameter is required and cannot be empty.' });
  }

  try {
    const result = await esClient.search({
      index: 'products',
      body: {
        query: {
          multi_match: {
            query,
            fields: ['name^3', 'description']
          }
        }
      }
    });

    res.json(result.hits.hits);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});
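
The `name^3` syntax boosts matches in the `name` field so they score three times higher than matches in `description`. If you reuse this query shape in several endpoints, it can be factored into a small helper (`buildSearchBody` is a hypothetical name, not part of the Elasticsearch client):

```javascript
// Hypothetical helper: builds a multi_match query body, boosting `name` 3x.
function buildSearchBody(query, fields = ['name^3', 'description']) {
  return { query: { multi_match: { query, fields } } };
}

// Example: the body sent for /search?query=laptop
const body = buildSearchBody('laptop');
```

Keeping the query construction pure also makes it easy to unit-test without a running cluster.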

Future Enhancements

While this solution works well, there’s always room for improvement:

  1. Continuous Monitoring: Implement monitoring to ensure data in Elasticsearch and MongoDB remain synchronized.
  2. Automated Mapping: Consider creating scripts to automate the mapping process, especially if MongoDB data changes frequently.
  3. Scalability: As data volume increases, explore advanced partitioning and scalability techniques in both MongoDB and Elasticsearch.
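
For the monitoring idea in point 1, a simple starting point is comparing document counts on both sides (for example via the driver's `collection.countDocuments()` and the client's `esClient.count()`) and flagging drift. The comparison itself can be a pure function; the tolerance value below is an arbitrary example:

```javascript
// Returns whether the two stores are within `tolerance` documents of each other.
function checkSyncDrift(mongoCount, esCount, tolerance = 0) {
  const drift = Math.abs(mongoCount - esCount);
  return { inSync: drift <= tolerance, drift };
}

// Example: MongoDB holds 1000 products, Elasticsearch has indexed 997.
const status = checkSyncDrift(1000, 997, 5);
```

Counts alone won't catch stale documents with matching totals, but they are a cheap first signal to alert on before reaching for heavier change-stream-based solutions.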

Conclusion

Integrating MongoDB with Elasticsearch may seem challenging, but with the right approach, you can create fast, responsive applications that offer a superior user experience. Furthermore, mastering this technique can be a significant differentiator in your portfolio as a Node.js developer.
