Efficient batch processing for event-driven chunking

seongjin605

Jin.Park

Posted on March 3, 2024

🏛️ Arehs

Arehs is built for efficient large-batch processing, using event-driven chunk processing.

Rather than waiting for a whole chunk of asynchronous tasks to complete, it packs the work densely by starting the
next asynchronous task as soon as a running one finishes.

For example, if your database has a pool of 50 connections and you need to run 1,000 queries, a typical approach is to split the queries into chunks of 50 or fewer promises and call Promise.all on each chunk in turn.
However, Promise.all alone may not be efficient enough, and in that case Arehs is a good option for chunk processing.
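
The chunked Promise.all approach described above can be sketched roughly as follows. This is only an illustration: `runQuery` is a hypothetical stand-in for a real database call, `mapInChunks` is a made-up helper name, and the chunk size of 50 simply mirrors the connection pool size in the example.

```javascript
// Hypothetical stand-in for a database query; it resolves after a short delay.
const runQuery = (id) =>
  new Promise((resolve) => setTimeout(() => resolve(`row-${id}`), 5));

// Run `worker` over `items` in fixed-size chunks.
// Each chunk must finish completely before the next one starts.
async function mapInChunks(items, chunkSize, worker) {
  const results = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    const chunk = items.slice(start, start + chunkSize);
    // At most `chunkSize` promises are in flight at any time.
    const chunkResults = await Promise.all(chunk.map(worker));
    results.push(...chunkResults);
  }
  return results;
}

// 1,000 queries against a pool of 50 connections -> 20 sequential chunks.
const queryIds = Array.from({ length: 1000 }, (_, i) => i);
const chunkedRun = mapInChunks(queryIds, 50, runQuery);
chunkedRun.then((rows) => console.log(rows.length)); // 1000
```

Every chunk respects the pool limit, but each one also has to wait for its slowest query before the next chunk can begin.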

This way we can achieve several things:

  • Control the throughput of our service by setting the concurrency of the Promise Pool.
  • Manage load on the downstream services by setting the concurrency of the Promise Pool.
  • Increase the performance of our application.
  • Reduce CPU idle time.

📚 Getting Started

arehs supports both CommonJS and ES Modules.

CommonJS

const { Arehs } = require("arehs");

ES Modules

import { Arehs } from "arehs";

Example

  • create: Creates an Arehs instance from a given array of data.
  • withConcurrency: Sets the concurrency (degree of parallelism) and returns the current instance. (default: 10)
  • timeoutLimit: Sets a timeout in milliseconds. The default value is 0, which leaves the option off; if it is greater than 0, an error is thrown when an operation takes longer than the limit.
  • stopOnFailure: If set to true, processing stops on failure and the appropriate events are emitted.
  • retryLimit: Sets a limit on the number of retries on failure.
  • mapAsync: Starts processing the input data asynchronously and returns the results. If stopOnFailure is set to true, processing stops on failure and the appropriate events are emitted. If retryLimit is greater than 0, failed operations are retried up to that limit, which can be useful for handling transient errors or ensuring data processing resilience.

import { Arehs } from "arehs";

const dataArr = [
  { id: 1, name: "John" },
  { id: 2, name: "Alice" },
  { id: 3, name: "Bob" }
];

const result = await Arehs.create(dataArr)
  .withConcurrency(10)
  .mapAsync(async data => {
    return await someAsyncFunction(data);
  });
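
To make the timeoutLimit and retryLimit descriptions above more concrete, here is a plain JavaScript sketch of the behavior they describe. It is not Arehs's internal code: `withTimeout` and `withRetry` are hypothetical helper names, and the sketch assumes that the retry limit counts the attempts made after the first failure.

```javascript
// Reject if `task` takes longer than `ms` milliseconds (0 disables the limit).
function withTimeout(task, ms) {
  if (ms <= 0) return task();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleaned up.
  return Promise.race([task(), timeout]).finally(() => clearTimeout(timer));
}

// Call `task` once, then retry up to `retryLimit` more times on failure.
async function withRetry(task, retryLimit) {
  let lastError;
  for (let attempt = 0; attempt <= retryLimit; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Usage: a flaky task that fails twice, then succeeds on the third attempt.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error("transient failure");
  return "ok";
};

const retried = withRetry(() => withTimeout(flaky, 100), 2);
retried.then((value) => console.log(value, calls)); // ok 3
```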

⚡️ Performance

In tests, Arehs finished about 30% faster than chunked Promise.all (roughly 13.6 s versus 19.9 s in the run below).

import { Arehs } from "arehs";

// Simulates an async task that takes between 150 ms and 1,150 ms.
const delay = (i) => {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve(i);
    }, 150 + Math.random() * 1000);
  });
};

(async () => {
  const tasks = Array.from({ length: 1000 }, (_, i) => i);

  // Arehs with a concurrency of 50.
  const startArehs = performance.now();
  await Arehs.create(tasks).withConcurrency(50).mapAsync(delay);
  const endArehs = performance.now();

  console.log(`Arehs: ${endArehs - startArehs}ms`);

  // Chunked Promise.all: each chunk of 50 waits for its slowest task.
  const startPromiseAll = performance.now();
  while (tasks.length > 0) {
    const chunkedTasks = tasks.splice(0, 50);
    await Promise.all(chunkedTasks.map(delay));
  }
  const endPromiseAll = performance.now();

  console.log(`Promise.all: ${endPromiseAll - startPromiseAll}ms`);
})();
    promiseAllTime: 19.859867874979972(s)
    promisePoolTime: 13.55725229203701(s)

Promise.all

Promise.all takes as long as the slowest promise in the batch.

So your main thread is basically “doing nothing” while it waits for the slowest request to finish.

For example, if promise number 4 is the longest in the Promise array, its duration becomes the chunk's execution time.

This is inefficient: the promises in the next chunk can't start any work until the longest promise has finished.
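
As a rough illustration of this point, the following sketch computes how long one chunk takes and how long each slot sits idle. The durations are made-up numbers, chosen so that the fourth promise is the slowest; they are not measurements.

```javascript
// Hypothetical durations (ms) for the five promises in one chunk.
const durations = [200, 300, 250, 1100, 180];

// Promise.all settles only when the slowest promise settles,
// so the chunk takes as long as its longest member.
const chunkTime = Math.max(...durations);

// Time each slot sits idle while waiting for the slowest promise.
const idleTimes = durations.map((d) => chunkTime - d);

console.log(chunkTime); // 1100
console.log(idleTimes); // [ 900, 800, 850, 0, 920 ]
```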


Arehs

Arehs is all about making the most of Node.js's main thread by applying the Promise Pool Pattern.

To achieve better utilization, we need to densely pack the API calls (or any other async tasks) so that we do not wait
for the longest call to complete; instead, we schedule the next call as soon as any running call finishes.
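
The Promise Pool Pattern described above can be sketched in a few lines. This is a minimal illustration, not Arehs's actual implementation; `promisePool` is a hypothetical name, and error handling is left out for brevity.

```javascript
// Run `worker` over `items` with at most `concurrency` tasks in flight.
// As soon as one task finishes, its slot picks up the next item.
async function promisePool(items, concurrency, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each "slot" keeps pulling items until none are left.
  async function runSlot() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  const slotCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: slotCount }, runSlot));
  return results; // same order as `items`
}

// Usage: ten tasks of varying length, three at a time.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const pooled = promisePool([30, 5, 5, 5, 20, 5, 5, 5, 5, 5], 3, async (ms) => {
  await sleep(ms);
  return ms * 2;
});
pooled.then((doubled) => console.log(doubled.length)); // 10
```

While the 30 ms task occupies one slot, the other two slots keep working through the short tasks instead of waiting for it.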


🙋‍♀️FAQ

Is this always better than Promise.all?

No, there is no silver bullet.

This can increase your application's performance when you're making a lot of API calls and asynchronous operations.

However, it may not make much difference when every promise takes roughly the same amount of time.
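
A small deterministic simulation can make this concrete. It compares the total time of chunked Promise.all with that of a promise pool for made-up task durations, without running real timers. `chunkedTime` and `pooledTime` are hypothetical helpers, and the model ignores scheduling overhead.

```javascript
// Total time for chunked Promise.all: each chunk costs its slowest task.
function chunkedTime(durations, size) {
  let total = 0;
  for (let i = 0; i < durations.length; i += size) {
    total += Math.max(...durations.slice(i, i + size));
  }
  return total;
}

// Total time for a promise pool: each task goes to the slot that frees up first.
function pooledTime(durations, size) {
  const slots = new Array(size).fill(0); // time at which each slot becomes free
  for (const d of durations) {
    const earliest = slots.indexOf(Math.min(...slots));
    slots[earliest] += d;
  }
  return Math.max(...slots);
}

const uniform = [100, 100, 100, 100, 100, 100];
const skewed = [100, 900, 100, 100, 100, 100];

console.log(chunkedTime(uniform, 2), pooledTime(uniform, 2)); // 300 300
console.log(chunkedTime(skewed, 2), pooledTime(skewed, 2)); // 1100 900
```

With uniform durations both approaches take the same time; the pool only pulls ahead when some tasks run much longer than others.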

If Promise.all gives you no further performance improvement in your environment, you can give Arehs a try, but if
Promise.all is enough for your needs, you don't have to.

Therefore, use Arehs in projects that need better performance only after testing it thoroughly.

I hope it helps you. Thank you.
