Bulk Loading Data In PostgreSQL With Node.js and Sequelize

bretthoyer

Brett Hoyer

Posted on December 27, 2022

Bulk Loading Data In PostgreSQL With Node.js and Sequelize

Application development often requires seeding data in a database for testing and development. The following article will outline how to handle this using Node.js and Sequelize.

Whether you're building an application from scratch with zero users, or adding features to an existing application, working with data during development is a necessity. This can take different forms, from mock data APIs reading data files in development, to seeded database deployments closely mirroring an expected production environment.

I prefer the latter as I find fewer deviations from my production toolset leads to fewer bugs.

A Humble Beginning

For the sake of this discussion, let's assume we're building an online learning platform offering various coding courses. In its simplest form, our Node.js API layer might look like this.

// server.js

const express = require("express");
const App = express();

const courses = [
   {title: "CSS Fundamentals", "thumbnail": "https://fake-url.com/css"}],
   {title: "JavaScript Basics", "thumbnail": "https://fake-url.com/js-basics"}],
   {title: "Intermediate JavaScript", "thumbnail": "https://fake-url.com/intermediate-js"}
];

App.get("/courses", (req, res) => {
   res.json({data: courses});
});

App.listen(3000);
Enter fullscreen mode Exit fullscreen mode

If all you need is a few items to start building your UI, this is enough to get going. Making a call to our /courses endpoint will return all of the courses defined in this file. However, what if we want to begin testing with a dataset more representative of a full-fledged database-backed application?

Working With JSON

Suppose we inherited a script exporting a JSON-array containing thousands of courses. We could import the data, like so.

// courses.js

module.exports = [
   {title: "CSS Fundamentals", "thumbnail": "https://fake-url.com/css"}],
   {title: "JavaScript Basics", "thumbnail": "https://fake-url.com/js-basics"}],
   {title: "Intermediate JavaScript", "thumbnail": "https://fake-url.com/intermediate-js"},
   ...
];

// server.js

...
const courses = require("/path/to/courses.js");
...
Enter fullscreen mode Exit fullscreen mode

This eliminates the need to define our mock data within our server file, and now we have plenty of data to work with. We could enhance our endpoint by adding parameters to paginate the results and set limits on how many records are returned. But, what about allowing users to post their own courses? How about editing courses?

This solution gets out of hand quickly as you begin to add functionality. We'll have to write additional code to simulate features of a relational database. After all, databases were created to store data. So, let's do that.

Bulk Loading JSON With Sequelize

For an application of this nature, PostgreSQL is an appropriate database selection. We have the option of running PostgreSQL locally, or connecting to a PostgreSQL-compatible cloud native database, like YugabyteDB Managed. Apart from being a highly-performant distributed SQL database, developers using YugabyteDB benefit from a cluster that can be shared by multiple users. As the application grows, our data layer can scale out to multiple nodes and regions.

After creating a YugabyteDB Managed account and spinning up a free database cluster, we're ready to seed our database and refactor our code, using Sequelize. The Sequelize ORM allows us to model our data to create database tables and execute commands. Here's how that works.

First, we install Sequelize from our terminal.

// terminal
> npm i sequelize
Enter fullscreen mode Exit fullscreen mode

Next, we use Sequelize to establish a connection to our database, create a table, and seed our table with data.

// database.js

// JSON-array of courses
const courses = require("/path/to/courses.js");

// Certificate file downloaded from YugabyteDB Managed
const cert = fs.readFileSync(CERTIFICATE_PATH).toString();

// Create a Sequelize instance with our database connection details
const Sequelize = require("sequelize");
const sequelize = new Sequelize("yugabyte", "admin", DB_PASSWORD, {
   host: DB_HOST,
   port: "5433",
   dialect: "postgres",
   dialectOptions: {
   ssl: {
       require: true,
       rejectUnauthorized: true,
       ca: cert,
   },
   },
   pool: {
   max: 5,
   min: 1,
   acquire: 30000,
   idle: 10000,
   }
});

// Defining our Course model
export const Course = sequelize.define(
   "course",
   {
       id: {
           type: DataTypes.INTEGER,
           autoIncrement: true,
           primaryKey: true,
       },
       title: {
           type: DataTypes.STRING,
       },

       thumbnail: {
           type: DataTypes.STRING,
       },
   }
);


async function seedDatabase() {
   try {
       // Verify that database connection is valid
       await sequelize.authenticate();

       // Create database tables based on the models we've defined
       // Drops existing tables if there are any
       await sequelize.sync({ force: true });

       // Creates course records in bulk from our JSON-array
       await Course.bulkCreate(courses);

       console.log("Courses created successfully!");
   } catch(e) {
       console.log(`Error in seeding database with courses: ${e}`);
   }
}

// Running our seeding function
seedDatabase();
Enter fullscreen mode Exit fullscreen mode

By leveraging Sequelize’s bulkCreate method, we’re able to insert multiple records in one statement. This is more performant than inserting requests one at a time, like this.

. . .
// JSON-array of courses
const courses = require("/path/to/courses.js");

async function insertCourses(){
    for(let i = 0; i < courses.length; i++) {
    await Course.create(courses[i]); 
}
}

insertCourses();
Enter fullscreen mode Exit fullscreen mode

Individual inserts come with the overhead of connecting, sending requests, parsing requests, indexing, closing connections, etc. on a one-off basis. Of course, some of these concerns are mitigated by connection pooling, but generally speaking the performance benefits of inserting in bulk are immense, not to mention far more convenient. The bulkCreate method even comes with a benchmarking option to pass query execution times to your logging functions, should performance be of primary concern.

Now that our database is seeded with records, our API layer can use this Sequelize model to query the database and return courses.

// server.js

const express = require("express");
const App = express();

// Course model exported from database.js
const { Course } = require("/path/to/database.js")

App.get("/courses", async (req, res) => {
   try {
       const courses = await Course.findAll();
       res.json({data: courses});
   } catch(e) {
       console.log(`Error in courses endpoint: ${e}`);
   }
});
App.listen(3000);
Enter fullscreen mode Exit fullscreen mode

Well, that was easy! We've moved from a static data structure to a fully-functioned database in no time.

What if we're provided the dataset in another data format, say, a CSV file exported from Microsoft Excel? How can we use it to seed our database?

Working With CSVs

There are many NPM packages to convert CSV files to JSON, but none are quite as easy to use as csvtojson. Start by installing the package.

// terminal
> npm i csvtojson
Enter fullscreen mode Exit fullscreen mode

Next, we use this package to convert our CSV file to a JSON-array, which can be used by Sequelize.

// courses.csv
title,thumbnail
CSS Fundamentals,https://fake-url.com/css
JavaScript Basics,https://fake-url.com/js-basics
Intermediate JavaScript,https://fake-url.com/intermediate-js
Enter fullscreen mode Exit fullscreen mode
// database.js
...
const csv = require('csvtojson');
const csvFilePath = "/path/to/courses.csv";

// JSON-array of courses from CSV
const courses = await csv().fromFile(csvFilePath);
...
await Course.bulkCreate(courses);
...
Enter fullscreen mode Exit fullscreen mode

Just as with our well-formatted courses.js file, we're able to easily convert our courses.csv file to bulk insert records via Sequelize.

Conclusion

Developing applications with hardcoded data can only take us so far. I find that investing in tooling early in the development process sets me on the path towards bug-free coding (or so I hope!)

By bulk loading records, we’re able to work with a representative dataset, in a representative application environment. As I’m sure many agree, that’s often a major bottleneck in the application development process.

Give Sequelize and YugabyteDB a try in your next Node.js coding adventure!

💖 💪 🙅 🚩
bretthoyer
Brett Hoyer

Posted on December 27, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related