How Metis Optimized Queries Executed by Sequelize - A Case Study

adammetis

Adam Furmanek

Posted on July 12, 2023

How Metis Optimized Queries Executed by Sequelize - A Case Study

Sequelize is an ORM for Node.js and TypeScript. It can connect to PostgreSQL and other SQL engines. It supports transactions, relations, eager loading, lazy loading, functions, and other common operations.

We are going to use Sequelize to execute the same queries as in the other article about Metis. I highly recommend reading that part before moving on.

Data model

First thing we need to do is to model the database. I’m using the following table definitions:

TitleBasic.init(
{
  tconst: { type: DataTypes.STRING, primaryKey: true, allowNull: false },
  titletype: { type: DataTypes.TEXT },
  primarytitle: { type: DataTypes.TEXT },
  originaltitle: { type: DataTypes.TEXT },
  isadult: { type: DataTypes.BOOLEAN },
  startyear: { type: DataTypes.INTEGER },
  endyear: { type: DataTypes.INTEGER },
  runtimeminutes: { type: DataTypes.INTEGER },
  genres: { type: DataTypes.TEXT }
},
{
  sequelize,
  modelName: 'TitleBasic',
  tableName: 'title_basics',
  schema: 'imdb',
  timestamps: false,
}
);


TitleRating.init(
{
  tconst: { type: DataTypes.STRING, primaryKey: true, allowNull: false },
  averagerating: { type: DataTypes.DECIMAL},
  numvotes: DataTypes.INTEGER,
},
{
  sequelize,
  modelName: 'TitleRating',
  tableName: 'title_ratings',
  schema: 'imdb',
  timestamps: false,
}
);

TitlePrincipal.init(
{
  tconst: { type: DataTypes.STRING, primaryKey: true, allowNull: false },
  ordering: { type: DataTypes.INTEGER, primaryKey: true, allowNull: false },
  nconst: { type: DataTypes.TEXT },
  category: { type: DataTypes.TEXT },
  job: { type: DataTypes.TEXT },
  characters: { type: DataTypes.TEXT }
},
{
  sequelize,
  modelName: 'TitlePrincipal',
  tableName: 'title_principals',
  schema: 'imdb',
  timestamps: false,
}
);


TitleCrew.init(
{
  tconst: { type: DataTypes.STRING, primaryKey: true, allowNull: false },
  directors: { type: DataTypes.TEXT },
  writers: { type: DataTypes.TEXT }
},
{
  sequelize,
  modelName: 'TitleCrew',
  tableName: 'title_crew',
  schema: 'imdb',
  timestamps: false,
}
);


NameBasic.init(
{
  nconst: { type: DataTypes.STRING, primaryKey: true, allowNull: false },
  primaryname: { type: DataTypes.TEXT },
  birthyear: { type: DataTypes.INTEGER },
  deathyear: { type: DataTypes.INTEGER },
  primaryprofession: { type: DataTypes.TEXT },
  knownfortitles: { type: DataTypes.TEXT }
},
{
  sequelize,
  modelName: 'NameBasic',
  tableName: 'name_basics',
  schema: 'imdb',
  timestamps: false,
}
);
Enter fullscreen mode Exit fullscreen mode

We can see that most of the tables are configured to represent SQL tables directly with no modifications. I’m also using the following associations:

db.TitleBasic.hasMany(db.TitlePrincipal, {
  foreignKey: 'tconst',
  targetKey: 'tconst',
  as: 'titleBasicTitlePrincipal',
});

db.TitleBasic.hasMany(db.TitlePrincipal, {
  foreignKey: 'tconst',
  targetKey: 'tconst',
  as: 'titleBasicTitlePrincipal2',
});


db.TitlePrincipal.belongsTo(db.TitleBasic, {
  foreignKey: 'tconst',
  targetKey: 'tconst'
});


db.TitlePrincipal.hasOne(db.NameBasic, {
  foreignKey: 'nconst',
  targetKey: 'nconst',
  sourceKey: 'nconst'
});


db.NameBasic.belongsTo(db.TitlePrincipal, {
  foreignKey: 'nconst',
  targetKey: 'nconst',
  sourceKey: 'nconst'
});


db.TitleBasic.hasOne(db.TitleRating, {
  foreignKey: 'tconst',
  targetKey: 'tconst'
});


db.TitleRating.belongsTo(db.TitleBasic, {
  foreignKey: 'tconst',
  targetKey: 'tconst'
});


db.TitleBasic.hasOne(db.TitleCrew, {
  foreignKey: 'tconst',
  targetKey: 'tconst',
});


db.TitleCrew.belongsTo(db.TitleBasic, {
  foreignKey: 'tconst',
  targetKey: 'tconst'
});
Enter fullscreen mode Exit fullscreen mode

It’s worth noticing that I configured TitleBasics to TitlePrincipals association twice. That’s because one of my queries needs to join TitlePrincipals twice and Sequelize can’t do that just like that.

For a given actor, find their latest movies

Let’s start with our first query. We can implement the application code for getting latest movies for an actor:

return titleBasic
        .findAll({
          include: [{
            model: titlePrincipal,
            required: true,
            as: 'titleBasicTitlePrincipal',
            where: {
              'nconst': nconst
            },
          }],
          order: [
            ['startyear', 'DESC']
          ],
          limit: 10
        });
Enter fullscreen mode Exit fullscreen mode

This is the SQL code that has been generated:

SELECT
  TitleBasic.*,
  titleBasicTitlePrincipal.tconst AS titleBasicTitlePrincipal.tconst,
  titleBasicTitlePrincipal.ordering AS titleBasicTitlePrincipal.ordering,
  titleBasicTitlePrincipal.nconst AS titleBasicTitlePrincipal.nconst,
  titleBasicTitlePrincipal.category AS titleBasicTitlePrincipal.category,
  titleBasicTitlePrincipal.job AS titleBasicTitlePrincipal.job,
  titleBasicTitlePrincipal.characters AS titleBasicTitlePrincipal.characters
FROM
  (
    SELECT
      TitleBasic.tconst,
      TitleBasic.titletype,
      TitleBasic.primarytitle,
      TitleBasic.originaltitle,
      TitleBasic.isadult,
      TitleBasic.startyear,
      TitleBasic.endyear,
      TitleBasic.runtimeminutes,
      TitleBasic.genres
    FROM
      imdb.title_basics AS TitleBasic
    WHERE
      (
        SELECT
          tconst
        FROM
          imdb.title_principals AS titleBasicTitlePrincipal
        WHERE
          (
            titleBasicTitlePrincipal.nconst = 'nm1588970'
            AND titleBasicTitlePrincipal.tconst = TitleBasic.tconst
          )
        LIMIT
          1
      ) IS NOT NULL
    ORDER BY
      TitleBasic.startyear DESC
    LIMIT
      10
  ) AS TitleBasic
  INNER JOIN imdb.title_principals AS titleBasicTitlePrincipal ON TitleBasic.tconst = titleBasicTitlePrincipal.tconst
  AND titleBasicTitlePrincipal.nconst = 'nm1588970'
ORDER BY
  TitleBasic.startyear DESC;
Enter fullscreen mode Exit fullscreen mode

That’s a lot. We can see that Sequelize decided to use subqueries to correlate tables. That happens when we join tables and we don’t specify duplicating: false. Sequelize thinks that it will get duplicates and wants to avoid that.

Query executes in nearly 130 seconds and reads 70 million rows. That’s a lot.

Image description

We can try improving it by adding the index:

CREATE INDEX IF NOT EXISTS title_principals_nconst_idx ON imdb.title_principals(nconst) INCLUDE (tconst);
Enter fullscreen mode Exit fullscreen mode

Now we get the following:

Image description

We can see the index has been used. The query is now faster, but it’s still a terrible performance. Let’s remove the index and see if we can do better. Let’s add the duplicating flag:

return titleBasic
        .findAll({
          include: [{
            model: titlePrincipal,
            required: true,
            duplicating: false,
            as: 'titleBasicTitlePrincipal',
            where: {
              'nconst': nconst
            },
          }],
          order: [
            ['startyear', 'DESC']
          ],
          limit: 10
        });
Enter fullscreen mode Exit fullscreen mode

This is the query we’re getting now:

SELECT
  TitleBasic.tconst,
  TitleBasic.titletype,
  TitleBasic.primarytitle,
  TitleBasic.originaltitle,
  TitleBasic.isadult,
  TitleBasic.startyear,
  TitleBasic.endyear,
  TitleBasic.runtimeminutes,
  TitleBasic.genres,
  titleBasicTitlePrincipal.tconst AS titleBasicTitlePrincipal.tconst,
  titleBasicTitlePrincipal.ordering AS titleBasicTitlePrincipal.ordering,
  titleBasicTitlePrincipal.nconst AS titleBasicTitlePrincipal.nconst,
  titleBasicTitlePrincipal.category AS titleBasicTitlePrincipal.category,
  titleBasicTitlePrincipal.job AS titleBasicTitlePrincipal.job,
  titleBasicTitlePrincipal.characters AS titleBasicTitlePrincipal.characters
FROM
  imdb.title_basics AS TitleBasic
  INNER JOIN imdb.title_principals AS titleBasicTitlePrincipal ON TitleBasic.tconst = titleBasicTitlePrincipal.tconst
  AND titleBasicTitlePrincipal.nconst = 'nm1588970'
ORDER BY
  TitleBasic.startyear DESC
LIMIT
  10;
Enter fullscreen mode Exit fullscreen mode

It looks much more decent. Just a regular join. Let’s see the insights:

Image description

It’s still reading a lot of data, but runs much faster (in 60 seconds). If we add the index, then we get the following performance:

Image description

This is great. We can also send the raw query with the following:

return sequelize.query(`CREATE INDEX IF NOT EXISTS title_principals_nconst_idx ON imdb.title_principals(nconst) INCLUDE (tconst)`).then(() =>
        sequelize.query(`
          SELECT TitleBasic.*
          FROM imdb.title_basics AS TitleBasic
          JOIN imdb.title_principals AS TitlePrincipals ON TitlePrincipals.tconst = TitleBasic.tconst
          WHERE TitlePrincipals.nconst = :nconst
          ORDER BY TitleBasic.startyear DESC
          LIMIT 10
        `, {
          model: titleBasic,
          mapToModel: true,
          replacements: {
            nconst: nconst
          },
        })
      );
Enter fullscreen mode Exit fullscreen mode

For a given actor, find their ten most highly rated films

Let’s now find the best movies for an actor. This is the code we can use:

return titleBasic
        .findAll({
          include: [
            {
              model: titleRating,
              required: true,
              duplicating: false,
            },
            {
              model: titlePrincipal,
              required: true,
              duplicating: false,
              as: 'titleBasicTitlePrincipal',
              where: {
                'nconst': nconst
              },
            },
          ],
          order: [
            [ titleRating, 'averagerating', 'DESC'],
          ],
          limit: 10
        });
    }
Enter fullscreen mode Exit fullscreen mode

The code generates the following query:

SELECT
  TitleBasic.tconst,
  TitleBasic.titletype,
  TitleBasic.primarytitle,
  TitleBasic.originaltitle,
  TitleBasic.isadult,
  TitleBasic.startyear,
  TitleBasic.endyear,
  TitleBasic.runtimeminutes,
  TitleBasic.genres,
  TitleRating.tconst AS TitleRating.tconst,
  TitleRating.averagerating AS TitleRating.averagerating,
  TitleRating.numvotes AS TitleRating.numvotes,
  titleBasicTitlePrincipal.tconst AS titleBasicTitlePrincipal.tconst,
  titleBasicTitlePrincipal.ordering AS titleBasicTitlePrincipal.ordering,
  titleBasicTitlePrincipal.nconst AS titleBasicTitlePrincipal.nconst,
  titleBasicTitlePrincipal.category AS titleBasicTitlePrincipal.category,
  titleBasicTitlePrincipal.job AS titleBasicTitlePrincipal.job,
  titleBasicTitlePrincipal.characters AS titleBasicTitlePrincipal.characters
FROM
  imdb.title_basics AS TitleBasic
  INNER JOIN imdb.title_ratings AS TitleRating ON TitleBasic.tconst = TitleRating.tconst
  INNER JOIN imdb.title_principals AS titleBasicTitlePrincipal ON TitleBasic.tconst = titleBasicTitlePrincipal.tconst
  AND titleBasicTitlePrincipal.nconst = 'nm1588970'
ORDER BY
  TitleRating.averagerating DESC
LIMIT
  10;
Enter fullscreen mode Exit fullscreen mode

The query runs in 35 seconds, and these are the insights:

Image description

Let’s now add the index:

CREATE INDEX IF NOT EXISTS title_principals_nconst_idx ON imdb.title_principals(nconst) INCLUDE (tconst);
Enter fullscreen mode Exit fullscreen mode

And now the query runs immediately:

Image description

Find the ten top rated films with some number of votes

Here is the application code for getting top movies:

return titleBasic
        .findAll({
          include: [
            {
              model: titleRating,
              required: true,
              duplicating: false,
              where: {
                'numvotes': { [Op.gte]: numvotes }
              }
            },
          ],
          order: [
            [ titleRating, 'averagerating', 'DESC'],
          ]
        });
Enter fullscreen mode Exit fullscreen mode

The code generates the following query:

SELECT
  TitleBasic.tconst,
  TitleBasic.titletype,
  TitleBasic.primarytitle,
  TitleBasic.originaltitle,
  TitleBasic.isadult,
  TitleBasic.startyear,
  TitleBasic.endyear,
  TitleBasic.runtimeminutes,
  TitleBasic.genres,
  TitleRating.tconst AS TitleRating.tconst,
  TitleRating.averagerating AS TitleRating.averagerating,
  TitleRating.numvotes AS TitleRating.numvotes
FROM
  imdb.title_basics AS TitleBasic
  INNER JOIN imdb.title_ratings AS TitleRating ON TitleBasic.tconst = TitleRating.tconst
  AND TitleRating.numvotes >= '10000'
ORDER BY
  TitleRating.averagerating DESC;
Enter fullscreen mode Exit fullscreen mode

We get the following insights:

Image description

Similarly to our first part, we can see table scans instead of indexes. Let’s add the index that Metis suggests:

CREATE INDEX IDX_imdb_title_ratings_7a4c4d1e1b2 ON imdb.title_ratings (numvotes);
Enter fullscreen mode Exit fullscreen mode

This is what we get now:

Image description

We can see that we return way too many rows. Apart from that, all is good.

Summary

That’s it for now. In the next part, we are going to see more queries with Sequelize. They will be much more complex and sophisticated, however, Metis will give us enough help to optimize them. Stay tuned!

💖 💪 🙅 🚩
adammetis
Adam Furmanek

Posted on July 12, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related