6 Ways You're Using ORMs Wrong and How To Fix
Nir Tayeb
Posted on February 11, 2024
We love ORMs because they simplify and streamline the working process with relational databases.
ORMs provide a higher level of abstraction, allowing us to work with objects and classes instead of raw SQL queries. They automate everyday database tasks such as CRUD operations, reducing the boilerplate code we need to write. ORMs handle database connections and transactions, making managing and scaling our applications easier. With ORMs, it's possible to write database-agnostic code, quickly switch between different database systems, and focus more on application logic.
But with all the benefits and abstractions, it is easy to write poorly performant code.
For the examples, I'll use the scheme from the Prisma ORM documentation, which describes a blogging platform with users, profile details, posts, and categories.
datasource db {
provider = "postgresql"
url = env("DATABASE_URL")
}
generator client {
provider = "prisma-client-js"
}
model User {
id Int @id @default(autoincrement())
email String @unique
name String?
role Role @default(USER)
posts Post[]
profile Profile?
}
model Profile {
id Int @id @default(autoincrement())
bio String
user User @relation(fields: [userId], references: [id])
userId Int @unique
}
model Post {
id Int @id @default(autoincrement())
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
title String
published Boolean @default(false)
author User @relation(fields: [authorId], references: [id])
authorId Int
categories Category[]
}
model Category {
id Int @id @default(autoincrement())
name String
posts Post[]
}
enum Role {
USER
ADMIN
}
The n+1 problem
Fetching of data without using eager loading
Assuming we want to list all user posts with a User role. A naive approach to writing the code is:
async getUserPosts() {
const users = await prisma.user.findMany({where: {role: prisma.Role.User}});
return users.flatMap(user => user.posts);
}
Behind the scenes, what is going to happen:
- Query the database for the user's table for all the users with role admin:
SELECT * FROM users WHERE role='User'
- Then, For each record, query the database (again) for the user posts:
SELECT * FROM posts WHERE userid=<X>
You won't feel the performance issue when developing the application with a few test users. But once you get to hundreds or thousands of users in the production environment, this function will take "forever" from a user experience perspective. Every round trip to your DB takes at least 10ms, not including the query that needs to run and the data transfer between the databases and your server.
Using eager-loading, you will reach the database only once!
async getUserPosts() {
const users = await prisma.user.findMany({
where: {role: prisma.Role.User},
include: {posts: true }
});
return users.flatMap(user => user.posts);
}
By using include here, the ORM (prisma in my example) uses the SQL Join statement when constructing the SQL to query the data.
SELECT *
FROM users
INNER JOIN posts ON users.id = posts.userid
WHERE role='User'
Another variation of the n+1 problem
Assuming we want to list all posts with their writer's short bio, a naive approach will look like this:
const posts = await prisma.posts.findMany();
const postsWithBio = posts.map((post: prisma.Post) => {
return {
title: post.title,
publishedAt: post.publishedAt,
authorBio: post.author.profile?.bio
});
Again, the issue is that the code will make a round trip to the database for each post to query the author and the profile. The solution in this specific case is to use include twice, but if we have a more complex schema that uses more relations between the entities. We can implement another solution.
First, map all the unique author IDs, then fetch only the profiles (or bios) of these author IDs and map them
async function getPostWithBio(){
const posts = await prisma.posts.findMany();
const authorIds = posts.map((post: prisma.Post) => post.authorid);
const profiles = await prisma.profiles.findMany({authorId: {in: authorIds}});
const authorToBio = new Map(profiles.map(p => [p.userId, p.bio]));
const postsWithBio = posts.map((post: prisma.Post) => {
return {
title: post.title,
publishedAt: post.publishedAt,
authorBio: authorToBio.get(post.authorId)
});
}
In this way, only two queries are going to the database.
SELECT * FROM posts;
SELECT * FROM profiles where userid in (1,2,3,4,...);
Remember that this is a straightforward use case; this issue is widespread in more complex systems and looks slightly different. Your code will iterate over one entity, then go into other classes and functions, which will fetch more data from the DB, creating the N+1 problem without you notice.
How to identify the N+1
The best way to identify the N+1 issue in your system is by monitoring your app in three different ways:
- Turn on your debug/info log and configure your ORM to log the SQL queries it generates.
- Watch the database query log and look for frequent and similar queries fetching by one ID.
- Using an Application Performance Monitoring (APM) system such as NewRelic, Sentry, and Azure Application Insight - they visualize all the calls to the DB on each request/transaction and show you how long it took and how long each request spent communicating with the DB.
Querying without using indexes
No defining Index
Assuming we need to search for posts by title, a naive implementation will look like this:
const posts = await prisma.posts.findMany({title: query});
If we look in the schema, we won't see an index definition on the title field. We won't see any index explicitly defined in our schema (except primary keys/foreign keys relations/unique). Why? Because it's not intuitive for object-oriented/functional programmers to think about data indexing.
Running the code above will result in a full table scan to look up the query. As the name suggests, a full table scan means an O(N) lookup for the data. Running the query will take a lot of time and resources if we have thousands of posts and more properties on each post.
If we set an index (of the default "b-tree" type) on the field, the lookup time will usually decrease to O(Log-N).
model Post {
id Int @id @default(autoincrement())
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
title String
published Boolean @default(false)
author User @relation(fields: [authorId], references: [id])
authorId Int
categories Category[]
@@index([title])
}
💡 Suppose you don't use the query frequently. Instead of defining an index on the title field, you can add conditions to the query using other indexes. Then, the search will be on a subset of the data instead of having a full table scan.
💡 Remember that the cost of indexing the data is a lower write performance. So, depending on your use case, decide which index is necessary and which is nice if writing performance matters to you.
The query cannot use the index (wrong index type)
Assuming we want to query posts by a string found in the title.
const posts = await prisma.posts.findMany({title: {contains:"query"}});
This code results in a query including the "LIKE" operator:
SELECT * FROM posts WHERE title LIKE '%query%';
The default index algorithm is B-Tree, and leading '%' cannot be queried using the index.
The solution is to define another type of index to query the data efficiently; in this case, it's GIN or GiST (on PostgreSQL)
@@index([title], type: GIN)
For further reading, I suggest reading this fantastic blog post explaining how trigram-based operations and indexes improve query performance by several orders of magnitude.
Using model objects as output
Assuming we need to implement an endpoint to fetch all users, a naive approach will look like this.
@Controller('users')
export class UsersController{
@Get()
findAll(): prisma.User {
return await prisma.users.findMany({});
}
}
When nest.js and other frameworks evaluate objects, they transform the object entirely to JSON, iterating all over the properties and sub-properties. Using the model as the output will result in lazy loading of each user's relations, meaning more queries to the DB and more unnecessary data returned with the requests. It can also lead to a security risk of exposing private or confidential information.
Instead, you should clearly define the interface to use. With typescript, it's possible to use Pick to define a partial of another type to pick only a subset of the properties or Omit to exclude some properties.
Query unnecessary data
Assuming we want to query all posts and print only their dates and titles:
const posts = await prisma.posts.findMany({});
for (const post of posts) {
console.log(post.title, post.publishedAt);
}
This code ends up with this SQL query:
SELECT * FROM posts;
It looks simple, but if we have even 1000 posts and their content each is 10KB of words, we transfer at least 10MB of data without even using it. Instead, we should select only the fields we need.
const posts = await prisma.posts.findMany({select: ['title', 'publishedAt']});
for (const post of posts) {
console.log(post.title, post.publishedAt);
}
Which will end up with a query like:
SELECT title, publishedAt FROM posts;
In summary, using ORM is excellent for productivity but not always suitable for performance. Understanding what is happening behind the scenes is crucial to ensure optimal code performance.
If you enjoyed the article, please forward and share it with your teammates/acquaintances and help them avoid these mistakes.
This post was originally published in my newsletter "Percentile 99th".
My subscribers get all the information first; in the future, exclusive content will be shared only over the newsletter.
Posted on February 11, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.