A journey to optimize a SpringData SQL query with N+1s

loiklak

Loïc Chau

Posted on November 20, 2024

A journey to optimize a SpringData SQL query with N+1s

Some context

While browsing my app in pre-production, I noticed that the homepage was starting to take a long time to load...

Investigating in my APM

I looked in Kibana's APM (Application Performance Management) and...

Two API endpoints average response time

A route with 13s and another with 6.5s average latency?????? 🤯
That's much more than I'd want...

The N+1 problem

It quickly becomes clear that we have a pretty severe N+1 problem.

Endpoint response breakdown including 849 SELECT clauses

To help you better picture the DB schema :

A project table in a 1-1 relation with the Relation_1 table and in the 1-n relation with the Relation_2 table

In addition to the expected SELECT query to retrieve a project line in DB, we execute 849 additional SELECT queries on Relation_1, which takes 3.6s out of the 4.2s total duration of the HTTP call 🫠

The worst part : I don't even need the Relation_1 information in this endpoint 🥲 so I'm just wasting ressources

We can't see it on the capture but there are another 200s SELECT to get data from Relation_2, so we have a double N+1 queries issue.

The EAGER toxic relation

The culprit in the code was the FetchType.EAGER on the Relation1 and Relation2 relationships (or so I thought at this point).

Misconception: A FetchType.EAGER helps avoid N+1 issues because it fetches everything at once, so JPA must be optimizing and performing JOINs.

Reframe: FetchType.EAGER ensures that database reads are transactional but not optimized. In fact, using FetchType.EAGER guarantees that you’re always in the worst-case scenario for N+1 issues 😱 → which was exactly my case.

After looking around in Baeldung I discovered Hibernate's FetchMode :

In general, FetchMode defines how Hibernate will fetch the data (by select, join or subselect). FetchType, on the other hand, defines whether Hibernate will load data eagerly or lazily.

The exact rules between these two are as follows:

  • if the code doesn’t set FetchMode, the default one is JOIN and FetchType works as defined
  • with FetchMode.SELECT or FetchMode.SUBSELECT set, FetchType also works as defined
  • with FetchMode.JOIN set, FetchType is ignored and a query is always eager

It seemed to me like Hibernate uses optimized parameters since it uses JOINs by default. I also thought that it would use FetchType.LAZY by default.

False assumption (but I didn’t know it at the time): Hibernate is well-designed, and by default, all joins have the best default settings: FetchMode.JOIN and FetchType.LAZY.

Reality: This is actually false (as we’ll see later).

❌ Removing the FetchType.EAGER and, if needed, replace it with FetchMode.JOIN (result: -0s).

I kept some FetchMode.JOIN annotations to optimize queries in cases where we’re certain to always fetch the tables together. (Reality: otherwise, my tests broke 🤪).

Under the hood, FetchMode.JOIN forces FetchType.EAGER, but it optimizes the query by performing a join as expected, so in theory we should have no more N+1 queries.

However, in practice, FetchMode.JOIN is essentially useless in my app... It’s ignored because we fetch data with SpringData 😩 I guess it is therefore essentially the same as having a FetchType.EAGER

Spring data internally use Criteria API for the findAll() query , so @Fetch(value = FetchMode.JOIN) will not have any effect.

StackOverflow thread for more info

Results

Even though FetchMode.JOIN doesn't work, at least the relations where I removed the fetchType=EAGER should now be LAZY and my N+1 problem should improve... Right ??? But nothing has changed 😶

Anakin/Padme meme about FetchMode.JOIN not resolving my N+1 issue

Why ????

Actually, the @OneToOne relationship defaults to FetchType.EAGER 💀 as you can see in the source code of the annotation.

FetchType.EAGER annotation source code

So, we should explicitly set FetchType.LAZY on the relationship…

But I think a much more interesting question is: why, in the name of Jesus of Nazareth, doesn’t Spring Data generate a join for us??? We’ll still try both approaches to see what happens.

Setting FetchType.LAZY on the @OneToOne ❌

IntelliJ warning that non owning side of a one-to-one relationship will always be  raw `FetchType.EAGER` endraw  no matter what

The problem occurs when fetching projects, so I tried setting FetchType.LAZY on ProjectModel Relation_1 relationship. But… Project isn’t the owner of the relationship 💀 which means the fetch is still EAGER!!! AAAAAAAAAAAH 🗿 (thanks IntelliJ for pointing a out this issue with a warning by the way)

Unless you are using Bytecode Enhancement, you cannot fetch lazily the parent-side @OneToOne association.

StackOverflow thread - How can I make a JPA OneToOne relation lazy

I hope I won't need to make Bytecode enhancement because it seems a bit overkill for my use case 🙃


💡 Learning moment: Defining a @OneToOne relationship on both sides when it’s not necessary is a bad idea because the non-owning side will always default to FetchType.EAGER, leading to a ton of N+1 issues 💀


Gladly, in my case, a solution to the N+1 problem is to remove the @OneToOne relationship annotation on Relation_1 in the Project model since we never query the project alongside Relation_1.

Or, alternatively, I could ensure the relation is properly fetched with a JOIN query — though this is less optimal because it will unnecessarily load the memory with unneeded Relation_1 information when fetching projects.

Kindly ask Spring Data to perform JOINs 🤗 (😠)

As we saw earlier, FetchMode.JOIN is useless with Spring Data because it internally uses the Criteria API, which doesn’t consider FetchMode (thank you Criteria API).

The solution is to use Entity Graphs 🥹🥹 the savior.

Entity Graph allows us to specify to Spring Data which joins it should make.

Example :

On your model :

@Entity
// Define an EntityGraph named "Project.forDomain"
@NamedEntityGraph(name = "ProjectModel.forDomain",
        attributeNodes = {
          // This entity graph specifies to eagerly fetch (with JOINs)
          // the following property of the model
          @NamedAttributeNode("relation2")
        })
public class ProjectModel {

    @OneToOne(mappedBy = "project", fetch = FetchType.LAZY)
    public Relation2 relation2;
}
Enter fullscreen mode Exit fullscreen mode

Then in the SpringData repository

@Repository
public interface ProjectJpaRepository {
    // This method will use the entity graph "ProjectModel.forDomain"
    // to know how to fetch the SQL data
    @EntityGraph(value="ProjectModel.forDomain", type = EntityGraph.EntityGraphType.FETCH)
    List<ProjectModel> findBySomeProperty(String someProperty);
}
Enter fullscreen mode Exit fullscreen mode

Little bit of thinking 🤔

Should we use type = EntityGraphType.LOAD or type = EntityGraphType.FETCH?
LOAD = the properties of the graph are EAGER JOIN, while others follow their default fetch specs.
FETCH = the properties of the graph are EAGER JOIN, while others are LAZY loaded.

I’ll prefer FETCH for performance by default and specify EAGER JOINs if I need data to be fetched together.

StackOverflow thread - What is the difference between FETCH and LOAD for Entity graph of JPA?


Well, after all these twists and turns... the entity graph works! 🥹🥹🥹🥹🥹🥹🥹🥹🥹🥹

You didn’t think it was over, did you? 😊

Removing the relation1 property from projectModel means we no longer benefit from orphanRemoval 🥲 so some of my tests broke.

In the end, it was pretty quick to fix though.

Conclusion

Removing the reference to relation1 in ProjectModel helped eliminate the N+1 issues on one of my routes, N+1 which were solely caused by the EAGER @OneToOne on the non-owning side.

And the entity graph helped eliminate the N+1 issues on the other route 😍, which was plagued by the relationship with relation2 which wasn't done with a JOIN.


💡 Thinking back on how to debug SQL performance issues

To iterate: write SQL tests!!!!! It helps you iterate so much faster than by manually testing your app 😱
There is an initial cost to setup a test reproducing your issue but I swear it is worth it.

By enabling show-sql: true in the application.yml for the test environment, you can see the SQL queries that are being executed.


I hope you learned something from my optimization journey and had some fun as well 😄
If you have any questions ask in the comments I'll respond to the best of my capacities 😉

💖 💪 🙅 🚩
loiklak
Loïc Chau

Posted on November 20, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related