Tackling Performance Issues in Ecto Applications

Marcos Ramos

Posted on May 30, 2023

Ecto can be considered the standard database wrapper and query generator for Elixir, enabling developers to interact with databases efficiently.

However, inefficient queries or misconfigurations still can (and will) happen. Addressing these issues is crucial for a smooth user experience.

In this article, we'll explore three common performance issues in Ecto applications:

  • N+1 query problem: You get excessive redundant queries from retrieving related data.
  • Inefficient query execution: Poorly designed queries will strain a database.
  • Connection pooling and concurrency issues: Bottlenecks and slow response times are caused by configuration or concurrency problems.

We'll discuss how to detect each issue, as well as their common causes and solutions, so you can optimize your Elixir applications for peak performance.

Let's start!

The N+1 Query Problem

The N+1 query problem occurs when an application loads a parent record and its associated child records in separate queries.

This leads to one query for the parent record (1) and one query for each child record (N), resulting in N+1 queries in total. This issue can cause a significant performance hit due to an excessive number of redundant queries.

Detecting the Problem in Elixir

Suppose we have a User schema in a many-to-many relationship with a Role schema. We'll call the join schema that associates the two RoleAssignment:

# my_app/lib/my_app/user.ex
defmodule MyApp.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string

    has_many :role_assignments, MyApp.RoleAssignment, foreign_key: :user_id

    timestamps()
  end
end

# my_app/lib/my_app/role.ex
defmodule MyApp.Role do
  use Ecto.Schema
  import Ecto.Changeset

  schema "roles" do
    field :name, :string

    has_many :role_assignments, MyApp.RoleAssignment, foreign_key: :role_id

    timestamps()
  end
end

# my_app/lib/my_app/role_assignment.ex
defmodule MyApp.RoleAssignment do
  use Ecto.Schema

  schema "role_assignments" do
    belongs_to :role, MyApp.Role, foreign_key: :role_id
    belongs_to :user, MyApp.User, foreign_key: :user_id
    timestamps()
  end
end

Imagine that we want to show all the users and their roles on an admin page:

import Ecto.Query

users = Repo.all(User)

users_with_roles =
  Enum.map(users, fn user ->
    role_assignments = Repo.all(from ra in MyApp.RoleAssignment, where: ra.user_id == ^user.id)
    roles = Enum.map(role_assignments, fn ra -> Repo.get!(MyApp.Role, ra.role_id) end)

    Map.put(user, :roles, roles)
  end)

As you can see, we issue one query to load all users, then extra queries for every user (and every role assignment) to load the associated roles.

To detect the N+1 query problem, you can use tools like Telemetry to monitor query counts and spot queries that are executed repeatedly with only slight variations.

For example, you can wrap the queries in a span and then attach a handler to the emitted events to detect the problem:

# my_app/lib/user_context.ex
defmodule MyApp.UserContext do
  import Ecto.Query

  alias MyApp.User
  alias MyApp.Role
  alias MyApp.RoleAssignment

  alias MyApp.Repo

  def get_all_users do
    event = [:my_app, :query]                    # the name of the event
    start_metadata = %{context: "get_all_users"} # metadata to be sent on start

    :telemetry.span(event, start_metadata, fn ->
      users = Repo.all(User)
      result = Enum.map(users, fn user ->
        Map.put(user, :roles, get_roles_for_user(user))
      end)

      stop_metadata = %{}                        # metadata to be sent on stop

      {result, stop_metadata}
    end)
  end

  def get_roles_for_user(user) do
    event = [:my_app, :query]                    # the name of the event
    start_metadata = %{                          # metadata to be sent on start
      context: "get_roles_for_user",
      user_id: user.id
    }

    :telemetry.span(event, start_metadata, fn ->
      role_assignments = Repo.all(from ra in MyApp.RoleAssignment, where: ra.user_id == ^user.id)
      roles = Enum.map(role_assignments, fn ra -> Repo.get!(MyApp.Role, ra.role_id) end)

      stop_metadata = %{}                        # metadata to be sent on stop

      {roles, stop_metadata}
    end)
  end
end

We receive two events for every function call: one when the span starts and the other when the span has finished.

With the spans in place, we can now attach a handler to listen for any call to them:

defmodule MyApp.Telemetry do
  # ...
  def handle_user_context_spans([:my_app, :query, start_or_stop], _measurements, _metadata, _config) do
    case start_or_stop do
      :start ->
        # handle span start
        :ok

      :stop ->
        # record that a query was executed within the context
        :ok
    end
  end
  # ...
end

New Phoenix applications ship with a telemetry supervisor under <app_web>/telemetry.ex. We can attach the handler in its init/1 function:

# my_app/lib/my_app_web/telemetry.ex
defmodule MyAppWeb.Telemetry do
  # ...
  def init(_args) do
    children = [
      # ...
    ]

    :telemetry.attach_many("user_context_handler",
      [
        [:my_app, :query, :start],
        [:my_app, :query, :stop]
      ],
      &MyApp.Telemetry.handle_user_context_spans/4, [])

    Supervisor.init(children, strategy: :one_for_one)
  end
  # ...
end

Here are a couple of strategies to detect N+1:

  • Count the total number of queries within a span context to analyze anomalies or spikes (see the sketch after this list).
  • Create a metric, publish an event from the span handler, and, from there, plot data into a dashboard.
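
Here's a minimal sketch of the first strategy, assuming the [:my_app, :query] span events and the :context metadata from the example above. The MyApp.QueryCounter module, the :query_counts ETS table, and the threshold of 50 queries are hypothetical choices, not something Ecto or Telemetry provides:

defmodule MyApp.QueryCounter do
  require Logger

  # Create the counter table once, for example during application startup
  def setup do
    :ets.new(:query_counts, [:named_table, :public, :set])
  end

  # Count every span start per context and warn when a single context
  # triggers a suspiciously high number of queries
  def handle_event([:my_app, :query, :start], _measurements, metadata, _config) do
    context = metadata[:context] || "unknown"
    count = :ets.update_counter(:query_counts, context, 1, {context, 0})

    if count > 50 do
      Logger.warning("Possible N+1: #{count} queries in context #{inspect(context)}")
    end

    :ok
  end
end

You can attach this handler to [:my_app, :query, :start] with :telemetry.attach/4, just like the span handler shown earlier.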

It is not an easy problem to spot, but when your application traffic starts to increase, you'll see some symptoms. Here are some additional things you can look for that may indicate you're experiencing an N+1 problem:

  • High CPU usage on your database systems
  • A few pages of your web app are much slower than the rest
  • Opening those slow pages triggers many near-identical SQL queries

In general, the more instrumentation you have in place, the easier it is to detect N+1 queries.

For more information on N+1 queries and how to use AppSignal to detect them, read our post Performance and N+1 Queries: Explained, Spotted, and Solved.

Using Ecto's Preload

Ecto provides the preload/3 function to load associated records in a single query, avoiding the N+1 query problem.

Here's an example of how to use preload/3. First, add the proper association to the schema:

# my_app/lib/my_app/user.ex
defmodule MyApp.User do
  # ...

  schema "users" do
    # ...
    many_to_many :roles, MyApp.Role, join_through: MyApp.RoleAssignment
    # ...
  end
end

Now, to fetch all users along with their roles, you can use preload/3 like this:

import Ecto.Query

users = MyApp.Repo.all(
  from(u in MyApp.User,
    preload: [:roles]
  )
)

This loads the users in one query and their roles in another, regardless of the number of roles a user might have. Great: we go from N+1 queries down to 2!

As an example, an N+1 query with 200 users and 200 roles takes 20 seconds to load on my computer. Using the preload, this number is reduced to 600 milliseconds!

But there is room for even more optimization.

Preloading with Joins

Preloading with joins has the advantage of generating only one query to load all associated records from a database.

In addition, you can also filter or sort the associated records! Just use join/5 along with preload/3.

Following the same example from the last section:

import Ecto.Query

query = from u in User,
  left_join: ra in assoc(u, :role_assignments),
  left_join: r in assoc(ra, :role),
  preload: [role_assignments: ra, roles: r],
  select: u

Repo.all(query)

This fetches users and their assigned roles using a single query, avoiding the N+1 query problem. On my computer, this takes about 100 milliseconds to run.

By leveraging Ecto's preload/3 function and combining it with join/5 when necessary, you can efficiently load associated records and eliminate the N+1 query problem in your Ecto-based applications.

Inefficient Queries in Ecto

Inefficient query execution can result in slow database performance, as poorly designed queries may place unnecessary strain on the database. Now we'll learn how to detect and fix them.

Example 1: Missing Indexes

One of the most common causes of inefficient queries is a lack of indexes.

Suppose we have a Post schema like this one:

# my_app/lib/my_app/post.ex
defmodule MyApp.Post do
  use Ecto.Schema
  import Ecto.Changeset

  schema "posts" do
    field :title, :string
    field :body, :string

    field :published_at, :date

    timestamps()
  end
end

Let's fetch all published posts ordered by their publication date:

import Ecto.Query

published_posts = MyApp.Repo.all(
  from(p in MyApp.Post,
    where: not is_nil(p.published_at),
    order_by: [desc: p.published_at]
  )
)

If there is no index on the published_at field, this query may become slow when the number of posts in the database grows. The database system will perform a full scan of the table for every query.

To fix this issue, you can add an index to the relevant column using a migration:

defmodule MyApp.Repo.Migrations.AddIndexOnPublishedAt do
  use Ecto.Migration

  def change do
    create index(:posts, [:published_at])
  end
end

There's really no easy way to detect this from the application — it will only start to be noticeable with queries on large tables. From the point of view of an application, it's only a slow query.

In the next section, we'll learn how to monitor the query execution time with telemetry to detect such problems.

Detecting Bad Queries with PostgreSQL and Telemetry

You can use tools like EXPLAIN ANALYZE in PostgreSQL (or similar features in other databases) to detect inefficient queries by analyzing the execution plan and identifying possible bottlenecks.
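
With ecto_sql and PostgreSQL, you can also inspect the plan straight from IEx using Repo.explain/3. Here's a quick sketch that reuses the published posts query from the previous section (the plan output will vary with your data):

import Ecto.Query

query =
  from(p in MyApp.Post,
    where: not is_nil(p.published_at),
    order_by: [desc: p.published_at]
  )

# Prints the actual PostgreSQL execution plan, making sequential scans
# on large tables easy to spot
MyApp.Repo.explain(:all, query, analyze: true) |> IO.puts()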

Additionally, you can use telemetry to monitor query execution times and set up alerts if a query takes too long to execute.

For example, you might add a handler to alert you if a query takes longer than 5 seconds:

# my_app/lib/my_app/telemetry.ex
defmodule MyApp.Telemetry do
  # ...
  def handle_event([:my_app, :repo, :query], measurements, _metadata, _config) do
    # query_time is reported in native time units, so convert it to milliseconds
    query_time_ms = System.convert_time_unit(measurements.query_time, :native, :millisecond)

    if query_time_ms > 5_000 do
      # Send an alert or log a warning
    end
  end
  # ...
end

Now we can attach the handler during the telemetry supervisor startup, just like we did for the span handlers:

# my_app/lib/my_app_web/telemetry.ex
defmodule MyAppWeb.Telemetry do
  # ...
  def init(_args) do
    # ...

    :telemetry.attach("query-time-handler", [:my_app, :repo, :query], &MyApp.Telemetry.handle_event/4, [])

    Supervisor.init(children, strategy: :one_for_one)
  end
  # ...
end

Query timeouts can also serve as a code smell, indicating that inefficient queries are being executed.
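
If you want slow queries to fail loudly instead of quietly dragging on, you can lower the :timeout option on a per-query basis (it defaults to 15 seconds). A small sketch, reusing the posts query from above:

import Ecto.Query

query = from(p in MyApp.Post, where: not is_nil(p.published_at), order_by: [desc: p.published_at])

# Raises a DBConnection.ConnectionError if the query takes longer than one second
MyApp.Repo.all(query, timeout: 1_000)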

Ecto Connection Pooling and Concurrency Issues

Connection pooling problems can occur when an application tries to open more connections to a database than the pool allows. This can lead to a bottleneck, as processes are left waiting in a queue for an available connection.

Once again, telemetry can help us identify this problem. Ecto ships with Telemetry support and emits several events that we can listen for and react to. The documentation lists the measurements and metadata Ecto emits for every query, including queue_time.

We can extend the previous handler to also monitor how long queries wait for a connection in the pool:

# my_app/lib/my_app/telemetry.ex

  # ...
  def handle_event([:my_app, :repo, :query], measurements, _metadata, _config) do
    # queue_time is in native time units and is absent when the query
    # did not have to wait for a connection
    queue_time = measurements[:queue_time] || 0
    queue_time_ms = System.convert_time_unit(queue_time, :native, :millisecond)

    if queue_time_ms > 5_000 do
      # The query waited more than 5s for a connection to be available
    end
  end

Of course, you can always increase the number of connections in an application's configuration:

# config/runtime.exs or config/dev.exs
config :my_app, MyApp.Repo,
  # ...
  pool_size: String.to_integer(System.get_env("POOL_SIZE", "10"))

With the config above, you can control the pool size with the POOL_SIZE environment variable, falling back to a default of 10 connections.

PostgreSQL Deadlocks

Locks are a fundamental mechanism that databases use to ensure data integrity. Still, they can become a source of contention when there's a high volume of updates from different sources in the same timeframe.

A deadlock occurs when two or more processes wait for each other to release a resource, causing all processes to be stuck indefinitely.

How Deadlocks Occur

Deadlocks can happen when two or more processes try to acquire locks on multiple resources in an inconsistent order.

Consider the following schemas:

# my_app/lib/my_app/post.ex
defmodule MyApp.Post do
  use Ecto.Schema

  schema "posts" do
    field :reaction_count, :integer, default: 0
    has_many :reactions, MyApp.Reaction
  end
end

# my_app/lib/my_app/reaction.ex
defmodule MyApp.Reaction do
  use Ecto.Schema

  schema "reactions" do
    field :type, :string
    belongs_to :post, MyApp.Post
  end
end

Now imagine the following scenario using post and reaction schemas:

  • Process A acquires a lock on Post X.
  • Process B acquires a lock on Post Y.
  • Process A tries to acquire a lock on Post Y (but is blocked because Process B holds the lock).
  • Process B tries to acquire a lock on Post X (but is blocked because Process A holds the lock).

In this case, both processes wait for each other to release the locks, resulting in a deadlock.

If two processes each add reactions to both posts, but lock them in opposite orders (as in the scenario above), we can run into a deadlock:

import Ecto.Query

# Process A: locks Post X, then Post Y
task_a = Task.async(fn ->
  Repo.transaction(fn ->
    post_x = Repo.one!(from(p in Post, where: p.id == ^post_x_id, lock: "FOR UPDATE"))
    post_y = Repo.one!(from(p in Post, where: p.id == ^post_y_id, lock: "FOR UPDATE"))

    Repo.insert!(%Reaction{type: "like", post_id: post_x.id})
    Repo.update!(Ecto.Changeset.change(post_x, reaction_count: post_x.reaction_count + 1))
    Repo.update!(Ecto.Changeset.change(post_y, reaction_count: post_y.reaction_count + 1))
  end)
end)

# Process B: locks Post Y, then Post X (the opposite order)
task_b = Task.async(fn ->
  Repo.transaction(fn ->
    post_y = Repo.one!(from(p in Post, where: p.id == ^post_y_id, lock: "FOR UPDATE"))
    post_x = Repo.one!(from(p in Post, where: p.id == ^post_x_id, lock: "FOR UPDATE"))

    Repo.insert!(%Reaction{type: "like", post_id: post_y.id})
    Repo.update!(Ecto.Changeset.change(post_y, reaction_count: post_y.reaction_count + 1))
    Repo.update!(Ecto.Changeset.change(post_x, reaction_count: post_x.reaction_count + 1))
  end)
end)

Task.await(task_a)
Task.await(task_b)

Most database systems provide mechanisms to detect and automatically resolve deadlocks by rolling back one of the transactions involved.
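
If you want your application to recover gracefully when it loses that race, you can retry the transaction. Here's a minimal sketch, assuming PostgreSQL (where Postgrex reports the rollback as an error with the :deadlock_detected code); the function name, the retry count, and placing it in a context module are all hypothetical:

# e.g., inside a context module such as MyApp.Posts
import Ecto.Query
alias MyApp.{Repo, Post, Reaction}

def insert_reaction_with_retry(post_id, attempts \\ 2) do
  Repo.transaction(fn ->
    post = Repo.one!(from(p in Post, where: p.id == ^post_id, lock: "FOR UPDATE"))
    Repo.insert!(%Reaction{type: "like", post_id: post.id})
    Repo.update!(Ecto.Changeset.change(post, reaction_count: post.reaction_count + 1))
  end)
rescue
  e in Postgrex.Error ->
    # Retry once if our transaction was the one the database rolled back
    if match?(%Postgrex.Error{postgres: %{code: :deadlock_detected}}, e) and attempts > 1 do
      insert_reaction_with_retry(post_id, attempts - 1)
    else
      reraise e, __STACKTRACE__
    end
end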

To prevent deadlocks, you can:

  • Acquire locks in a consistent order across all processes. For example, you can always lock posts in ascending ID order:
Repo.transaction(fn ->
  # Locking rows in a consistent (ascending ID) order avoids the circular wait
  posts = Repo.all(from(p in Post, where: p.id in ^post_ids, order_by: [asc: p.id], lock: "FOR UPDATE"))

  Enum.each(posts, fn post ->
    Repo.insert!(%Reaction{type: "like", post_id: post.id})
    Repo.update!(Ecto.Changeset.change(post, reaction_count: post.reaction_count + 1))
  end)
end)
  • Use timeouts when acquiring locks to prevent indefinite waits. In Ecto, you can use the :timeout option with the Repo.transaction/2 function:
Repo.transaction(fn ->
  post = Repo.one!(from(p in Post, where: p.id == ^post_id, lock: "FOR UPDATE"))
  Repo.insert!(%Reaction{type: "like", post_id: post.id})
  Repo.update!(Ecto.Changeset.change(post, reaction_count: post.reaction_count + 1))
end, timeout: 5_000)

By setting a timeout, a transaction will be rolled back if it cannot acquire the necessary locks within the specified time (preventing deadlocks from causing an application to hang indefinitely).

Final Thoughts

From the application's point of view, there is only so much we can monitor, since metrics like query execution time, queue time, and memory consumption only capture part of the picture.

The best way to know what is happening to your whole stack is to instrument your application with tools like Telemetry or OpenTelemetry, then connect it to your favorite monitoring tool, like AppSignal!

Read more about instrumenting AppSignal for Ecto.

Wrap Up

In this article, we explored common performance issues in Elixir applications using Ecto and provided insights on detecting and addressing these problems.

We started with the N+1 query problem, learning to identify and solve it using Ecto's preload functionality. Then we investigated inefficient query execution, discussing how to optimize queries and use Telemetry for monitoring.

Lastly, we covered connection pooling and concurrency issues, emphasizing the importance of proper configuration, monitoring, and techniques for avoiding deadlocks.

Happy coding!

P.S. If you'd like to read Elixir Alchemy posts as soon as they get off the press, subscribe to our Elixir Alchemy newsletter and never miss a single post!
