Scaling Your Phoenix App in Elixir with FLAME

Sapan Diwakar

Posted on September 18, 2024

When you build an app, you'll often find that certain tasks do not require user interaction and are better performed in the background. Elixir provides excellent primitives, such as Task.async, to offload these tasks from the main user pipeline. Additionally, libraries like Oban offer more control over background tasks when needed.

There's also FLAME, which the core Phoenix team is developing to offer a scalable solution for offloading intensive tasks to remote machines.

We will compare these methods and see how FLAME stands out.

But first, let's delve into how FLAME works and how it can help you scale your Phoenix application.

Understanding FLAME for Phoenix

FLAME stands for Fleeting Lambda Application for Modular Execution. It is similar to Task.async, allowing you to run any block of code, but with the added benefit of executing it on a separate node. Depending on your configuration, FLAME can automatically manage infrastructure, spawning or scaling down nodes as needed.

To illustrate how FLAME works, let's consider an example where we need to find the SHA-256 hash of a file stored on Amazon S3. Calculating the hash of a large file can be slow due to its size or network latency. By offloading this task to a background worker with FLAME, we can improve the performance and responsiveness of our main application.

Here’s how you can do it:

FLAME.call(MyApp.BackgroundRunner, fn ->
  # Stream the file from S3 and hash it chunk by chunk on the remote node
  ExAws.S3.download_file(bucket, file_path, :memory)
  |> ExAws.stream!()
  |> Enum.reduce(:crypto.hash_init(:sha256), fn chunk, acc ->
    :crypto.hash_update(acc, chunk)
  end)
  |> :crypto.hash_final()
  |> Base.encode16()
end)

This is similar to Task.async and Task.await, with the key difference being that it runs on a separate node. When the checksum is ready, it returns the result to the main node. We'll discuss configuration options and available functions later in the post.

So, how is this different from any background job processing framework (e.g., Oban)?

  1. Built-in Scaling: FLAME has built-in support for scaling. It can automatically spawn new nodes when new tasks come in and scale down to zero (configurable) when no tasks are in the queue.
  2. Minimal Boilerplate: Unlike other frameworks, FLAME does not require extensive boilerplate code. You simply wrap your task inside FLAME.call or FLAME.cast.
  3. Awaiting Results: In most job frameworks (including the free version of Oban), you cannot await the result of background tasks. FLAME, however, supports this feature, allowing you to wait for a task's completion and retrieve the result.

FLAME in Action

In the previous section, we saw how easy it was to run a piece of code on a separate node using FLAME. Let's pull back a little and see how to integrate FLAME inside an existing Phoenix app.

Install FLAME

Add the dependency in mix.exs:

{:flame, "~> 0.1.12"}

Start a FLAME Pool

FLAME.Pool is the main GenServer provided by FLAME that manages node scaling inside your app. It also schedules tasks on those nodes and delivers the results back to their callers.

Add FLAME.Pool to your application's supervision tree by updating the application.ex:

defmodule MyApp.Application do
  use Application

  @half_hour_millis 1_800_000

  @impl true
  def start(_type, _args) do
    children =
      [
        # ...
        # The BackgroundRunner pool controlled by FLAME
        {FLAME.Pool,
         name: MyApp.BackgroundRunner,
         min: 0,
         max: 5,
         max_concurrency: 5,
         idle_shutdown_after: @half_hour_millis}
      ]
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

The name option defines the pool name to be used in calls to FLAME later in the application. This allows you to define several different pools with different configurations depending on the tasks that you want to achieve.

For example, for nodes that handle hash computation, you might want different concurrency settings than for nodes that handle more complex tasks, like video encoding; a sketch of such a setup follows the list below. Check out all the available options for the pool. The most important ones that you will commonly use are:

  • :min - The minimum number of nodes to start. You can configure this to zero to support pools that spawn a node only when there is a task, and then scale down when there are no pending tasks. This is especially useful in pre-production environments to save infrastructure costs.
  • :max - The maximum number of nodes at a time.
  • :max_concurrency - The maximum number of concurrent executions on a node.
  • :timeout - Default timeout for functions. This can also be set during individual calls (using FLAME.call with the timeout option).
  • :idle_shutdown_after - The time in milliseconds after which an idle node is shut down (only if more than the min number of nodes are running).
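
For instance, a setup with a lightweight hashing pool and a heavier video-encoding pool might look like the sketch below, added as children in application.ex; the pool names and the exact limits are hypothetical:

# A small, cheap pool for quick hashing tasks...
{FLAME.Pool,
 name: MyApp.HashRunner,
 min: 0,
 max: 10,
 max_concurrency: 20,
 idle_shutdown_after: :timer.minutes(5)},
# ...and a separate pool for heavy video encoding, with fewer
# concurrent jobs per node and a longer idle window.
{FLAME.Pool,
 name: MyApp.VideoRunner,
 min: 0,
 max: 3,
 max_concurrency: 1,
 idle_shutdown_after: :timer.minutes(30)}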

Configure FLAME Backend

FLAME Backend is the component responsible for starting up new nodes, connecting them to the parent node, and running functions on them. By default, FLAME ships with the FlyBackend. If you run your application on Fly, configuring FLAME is very simple. Just update your config.exs (or runtime.exs if you use Elixir releases):

config :flame, :backend, FLAME.FlyBackend

config :flame, FLAME.FlyBackend,
  token: System.fetch_env!("FLY_API_TOKEN"),
  cpu_kind: System.get_env("FLAME_CPU_KIND", "shared"),
  memory_mb: System.get_env("FLAME_MEMORY_MB", "1024") |> String.to_integer()

To access the Fly API for spawning or destroying machines, you need a token from Fly. You can optionally configure cpu_kind, memory_mb, and cpus for the spawned nodes. Check out the full list of supported options in the FlyBackend docs.

There is also a FLAMEK8sBackend available if you are running on Kubernetes. Other platforms have no out-of-the-box support, but you can refer to the FLAME.FlyBackend code and create a similar backend for your platform; a rough skeleton follows.
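
A custom backend implements the FLAME.Backend behaviour. Here is a rough, non-functional skeleton; the callback names follow FLAME's documented behaviour, but verify them against the version you use, and the platform API calls are placeholders:

defmodule MyApp.MyPlatformBackend do
  @behaviour FLAME.Backend

  @impl true
  def init(opts) do
    # Validate the configuration and build the backend state
    {:ok, Map.new(opts)}
  end

  @impl true
  def remote_boot(_state) do
    # Boot a machine running the same release via your platform's API,
    # then return {:ok, remote_terminator_pid, new_state}
    {:error, :not_implemented}
  end

  @impl true
  def remote_spawn_monitor(_state, _func) do
    # Spawn and monitor the given function on the booted node,
    # returning {:ok, {pid, monitor_ref}}
    {:error, :not_implemented}
  end

  @impl true
  def system_shutdown do
    # Terminate the current machine via your platform's API
    System.stop()
  end
end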

Running Tasks with FLAME

FLAME provides two main functions for running tasks on other nodes:

  1. FLAME.call/2: This function runs a task on a remote node and waits for the result. It's useful when you need to run tasks in the background but still require a result. Example: A user requests the 10,000,000th Fibonacci number. You can run the computation on another machine to avoid blocking web server resources while the user waits for the result.

  2. FLAME.cast/2: This function runs a task on a remote node without waiting for a result. It's useful when a result isn't needed immediately and you don't want to block the user pipeline. Example: After a user uploads a file, you want to update its checksum in the records. This can be done in the background without making the user wait. Both functions are shown in the sketch below.
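
Here's a minimal sketch of both calls, including a per-call timeout; MyApp.Math.compute_fib/1 and MyApp.Files.update_checksum/1 are hypothetical helpers:

# call: block until the remote node returns the result
fib =
  FLAME.call(
    MyApp.BackgroundRunner,
    fn -> MyApp.Math.compute_fib(10_000_000) end,
    timeout: :timer.minutes(2)
  )

# cast: fire-and-forget; the caller continues immediately
FLAME.cast(MyApp.BackgroundRunner, fn ->
  MyApp.Files.update_checksum(file_path)
end)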

By default, the spawned processes on the remote node are linked to the parent process, preventing orphaned tasks on remote nodes if the parent process is killed or dies. However, for some background tasks (e.g., file checksum generation), it might be useful to continue running a task even if the parent process is terminated.

Both FLAME.call/2 and FLAME.cast/2 support a link option that can be set to false to achieve this.
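
For example, to let a checksum job outlive the process that started it (MyApp.Files.update_checksum/1 is again a hypothetical helper):

# The remote task keeps running even if the calling process dies
FLAME.cast(
  MyApp.BackgroundRunner,
  fn -> MyApp.Files.update_checksum(file_path) end,
  link: false
)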

Deploying FLAME

You don't need anything special to deploy FLAME as long as you have configured a backend with all the required properties. FLAME starts your full application supervision tree on the remote node. Some parts of the application might not be necessary on a worker node, though. To control this, you can use FLAME.Parent.get/0 to determine whether the application is running on the main node or a child node.

Update application.ex to add children to your supervision tree only on the main node (i.e., when FLAME.Parent.get() returns nil):

defmodule MyApp.Application do
  use Application

  @half_hour_millis 1_800_000

  @impl true
  def start(_type, _args) do
    # FLAME.Parent.get/0 returns nil on the main node
    flame_instance? = not is_nil(FLAME.Parent.get())

    children =
      [
        # ...
        {FLAME.Pool,
         name: MyApp.BackgroundRunner,
         min: 0,
         max: 5,
         max_concurrency: 5,
         idle_shutdown_after: @half_hour_millis},
        # Only run the web endpoint on the main node
        !flame_instance? && MyAppWeb.Endpoint
      ]
      # Drop the `false` entries left by the conditional above
      |> Enum.filter(& &1)

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Technical Deep Dive into FLAME

Internally, FLAME leverages Elixir's powerful Node.spawn_monitor/4 function for spawning processes on remote nodes. This, coupled with BEAM's closure behavior, forms the core of FLAME's implementation.

Let's delve into what occurs under the hood when you execute FLAME.cast or FLAME.call:

  1. The request is dispatched to a pool of runners.
  2. If a runner is readily available (based on defined constraints like min, max, etc.), it spawns and monitors the process on the remote node.
    • Upon successful execution, the result is returned or discarded depending on the operation used.
    • Additionally, it monitors the process according to the timeout configuration and terminates it if necessary.
    • If the remote node crashes (e.g., due to memory exhaustion), the parent process is notified or terminated as per the configuration.
  3. If a process cannot be accommodated on an existing remote node, FLAME tries to spawn a new node (using the configured backend) within the specified limits.
    • Once the node is spawned, it follows the same steps as mentioned in (2).
    • If spawning a new node fails (e.g., because the maximum limit is reached), the task is queued until resources become available.
  4. Regardless of specific user calls, FLAME autonomously manages idle nodes based on the configuration, shutting them down as needed using the configured backend.

Understanding Closures

Closures play a pivotal role in FLAME's operation, and understanding them is essential in BEAM programming. A closure captures its enclosing environment, including variables from the outer scope. When a closure is defined, it retains references to the variables it "closes over", thus preserving the state of its enclosing process. This behavior is crucial for maintaining consistency, even in scenarios where a process crashes and restarts.
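
For instance, any variable referenced inside the anonymous function is captured and shipped to the remote node along with it; MyApp.Reports.generate_report/1 is a hypothetical function:

user_id = current_user.id # evaluated on the web server

FLAME.call(MyApp.BackgroundRunner, fn ->
  # user_id travels with the closure and is available on the remote node
  MyApp.Reports.generate_report(user_id)
end)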

However, when transmitting closures across different nodes in BEAM, the code on all nodes must match precisely. The Fly backend addresses this requirement well, as deployments on Fly rely on Docker images of the released code. This ensures uniformity between the parent node and the remote node, facilitating seamless execution of FLAME.

Leveraging Files as Processes

Things get more interesting when we explore the concept of treating files as processes. In Elixir, opening a file spawns a new process. Writing to a file is akin to sending messages to the process handling the file descriptor. This behavior extends to functions like File.stream! and all other file operations. Consequently, we can extend FLAME's capabilities to handle scenarios such as computing checksums of files stored on S3 or even supporting user-uploaded files directly on the web server.

Here's a code sample:

stream = File.stream!(file_path) # this runs on the web server
FLAME.call(MyApp.BackgroundRunner, fn ->
  # this runs on the remote machine
  stream
  |> Enum.reduce(:crypto.hash_init(:sha256), fn chunk, acc ->
    :crypto.hash_update(acc, chunk)
  end)
  |> :crypto.hash_final()
  |> Base.encode16()
end)

In this scenario, the file stream is initiated on the source machine, while the checksum computation is offloaded to the remote worker. This setup capitalizes on Elixir and BEAM's handling of closures and processes: the stream is lazy, and because the IO device behind it is a process on the web server, every read from the remote node is simply a message back to that process.

However, it's worth noting that a more efficient approach would involve establishing a shared volume accessible from both nodes. A shared volume minimizes resource usage and enhances performance, in contrast to utilizing the stream from the main node, which still consumes resources there to transmit data to the remote node.
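
With a shared volume, the remote node reads the file directly instead of pulling chunks from the parent node. A minimal sketch, assuming both nodes mount the same volume at a hypothetical /data path:

FLAME.call(MyApp.BackgroundRunner, fn ->
  # The stream is created and consumed on the remote node, so no
  # file data crosses the network between the two nodes
  File.stream!("/data/uploads/report.pdf", [], 64_000)
  |> Enum.reduce(:crypto.hash_init(:sha256), &:crypto.hash_update(&2, &1))
  |> :crypto.hash_final()
  |> Base.encode16()
end)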

Comparing FLAME With Other Approaches

Let's now see how FLAME stacks up against some alternative approaches.

Running Background Tasks on the Same Node

Using Task.async and similar functions, your tasks run on the same node as your web server. This approach works well for quick, lightweight tasks. However, during periods of heavy load, these background tasks can compete with your web server for resources, potentially slowing down your application's response time.
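
For reference, the same kind of task with plain Task stays on the web node; MyApp.Files.compute_checksum/1 is a hypothetical helper:

# Runs on the same node as the web server and competes
# with it for CPU and memory
task = Task.async(fn -> MyApp.Files.compute_checksum(file_path) end)
checksum = Task.await(task, :timer.minutes(1))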

Using Oban for Elixir to Get More Control

Oban is a powerful library that allows you to manage background jobs with more granularity. It enables you to:

  • Define dedicated Oban Workers.
  • Configure job queues.
  • Set up a job storage backend.

Oban also supports running these tasks on dedicated nodes, which can help reduce the load on your main web server. However, setting up Oban involves significant boilerplate code, as the sketch below shows, and its free version does not support automatic scaling of the physical nodes running the jobs.
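
For comparison, even a trivial Oban job needs a dedicated worker module and an explicit enqueue step; the worker and MyApp.Files.update_checksum/1 are hypothetical:

defmodule MyApp.Workers.ChecksumWorker do
  use Oban.Worker, queue: :default

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"file_path" => file_path}}) do
    MyApp.Files.update_checksum(file_path)
    :ok
  end
end

# Enqueue the job; it is persisted to the database,
# and there is no built-in way to await its result
%{file_path: file_path}
|> MyApp.Workers.ChecksumWorker.new()
|> Oban.insert()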

Achieving Infinite Scale with Serverless Functions

For truly infinite scalability, you can use external serverless functions like AWS Lambda, Google Cloud Functions, or Azure Functions. These services allow you to run background tasks independently of your main application infrastructure. However, they come with their own set of challenges:

  • Significant boilerplate and context switching for developers.
  • Synchronization issues between third-party services and your application code.
  • Potential latency and cold start problems.

FLAME: A Summary

As we have seen, FLAME provides a robust way to handle complex background tasks at scale within your Phoenix app. It aims to simplify the setup process and reduce boilerplate code, offering an efficient and scalable solution. All of this comes without the need to context switch — you can call FLAME code from within your app.

By using FLAME, you can:

  • Easily offload tasks from the main user pipeline.
  • Manage and scale background jobs without extensive configuration.
  • Improve your application's performance and responsiveness during heavy loads.

Wrapping Up

In this post, we've explored how to scale your Phoenix application using FLAME. We've seen how FLAME stands out by offloading tasks to remote nodes while offering built-in scaling with minimal boilerplate code. We've also compared FLAME with other methods, like Task.async, Oban, and external serverless functions, highlighting its unique advantages.

Ultimately, FLAME helps you build more robust and scalable applications by leveraging Elixir's strengths in concurrency and distributed computing, all while keeping your development process straightforward and intuitive.

Happy coding!

P.S. If you'd like to read Elixir Alchemy posts as soon as they get off the press, subscribe to our Elixir Alchemy newsletter and never miss a single post!
