Understanding Process Restart Strategies: Transient, Temporary, and Permanent

herminiotorres

Herminio Torres

Posted on September 19, 2023

Introduction

Building software systems that can gracefully handle failures and maintain uninterrupted operation is crucial. Elixir, a powerful and fault-tolerant programming language, offers a range of strategies for managing processes when they encounter issues. These process restart strategies, :permanent, :temporary, and :transient, play a pivotal role in ensuring system reliability and resilience. In this guide, we'll explore the concepts and best practices behind these restart strategies, equipping you with the knowledge to design robust and dependable software systems in Elixir.

When do you use which option?

There are three options for :restart:

  1. Use :permanent when:

    • The process is critical for the system's overall operation, and its failure would severely impact the system's functionality.
    • You want the process to be restarted whenever it terminates, whatever the exit reason, to maintain system availability. This is the default.
    • Examples might include database connections, core components, or critical services.
  2. Use :temporary when:

    • The process is not essential for the system's continuous operation, and its failure can be tolerated without significant disruption.
    • You never want the supervisor to restart the process automatically, even after a crash, so that non-critical processes do not cause excessive resource usage.
    • Examples might include non-essential background tasks, loggers, or metrics collectors.
  3. Use :transient when:

    • The process is expected to finish its work and exit normally, but it should be restarted if it terminates abnormally.
    • You want a restart only when the exit reason is something other than :normal, :shutdown, or {:shutdown, term}.
    • Examples might include worker processes in a job processing system or other tasks that run to completion.

The :restart option lets you configure how the supervisor treats a process such as a GenServer after it terminates, providing greater control over the system. In summary, the choice of restart strategy depends on the criticality of the process to your system and on whether a normal exit means its work is done. Careful consideration of these factors helps you design robust and resilient systems that can recover from failures effectively while avoiding unnecessary restarts for non-critical components.
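
As a rough sketch of the rule a supervisor applies when a child exits, the three options can be written as a pure function. The module name RestartRule and the function restart?/2 are made up for illustration and are not part of OTP; the real decision happens inside the supervisor.

```elixir
defmodule RestartRule do
  @moduledoc """
  Hypothetical helper (not part of OTP) that mirrors how the :restart
  value and the exit reason decide whether a child is restarted.
  """

  # :permanent children are restarted whatever the exit reason.
  def restart?(:permanent, _reason), do: true

  # :temporary children are never restarted.
  def restart?(:temporary, _reason), do: false

  # :transient children are left alone after a clean exit...
  def restart?(:transient, :normal), do: false
  def restart?(:transient, :shutdown), do: false
  def restart?(:transient, {:shutdown, _term}), do: false

  # ...and restarted after any other (abnormal) exit reason.
  def restart?(:transient, _abnormal), do: true
end

IO.inspect(RestartRule.restart?(:transient, :killed))            # prints true
IO.inspect(RestartRule.restart?(:transient, {:shutdown, :done})) # prints false
```

Reading the clauses top to bottom gives the same decision table as the list above: only :transient looks at the exit reason at all.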

Using the :restart Option

To utilize the restart strategies for a process in Elixir, you typically need to work within a supervision tree, which is a hierarchical structure used to manage and supervise processes. Here's how you can use the different restart strategies (:permanent, :temporary, and :transient):

  1. Creating a Supervisor:

    • First, you need to create a supervisor using the Supervisor module. You can call Supervisor.start_link/2 with a list of children, or define a module-based supervisor with use Supervisor and Supervisor.init/2.
  2. Adding Child Processes:

    • You then add child processes (which can include GenServer processes) by passing a list of child specifications to the supervisor. A child specification can be written as a map, or built from a module with Supervisor.child_spec/2, which also lets you override fields such as :restart.
  3. Defining Restart Strategies:

    • In the child specification, set :restart to :permanent, :temporary, or :transient, depending on how you want the process to be treated when it terminates. If you omit the key, it defaults to :permanent.
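
Besides writing the child spec map by hand, the :restart value can also be set on the module itself or overridden per supervisor. The sketch below assumes a made-up module, Sketch.Worker, and a made-up child ID, :one_off; only use GenServer and Supervisor.child_spec/2 are real APIs.

```elixir
defmodule Sketch.Worker do
  # Options given to `use GenServer` become defaults in the generated child_spec/1.
  use GenServer, restart: :transient

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# The module's own default restart value.
IO.inspect(Sketch.Worker.child_spec([]).restart)
# prints :transient

# Overriding the ID and the restart value for one particular supervisor.
spec = Supervisor.child_spec({Sketch.Worker, []}, id: :one_off, restart: :temporary)
IO.inspect({spec.id, spec.restart})
# prints {:one_off, :temporary}
```

Setting the default on the module keeps the supervisor's children list short, while Supervisor.child_spec/2 is handy when the same module needs different restart behavior in different places.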

Here's a code example of how you might configure a supervisor with different restart strategies:

defmodule Dummy.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Each child is a full child spec map; the first one is started
      # by calling Dummy.Permanent.start_link([])
      %{
        id: Dummy.Permanent,
        start: {Dummy.Permanent, :start_link, [[]]},
        restart: :permanent,
        type: :worker
      },
      %{
        id: Dummy.Temporary,
        start: {Dummy.Temporary, :start_link, [[]]},
        restart: :temporary,
        type: :worker
      },
      %{
        id: Dummy.Transient,
        start: {Dummy.Transient, :start_link, [[]]},
        restart: :transient,
        type: :worker
      }
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [
      strategy: :one_for_one,
      name: Dummy.Supervisor
    ]

    Supervisor.start_link(children, opts)
  end
end

Dummy.Permanent:

defmodule Dummy.Permanent do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok)
  end

  def init(:ok) do
    {:ok, :initial_state}
  end
end

Dummy.Temporary:

defmodule Dummy.Temporary do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok)
  end

  def init(:ok) do
    {:ok, :initial_state}
  end
end

Dummy.Transient:

defmodule Dummy.Transient do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok)
  end

  def init(:ok) do
    {:ok, :initial_state}
  end
end

Let's start IEx and get the app running.

$ iex -S mix

First, we check the children of Dummy.Supervisor: the supervisor has started, and so have its three children. Then we take the pid of the Temporary process and pipe it to Process.exit/2 with the :kill reason. Checking Dummy.Supervisor again shows that only Permanent and Transient remain: the temporary child was not restarted, and its spec was removed from the supervisor.

$ iex -S mix
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
  {Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
  {Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
  {Dummy.Transient, #PID<0.127.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.128.0") |> Process.exit(:kill)
true
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 2, workers: 2, supervisors: 0, specs: 2}
iex> Supervisor.which_children(Dummy.Supervisor)
[
  {Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
  {Dummy.Transient, #PID<0.127.0>, :worker, [Dummy.Transient]}
]
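
The same experiment can be scripted so it does not depend on the pids IEx happens to print. This is a minimal sketch under a few assumptions: Sketch.TemporaryDemo and its wait_until helper are hypothetical names, and an Agent stands in for Dummy.Temporary.

```elixir
defmodule Sketch.TemporaryDemo do
  # Hypothetical demo module; an Agent stands in for Dummy.Temporary.
  def run do
    spec = Supervisor.child_spec({Agent, fn -> :state end}, restart: :temporary)
    {:ok, sup} = Supervisor.start_link([spec], strategy: :one_for_one)

    [{_id, pid, :worker, _modules}] = Supervisor.which_children(sup)
    specs_before = Supervisor.count_children(sup).specs

    # Kill the child, then wait for the supervisor to drop it.
    Process.exit(pid, :kill)
    wait_until(fn -> Supervisor.which_children(sup) == [] end)

    specs_after = Supervisor.count_children(sup).specs
    Supervisor.stop(sup)
    {specs_before, specs_after}
  end

  # Polls because the supervisor handles the child's exit asynchronously.
  defp wait_until(check, attempts \\ 100) do
    cond do
      check.() ->
        :ok

      attempts == 0 ->
        :timeout

      true ->
        Process.sleep(10)
        wait_until(check, attempts - 1)
    end
  end
end

IO.inspect(Sketch.TemporaryDemo.run())
# prints {1, 0}
```

The spec count dropping from 1 to 0 is the scripted equivalent of the Temporary entry disappearing from Supervisor.which_children/1 in the session above.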

This next session is a bit longer, so bear with me. Here I play with the Transient process. If I kill it with an abnormal reason such as :kill, the supervisor brings it back with a new pid. Sending :normal with Process.exit/2 from another process has no effect at all: the signal is ignored and the pid stays the same. If the reason is :shutdown or {:shutdown, term}, the process terminates and is not restarted, but its child spec stays in the supervisor (the pid shows as :undefined), so we can restart the process manually using Supervisor.restart_child/2. If I kill the Temporary process and try to restart it manually, I get {:error, :not_found}, because a temporary child's spec is deleted as soon as it terminates. One important thing here is that Supervisor.restart_child/2 takes the child ID, not a pid.

iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
  {Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
  {Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
  {Dummy.Transient, #PID<0.127.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.127.0") |> Process.exit(:kill)
true
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
  {Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
  {Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
  {Dummy.Transient, #PID<0.145.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.145.0") |> Process.exit(:normal)
true
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
  {Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
  {Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
  {Dummy.Transient, #PID<0.145.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.145.0") |> Process.exit(:shutdown)
true
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 2, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
  {Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
  {Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
  {Dummy.Transient, :undefined, :worker, [Dummy.Transient]}
]
iex> Supervisor.restart_child(Dummy.Supervisor, Dummy.Transient)
{:ok, #PID<0.146.0>}
iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
  {Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
  {Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
  {Dummy.Transient, #PID<0.146.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.128.0") |> Process.exit(:shutdown)
true
iex> Supervisor.which_children(Dummy.Supervisor)
[
  {Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
  {Dummy.Transient, #PID<0.146.0>, :worker, [Dummy.Transient]}
]
iex> Supervisor.restart_child(Dummy.Supervisor, Dummy.Temporary)
{:error, :not_found}
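
The transient behavior can be sketched as a script too. As before, the names are assumptions: Sketch.TransientDemo, child_pid and wait_until are hypothetical, and an Agent stands in for Dummy.Transient. Instead of Process.exit/2 with :shutdown, this sketch asks the process to stop cleanly with Agent.stop/2, which is another way to produce a :normal exit reason.

```elixir
defmodule Sketch.TransientDemo do
  # Hypothetical demo module; an Agent stands in for Dummy.Transient.
  def run do
    spec = Supervisor.child_spec({Agent, fn -> :state end}, restart: :transient)
    {:ok, sup} = Supervisor.start_link([spec], strategy: :one_for_one)
    pid = child_pid(sup)

    # A :normal exit signal sent from another process is ignored.
    Process.exit(pid, :normal)
    ignored? = Process.alive?(pid)

    # A clean stop ends the process without a restart; the child spec
    # stays behind with :undefined in place of the pid.
    :ok = Agent.stop(pid, :normal)
    wait_until(fn -> child_pid(sup) == :undefined end)
    after_stop = child_pid(sup)

    # Because the spec is still there, the child can be restarted by its ID.
    {:ok, new_pid} = Supervisor.restart_child(sup, Agent)
    result = %{ignored?: ignored?, after_stop: after_stop, restarted?: Process.alive?(new_pid)}

    Supervisor.stop(sup)
    result
  end

  defp child_pid(sup) do
    [{_id, pid, _type, _modules}] = Supervisor.which_children(sup)
    pid
  end

  # Polls because the supervisor handles the child's exit asynchronously.
  defp wait_until(check, attempts \\ 100) do
    cond do
      check.() ->
        :ok

      attempts == 0 ->
        :timeout

      true ->
        Process.sleep(10)
        wait_until(check, attempts - 1)
    end
  end
end

IO.inspect(Sketch.TransientDemo.run())
```

The returned map should report that the external :normal signal was ignored, that the child showed up as :undefined after the clean stop, and that Supervisor.restart_child/2 brought it back.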

Last but not least, the Permanent process: whatever reason I use to terminate it, even :shutdown, the supervisor always brings it back with a new pid.

iex> Supervisor.count_children(Dummy.Supervisor)
%{active: 3, workers: 3, supervisors: 0, specs: 3}
iex> Supervisor.which_children(Dummy.Supervisor)
[
  {Dummy.Permanent, #PID<0.129.0>, :worker, [Dummy.Permanent]},
  {Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
  {Dummy.Transient, #PID<0.127.0>, :worker, [Dummy.Transient]}
]
iex> pid("0.129.0") |> Process.exit(:shutdown)
true
iex> Supervisor.which_children(Dummy.Supervisor)
[
  {Dummy.Permanent, #PID<0.145.0>, :worker, [Dummy.Permanent]},
  {Dummy.Temporary, #PID<0.128.0>, :worker, [Dummy.Temporary]},
  {Dummy.Transient, #PID<0.127.0>, :worker, [Dummy.Transient]}
]

Conclusion

In conclusion, process restart strategies in Elixir are indispensable tools for crafting software systems that can provide uninterrupted service. Strategies like :permanent, :temporary, and :transient empower us to build resilient systems that recover gracefully from failures, ensuring a smoother and more reliable experience for end users. As you continue to explore the world of Elixir, keep these restart strategies in your toolkit.
