Supervision trees, an example in Elixir

2020-10-21

Underjord is an artisanal consultancy doing consulting in Elixir, Nerves with an accidental speciality in marketing and outreach. If you like the writing you should really try the pro version.

So, whenever I’ve gone looking for a good overview of supervision trees in Elixir recently, I haven’t found what I wanted. I’m pretty sure I used to find some that covered making simple supervisors and workers without assuming you want a module for each Supervisor. I now believe those were following ye olde Supervisor.Spec, which had helpers for that. How to make a module-based Supervisor is covered in the docs, so I won’t spend time on that.

Since that module is deprecated, I figured it was time to bite the bullet, get comfortable with child specs and how they work, and figure out whether I could avoid creating a module for ordinary use of a simple, bog-standard Supervisor. Spoiler: I could.

This repo has all the code for what I built. If we dive into lib/supervisor_sample/application.ex, we find the following:

```elixir
  # ..
  def start(_type, _args) do
    children = [
      worker(:root_worker),
      supervisor(
        :one_for_one,
        [
          worker(:worker_1),
          worker(:worker_2)
        ],
        name: :supervisor_1
      ),
      supervisor(
        :rest_for_one,
        [
          worker(:worker_3),
          worker(:worker_4),
          worker(:worker_5),
          supervisor(
            :one_for_one,
            [worker(:subworker_1)],
            name: :subsupervisor_1
          )
        ],
        name: :supervisor_2
      ),
      supervisor(
        :one_for_all,
        [
          worker(:worker_6),
          worker(:worker_7),
          worker(:worker_8)
        ],
        name: :supervisor_3
      ),
      worker(:transient_root_worker, :transient)
    ]

    # The root of the tree is a supervisor that runs everything we defined above
    opts = [strategy: :one_for_one, name: SupervisorSample.Supervisor]
    Supervisor.start_link(children, opts)
  end

  # ..
```


This is my supervision tree definition. It uses two utility functions, supervisor and worker, to make the structure easier to take in at a glance. Each call to them generates a child spec. Even with some experience using OTP, I had never really spent any time understanding child specs. I won’t go too deep into them here, since the docs honestly cover them well, but I’ll try to make them digestible in my own way.

This is what the supervisor child spec looks like if I call supervisor(:one_for_one, [], name: :my_supervisor):

```elixir
%{
  id: -576460752303423326,
  start: {Supervisor, :start_link,
   [[], [strategy: :one_for_one, name: :my_supervisor]]}
}
```

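So the :id is fairly arbitrary. According to my sources (I asked on Twitter and more knowledgeable people responded), the supervisor uses it to identify the child internally, for example when restarting it. You can also use it yourself when doing more interesting things with your supervisor, since functions like Supervisor.terminate_child/2 and Supervisor.restart_child/2 refer to children by id. Beyond needing to be unique among the children of the same supervisor, it isn’t important unless you have specific plans for it.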
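
To make the role of :id a bit more concrete, here is a minimal sketch that is not from the sample repo. It uses two Agent children with hand-picked ids; :cache and :counter are made-up names. The point is that which_children/1, terminate_child/2 and restart_child/2 address a child by its id rather than by its pid.

```elixir
# Two Agents with hand-picked ids. Ids only need to be unique among the
# children of this one supervisor.
children = [
  %{id: :cache, start: {Agent, :start_link, [fn -> %{} end]}},
  %{id: :counter, start: {Agent, :start_link, [fn -> 0 end]}}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# which_children/1 returns {id, pid, type, modules} tuples.
ids = for {id, _pid, _type, _modules} <- Supervisor.which_children(sup), do: id
IO.inspect(Enum.sort(ids), label: "child ids")

# Stop and start one specific child by its id; :cache is left alone.
:ok = Supervisor.terminate_child(sup, :counter)
{:ok, counter_pid} = Supervisor.restart_child(sup, :counter)
IO.inspect(Agent.get(counter_pid, & &1), label: "fresh counter state")
```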

The :start key tells the supervisor what to actually start. The format might look familiar: a tuple of a module atom, a function atom and a list of arguments to pass to that function, which matches the signature of apply/3. In this case the arguments are a list of children and a keyword list of options, because that is what Supervisor.start_link/2 takes.
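
As a rough illustration, a parent supervisor essentially calls apply/3 on that tuple and expects {:ok, pid} back. The sketch below does the same by hand with the spec shown above, using a fixed :my_supervisor id in place of the generated integer. It starts an empty, unparented supervisor, so it only shows the calling convention, not everything a real parent does with the result.

```elixir
spec = %{
  id: :my_supervisor,
  start: {Supervisor, :start_link, [[], [strategy: :one_for_one, name: :my_supervisor]]}
}

# Destructure the {module, function, args} tuple and call it, which is
# essentially what a parent supervisor does when it starts this child.
{module, function, args} = spec.start
{:ok, pid} = apply(module, function, args)

# Equivalent to: Supervisor.start_link([], strategy: :one_for_one, name: :my_supervisor)
IO.inspect(Process.whereis(:my_supervisor) == pid, label: "registered as :my_supervisor")
```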

Those two keys are all that is required; the child spec docs cover the optional ones. We can look at a worker child spec as well:

```elixir
%{
  id: -576460752303423294,
  restart: :permanent,
  start: {SupervisorSample.Worker, :start_link,
   [[label: :my_worker, name: :my_worker]]}
}
```

Here we also set the :restart key. It defaults to :permanent, but because one of my examples uses :transient, I set it explicitly.
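
The difference between :permanent and :transient shows up when a child exits with reason :normal: a permanent child is restarted anyway, while a transient one is not. Below is a small sketch with two named Agents. The names are made up, and RestartDemo.wait_until/1 is a hypothetical polling helper, needed because the supervisor reacts to the exits asynchronously.

```elixir
defmodule RestartDemo do
  # Hypothetical helper: retry a check every 10 ms until it returns true.
  def wait_until(check) do
    if check.() do
      :ok
    else
      Process.sleep(10)
      wait_until(check)
    end
  end
end

children = [
  %{id: :permanent_agent, restart: :permanent,
    start: {Agent, :start_link, [fn -> 0 end, [name: :permanent_agent]]}},
  %{id: :transient_agent, restart: :transient,
    start: {Agent, :start_link, [fn -> 0 end, [name: :transient_agent]]}}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
old_permanent = Process.whereis(:permanent_agent)

# Agent.stop/1 exits the process with reason :normal.
:ok = Agent.stop(:permanent_agent)
:ok = Agent.stop(:transient_agent)

# The permanent child comes back under a new pid...
RestartDemo.wait_until(fn ->
  pid = Process.whereis(:permanent_agent)
  is_pid(pid) and pid != old_permanent
end)

# ...while the transient child stays down: its spec is kept, but with no pid.
RestartDemo.wait_until(fn ->
  match?({:transient_agent, :undefined, _type, _modules},
         List.keyfind(Supervisor.which_children(sup), :transient_agent, 0))
end)
```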

These maps aren’t complicated to create and they shouldn’t be intimidating. But they are visually bulky, and I think there are many ways to build the tree that are easier on the eyes and less noisy. That’s what the utility functions in the sample project are for.
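
The helpers themselves aren’t shown in this post, so here is a hypothetical reconstruction of what worker and supervisor helpers could look like. It produces maps shaped like the ones above, but it is my own sketch, not the repo’s code. Demo.Worker stands in for SupervisorSample.Worker. Using System.unique_integer/0 for the ids is an assumption based on how the printed ids look. One deliberate difference: I set type: :supervisor on supervisor children, which the printed map omits. Without it, :type defaults to :worker, which mainly changes the default :shutdown from :infinity to 5_000 ms.

```elixir
defmodule Demo.Worker do
  # Hypothetical stand-in for SupervisorSample.Worker: a GenServer that
  # registers under :name and keeps :label as its state.
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, Keyword.fetch!(opts, :label),
      name: Keyword.fetch!(opts, :name)
    )
  end

  @impl true
  def init(label), do: {:ok, label}
end

defmodule Demo.Tree do
  # Hypothetical helpers modelled on the article's worker and supervisor calls.

  # A worker child spec; restart defaults to :permanent, like the map above.
  def worker(name, restart \\ :permanent) do
    %{
      id: System.unique_integer(),
      restart: restart,
      start: {Demo.Worker, :start_link, [[label: name, name: name]]}
    }
  end

  # A child spec that starts a plain Supervisor with the given children.
  def supervisor(strategy, children, opts \\ []) do
    %{
      id: System.unique_integer(),
      type: :supervisor,
      start: {Supervisor, :start_link, [children, [strategy: strategy] ++ opts]}
    }
  end
end

import Demo.Tree

children = [
  worker(:root_worker),
  supervisor(:one_for_one, [worker(:worker_1), worker(:worker_2)], name: :supervisor_1),
  worker(:transient_root_worker, :transient)
]

{:ok, _root} = Supervisor.start_link(children, strategy: :one_for_one, name: Demo.Root)
```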

GenServer and child_spec/1

In many cases you don’t actually have to write the child spec yourself. Any module that does use GenServer gets a child_spec/1 generated for it. So we could reduce the above to {SupervisorSample.Worker, name: :my_worker}, or, without a name, just SupervisorSample.Worker. Very clean: a bit of convention saving you a bunch of repetitive detail. But the Supervisor module itself doesn’t offer child_spec/1. It offers child_spec/2, which is used to modify the child specs of modules that already have them, usually because you want to override something in the default child spec, such as the :id.
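
Here is a small sketch of both halves of that paragraph, using a made-up Demo.Counter GenServer. It shows the child_spec/1 that use GenServer generates, and Supervisor.child_spec/2 overriding the :id so that two instances of the same module can run under one supervisor. With the default spec, both children would share the id Demo.Counter.

```elixir
defmodule Demo.Counter do
  # Hypothetical GenServer; `use GenServer` generates child_spec/1 for it.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, 0, opts)

  @impl true
  def init(count), do: {:ok, count}
end

# The generated spec: the module is the id and the argument goes to start_link/1.
IO.inspect(Demo.Counter.child_spec(name: :my_counter))

# Both children would default to id: Demo.Counter, so override the id per child.
children = [
  Supervisor.child_spec({Demo.Counter, name: :counter_a}, id: :counter_a),
  Supervisor.child_spec({Demo.Counter, name: :counter_b}, id: :counter_b)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
```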

Most libraries that expect you to start them as part of your supervision tree already provide a child_spec/1. If one doesn’t, you can write the child spec map yourself quite easily, just as we did for Supervisor.
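
For a module that has a start function but no child_spec/1, the plain map works just as well. Demo.Pinger below is a made-up module that spawns a linked process by hand instead of using GenServer. Because it defines no child_spec/1, putting {Demo.Pinger, :pong} in a children list would raise, but the hand-written map is fine.

```elixir
defmodule Demo.Pinger do
  # Hypothetical module with no child_spec/1: it spawns a linked process by
  # hand and returns {:ok, pid}, the shape a supervisor expects.
  def start_link(reply) do
    {:ok, spawn_link(fn -> loop(reply) end)}
  end

  # Answers every {:ping, from} message with the configured reply.
  defp loop(reply) do
    receive do
      {:ping, from} ->
        send(from, reply)
        loop(reply)
    end
  end
end

children = [
  # Written by hand, just like the Supervisor spec earlier.
  %{id: :pinger, start: {Demo.Pinger, :start_link, [:pong]}}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
[{:pinger, pinger, :worker, _modules}] = Supervisor.which_children(sup)
send(pinger, {:ping, self()})
```

A raw spawn_link process like this doesn’t speak OTP’s system messages, so for anything real a GenServer, Agent or Task is the better fit. The sketch only shows that the map is all the supervisor needs.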

Another thing you can do is create a module that provides a child_spec/1 for starting a supervisor, as detailed in this conversation on the forum. The code there is partial, but I think the idea is complete. You could then use that module instead of Supervisor.
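
The forum code isn’t reproduced here, but under my own assumptions the idea can be sketched roughly like this. Demo.Sup is a hypothetical module whose child_spec/1 takes the usual supervisor options plus a :children key, and returns a spec that starts a plain Supervisor. That way there is still no module per supervisor.

```elixir
defmodule Demo.Sup do
  # Hypothetical: a child_spec/1 that starts a plain Supervisor, so that
  # {Demo.Sup, strategy: ..., name: ..., children: [...]} works in a children list.
  def child_spec(opts) do
    {children, sup_opts} = Keyword.pop(opts, :children, [])

    %{
      # Use the registered name as the id when there is one.
      id: Keyword.get(sup_opts, :name, __MODULE__),
      type: :supervisor,
      start: {Supervisor, :start_link, [children, sup_opts]}
    }
  end
end

# Small helper for the example: an Agent child whose state is its own id.
agent = fn id -> %{id: id, start: {Agent, :start_link, [fn -> id end]}} end

children = [
  {Demo.Sup, strategy: :one_for_one, name: :sup_a, children: [agent.(:a1), agent.(:a2)]},
  {Demo.Sup, strategy: :rest_for_one, name: :sup_b, children: [agent.(:b1)]}
]

{:ok, root} = Supervisor.start_link(children, strategy: :one_for_one)
```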

The tests & strategies

So the supervision tree above showcases the different strategies available. It also shows that we can supervise a supervisor; that is how you build a bigger tree with processes that depend on one another.

To demonstrate how these work I’ll direct you to test/supervisor_sample_test.exs. Every test looks something like this:

```elixir
  # ..
  test "restart root worker" do
    Worker.stop(:root_worker)

    # Should restart
    assert_receive {:stopped, :root_worker}
    assert_receive {:started, :root_worker}
    # Shouldn't restart anything else
    refute_received {:stopped, _}
    refute_received {:started, _}
  end
  # ..
```

They all follow the same model: tell a worker process to stop, then assert that we receive the messages we expect. Here I expect this worker to be stopped and then restarted, and nothing else to be affected. The messages are sent from lib/supervisor_sample/worker.ex, and the test process registers itself as the listener in the setup hook of the test module.
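
The worker module isn’t shown in this post, so here is a hypothetical sketch of how such notifications could work. Demo.NotifyingWorker sends {:started, name} from init/1 and {:stopped, name} from terminate/2 to whatever process is registered as :test_listener, a name I made up. Trapping exits makes sure terminate/2 also runs when the supervisor shuts a worker down as part of a group restart.

```elixir
defmodule Demo.NotifyingWorker do
  # Hypothetical worker that reports its lifecycle to a listener process.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)

  # GenServer.stop/1 exits with reason :normal; a :permanent child is still restarted.
  def stop(name), do: GenServer.stop(name)

  @impl true
  def init(name) do
    # Trap exits so terminate/2 also runs when the supervisor shuts us down.
    Process.flag(:trap_exit, true)
    notify({:started, name})
    {:ok, name}
  end

  @impl true
  def terminate(_reason, name), do: notify({:stopped, name})

  defp notify(message) do
    case Process.whereis(:test_listener) do
      nil -> :ok
      pid -> send(pid, message)
    end
  end
end

# Roughly what a test setup could do: register this process as the listener.
Process.register(self(), :test_listener)

children =
  for name <- [:worker_a, :worker_b] do
    Supervisor.child_spec({Demo.NotifyingWorker, name}, id: name)
  end

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)
Demo.NotifyingWorker.stop(:worker_a)
```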

The different strategies are succinctly explained in the Elixir docs. I’ll briefly restate them here:

:one_for_one, if a child process terminates, only that process is restarted. The whole group is only affected if the supervisor itself goes down.
:one_for_all, if a single child process terminates, the whole set of child processes is terminated and restarted.
:rest_for_one, this one is interesting. If a child process terminates, it and all the children started after it in the list are restarted. This might seem odd, but it is useful when later children depend on earlier ones.
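
If you want to see the three strategies side by side without the sample repo, here is a minimal, self-contained sketch (StrategyDemo is a made-up name). It starts three Agents :a, :b and :c under a supervisor with the given strategy, stops :b, and reports which children came back with a new pid. Since the supervisor finishes a whole restart before answering the next which_children/1 call, polling until :b has a fresh pid is enough to see the full effect.

```elixir
defmodule StrategyDemo do
  @ids [:a, :b, :c]

  # Returns the ids of the children that were restarted after stopping :b.
  def restarted_after_stopping_b(strategy) do
    children = for id <- @ids, do: %{id: id, start: {Agent, :start_link, [fn -> id end]}}
    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    before = pids(sup)

    # :b is :permanent (the default), so it is restarted even on a :normal exit.
    :ok = Agent.stop(before[:b])
    after_restart = wait_for_new_pid(sup, :b, before[:b])

    Supervisor.stop(sup)
    for id <- @ids, before[id] != after_restart[id], do: id
  end

  defp pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{}, do: {id, pid}
  end

  # Poll until the child shows up with a pid different from the old one.
  defp wait_for_new_pid(sup, id, old_pid) do
    current = pids(sup)

    if is_pid(current[id]) and current[id] != old_pid do
      current
    else
      Process.sleep(10)
      wait_for_new_pid(sup, id, old_pid)
    end
  end
end

for strategy <- [:one_for_one, :one_for_all, :rest_for_one] do
  IO.inspect(StrategyDemo.restarted_after_stopping_b(strategy), label: strategy)
end
```

Following the descriptions above, :one_for_one should report only :b, :one_for_all all three, and :rest_for_one :b and :c.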

So you can look at the tests to see examples of these behaviors.

This doesn’t really cover DynamicSupervisor at all. That has a lot of its own considerations. It is very useful, and maybe I should cover it at some point. Alex Koutmos, one of my co-hosts on Elixir Mix, has a good piece on DynamicSupervisor.

I hope this is helpful to people. Thank you for your attention.

If you have questions, thoughts or more of a comment, really, you can find me on Twitter {{ lars_twitter }} or reach me via email {{ lars_email }}.
