Unpacking Elixir: Real-time & Latency

Underjord is a tiny, wholesome team doing Elixir consulting and contract work. If you like the writing you should really try the code. See our services for more information.

Elixir was built on Erlang. Erlang was built to provide “consistently low latency”, among a few other audacious goals. Note that this is not a hard real-time constraint. It is soft and squishy, yet important and real. It makes Erlang unusually suitable for systems where latency matters and where a near-real-time experience is necessary.

Soft real-time. This part of the Unpacking Elixir series will really benefit from having read Unpacking Elixir - Concurrency, as that post covers how concurrency works in Erlang, including processes and schedulers. This post is mostly about Erlang and the historic choices made in developing the BEAM virtual machine. In the end we will get into how Elixir leverages it in the current landscape, but Elixir didn’t create this capability.

Erlang’s claim to consistently low latency predates the multi-core concurrency approach by a fair bit. From my understanding it ran a single scheduler in the olden times, and pre-emption was there to ensure a fair distribution of computational resources. It would prevent a single piece of CPU-intensive work from holding up concurrent, faster pieces of work. It ensures progress is made on all work in short order. No piece of work should wait long for processing, and a heavy piece of work will still eventually resolve.
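To make the pre-emption concrete, here is a small sketch in Elixir (the numbers and process roles are mine, picked for illustration): a CPU-heavy process and a quick responder run side by side, and the scheduler’s reduction-based pre-emption lets the quick one answer promptly instead of waiting for the heavy loop to finish.

```elixir
# A deliberately heavy computation. On the BEAM this gets pre-empted
# after a budget of "reductions", so it cannot monopolize a scheduler.
heavy =
  spawn(fn ->
    Enum.reduce(1..50_000_000, 0, &+/2)
  end)

# A quick process that just answers a ping.
quick =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, :pong)
    end
  end)

send(quick, {:ping, self()})

receive do
  # The pong arrives almost immediately, even though `heavy` is
  # still crunching numbers concurrently.
  :pong -> IO.puts("quick process answered while heavy work continues")
end
```

On a cooperative runtime the heavy loop would have to yield voluntarily; here the scheduler does it for you, which is exactly the “no piece of work should wait long” property.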

Another interesting advantage of this design is that an accidentally infinite piece of work will not necessarily disrupt the system significantly. It will waste resources but will not block all progress.

Erlang was built for telecom and the work was, again from my understanding, about routing phone calls without introducing noticeable delay, preventing any particular misbehaving process from impacting other processes too heavily. In a way it was always about the quality of the user experience. Performance often is.

It is a very high-level and dynamic system, which is unusual when talking about performance. Why? By operating at a higher abstraction level, being very dynamic and working with immutable data structures in a functional programming style, Erlang leaves a lot of performance on the table in the service of other objectives. We won’t go into those other objectives now, but Wikipedia lists distributed, fault-tolerant, highly available and hot swapping as additional traits beyond the soft real-time. I did say Erlang had audacious goals.

So soft real-time or “consistently low latency” is one of the objectives. And it is a performance objective. This is an interesting quirk of design. Designing for the absolutely lowest latency would require a very different approach. It would complicate or even sacrifice other objectives. It might be more costly, complicated and rigid or it might be significantly limited in capabilities. Erlang is not a one-trick pony.

Rather than the absolute lowest latency they went for consistently low latency, with tolerance for deviations and acceptance of imperfect results. They created a system that has been spoken of with some reverence in Computer Science ever since. It is not because it has the lowest latency. I am quite certain it does not. It was only one of the objectives after all.

So we have a cool high-level language, with nice abstractions, that offers low latency. What did people do with it? We mostly hear about chat and messaging because it does that very well. The popular open source message queue RabbitMQ is built on Erlang. The Jabber/XMPP server ejabberd was a popular FOSS chat server, and it seemingly ended up in use at Facebook for their chat. You will also find Erlang or Elixir driving chat in a ton of computer and video games. League of Legends by Riot Games comes to mind. Beyond messaging but still in games, there is the heavy use of Erlang at Demonware, which powers some pretty high-profile games. Like CoDBlops.

Even more famously, WhatsApp was built on Erlang. And Discord was built on Elixir. These are two massive players in instant messaging that are deeply invested in the BEAM virtual machine. They are applications where latency matters. Not as a matter of life and death, not to hit the deadline for correctly driving an electron beam or anything quite so sensitive. Rather for providing a good experience to users. It doesn’t matter if there is a small variance in latency. It matters that you typically avoid a large variance.

Onwards to what we can use it for. Elixir gained the Phoenix web framework. Initially it was just a very responsive web framework. Especially as it pre-compiled templates in a nice way so they render really fast, and it was already quick to respond thanks to Erlang. Even under pressure, latencies would be good. Great stuff but nothing shocking.

One of the early bigger things from that was Phoenix Channels, which is a bit of abstraction on top of WebSockets. This allowed a Single-Page Application or similar to talk to a server that could keep state, wrangle messaging and database communication, and broadcast to other connected clients in a really simple way. Importantly, it didn’t suck. A lot of WebSocket implementations did back then. The latency was great and it scaled well. See The Road to 2 Million Websocket Connections in Phoenix for more from that era. That was in 2015.
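As a rough sketch of what a channel looks like (the module, topic and event names here are hypothetical, assuming a standard Phoenix setup):

```elixir
defmodule MyAppWeb.RoomChannel do
  use Phoenix.Channel

  # Each connected client joins a topic; this channel process
  # holds per-connection state on the server.
  def join("room:" <> _room_id, _params, socket) do
    {:ok, socket}
  end

  # An incoming message from one client is broadcast to every
  # client subscribed to the same topic.
  def handle_in("new_msg", %{"body" => body}, socket) do
    broadcast!(socket, "new_msg", %{body: body})
    {:noreply, socket}
  end
end
```

The broadcast is the part that was unusually painless: fan-out to thousands of connections is just message passing between BEAM processes.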

The next big thing has been LiveView, which came out with a v0.1.0 in 2019. This is again where latency being a user experience concern comes into play. Phoenix LiveView allows you, to a very large extent, to write Elixir instead of JavaScript frontend code. Components, interactivity and all you really want from a frontend framework. You only write Elixir, and your state lives on one side of the connection: the server. This obviates the need for writing API and contract code.
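A minimal sketch of the idea, with hypothetical module and event names, assuming Phoenix LiveView is installed: the counter state lives in a server-side process, browser clicks arrive as events over the socket, and LiveView pushes back only the diff of the rendered page.

```elixir
defmodule MyAppWeb.CounterLive do
  use Phoenix.LiveView

  # State is server-side: it lives in this LiveView process.
  def mount(_params, _session, socket) do
    {:ok, assign(socket, count: 0)}
  end

  # Browser events arrive over the WebSocket; we update assigns and
  # LiveView re-renders and sends a minimal diff to the client.
  def handle_event("inc", _params, socket) do
    {:noreply, update(socket, :count, &(&1 + 1))}
  end

  def render(assigns) do
    ~H"""
    <button phx-click="inc">Clicked <%= @count %> times</button>
    """
  end
end
```

No API endpoint, no client-side store, no serialization contract; which is also why the round trip, and therefore latency, is so central to the experience.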

It does require that connection and to be a great experience it requires that latency stays low. You can draw a straight line between this need for low latency and the fact that the creator of LiveView, Chris McCord, works at Fly.io who famously want to move your application closer to your user. That aside. LiveView lets you do a lot with a little. It has been copied by most major web framework ecosystems at this point. What they can’t copy is the BEAM runtime and as such they can’t quite get the same deal. Python and Ruby are both fighting to sidestep their GIL problems. Node.js is very sensitive to blocking the event loop and tanking latency. Livewire for PHP decided it didn’t want to bother holding state and consequently can’t do quite the same things as well.

LiveView is not a panacea. Not a silver bullet. Not the solution for every problem.

In that way LiveView is very much Erlang. It solves a solid set of problems people actually have with abstractions that make those problems trivial to work with. It seems unconcerned with a theoretical ideal and gets on with doing the work.

And it is quite quick about it.


Have any latency horror stories to share? Notes, questions or concerns about this thing I wrote? Feel free to reach out through @lawik or on email lars@underjord.io. Have a good one.

Underjord is a four-person team doing Elixir consulting and contract work. If you like the writing you should really try the code. See our services for more information.

Note: Or try the videos on the YouTube channel.