Unflaking tests

I released a video in collaboration with Glesys, a Swedish cloud provider for the Nordics. It covers self-hosting analytics with Plausible Analytics.

I have more videos planned but currently no backlog. I've had a repeat cold which has been super annoying. Pre-schools are hellscapes for getting sick and my daughter keeps going there. Regular Programming has been running behind as well. Hopefully we'll catch up on editing as I go on parental leave now in May and my wife can hide in her office.

I'm going to Code BEAM STO 2022, the Erlang and Elixir conference in Stockholm. If you are going and can plan to be there a day early, I'm putting on a special event. If you are curious about either LiveView or Nerves and you want to have fun and make some friends before the conference I have a full day for you. I'm bringing my team and myself. Frank Hunleth has graciously agreed to participate and help everyone do Nerves. Seating is limited, there are 6 spots left and I really hope we'll see some of you there :)

There are two companies that currently stand out in supporting my work:

The first one is West Arete, a sustainable agency. They are looking for Ruby and Elixir developers in the US.

The second one is Bzzt, a scrappy startup. They are looking for Elixir developers in Sweden.

You can find more information through the links above or by going to the Underjord Jobs page. Applying means you and I talk about the opportunity and figure out if it's a good fit, and then there'll be the usual shenanigans of interviews and whatnot.

How do you test stateful systems?

This is a continuation of the previous newsletter which was about a particular client and particularly flaky tests.

The big issue underlying the flakiness of the tests is a reliance on a ton of stateful system pieces that simply were not designed to be tested. There is a database and a write-through cache. You access the database through the cache almost exclusively. The access calls look similar regardless:

FancyCache.get(:user, "my-id")
FancyDatabase.get(:user, "my-id")

There are also a ton of almost-singleton (one per node) actors that do assorted pieces of work over a PubSub mechanism. This is where it gets fun. We're crossing a lot of process boundaries via message passing to get things done in this system.

These things conspire against performant testing. Between tests you typically want to reset the state of the system to something known. This means resetting the database, resetting the cache, figuring out if any stateful actors are holding now-invalid state.

I've managed to help make these tests much less flaky recently by just resetting more and more state, restarting actors and killing background tasks spawned from tests. They are still quite slow. They have to run synchronously because all of this state is essentially global in nature.

Being able to run tests async and fully concurrently is magic. I'm betting that would take this code-base down to a few seconds, from the 2-3 minutes I'm currently looking at.

So how do we tame the state and make it testable?

Ecto typically achieves this with the Sandbox. The more performant you want your Ecto tests to be and the more Process boundaries you cross, the more annoying it becomes. The defaults are pretty mild and easy to work with but not necessarily optimal. Ecto does it in a balanced way.

The challenge with process boundaries is that they eat up the one useful piece of shared state we can otherwise leverage which is the Process dictionary. It only exists inside the current process and that means the test can't override what is happening in an actor.
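A tiny sketch of that limitation, using nothing beyond the standard library. The key name :fancy_namespace is made up for illustration, and the Task stands in for one of the actors:

```elixir
# Hypothetical key name, for illustration only.
Process.put(:fancy_namespace, "test-123")

# Same process: the value is visible.
in_test = Process.get(:fancy_namespace)

# A different process (standing in for an actor) has its own dictionary,
# so the key we set above is not there.
in_actor =
  Task.async(fn -> Process.get(:fancy_namespace) end)
  |> Task.await()

IO.inspect({in_test, in_actor})
# prints {"test-123", nil}
```

Whatever the test stashes in its own process dictionary never reaches the actor doing the work on the other side of a message.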

Ecto also allows you to pass a specific repo pid into essentially any operation and it can use that as the connection. It's called dynamic repos. It is typically used if you are doing multiple databases or multi-tenancy. It provides enormous flexibility.

So I think that would be a key for this system: enable explicit passing of an identifier. It can be anything really, a hash of the test's name or an otherwise unique string. Let the main system start it as "main" if it wants. Just make it possible to effectively override in testing. Currently there just isn't enough information being passed. We would need it to look like this:

FancyCache.get(conn, :user, "my-id")
FancyDatabase.get(conn, :user, "my-id")

Where conn could be a simple string prefix, a connection struct, a pid. There's a ton of things it could be. There needs to be something to signify that this is a different thing.
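To make that concrete, here is a minimal sketch where conn is simply a pid or a registered name. SketchCache is a hypothetical stand-in backed by an Agent, not the client's actual FancyCache, and put/4 is an assumed helper added so there is something to read back:

```elixir
defmodule SketchCache do
  # Hypothetical stand-in for FancyCache; each instance is its own Agent.
  use Agent

  # Pass name: :something to register the instance, or nothing to stay anonymous.
  def start_link(opts \\ []) do
    Agent.start_link(fn -> %{} end, opts)
  end

  # conn is whatever identifies the instance: a pid or a registered name.
  def put(conn, type, id, value) do
    Agent.update(conn, &Map.put(&1, {type, id}, value))
  end

  def get(conn, type, id) do
    Agent.get(conn, &Map.get(&1, {type, id}))
  end
end

# The "main" instance the running system would use.
{:ok, _main} = SketchCache.start_link(name: :main_cache)

# A separate instance that belongs to a single test.
{:ok, test_a} = SketchCache.start_link()

SketchCache.put(test_a, :user, "my-id", %{name: "Test User"})

SketchCache.get(test_a, :user, "my-id")      # the test instance has the user
SketchCache.get(:main_cache, :user, "my-id") # the main instance does not
```

The extra argument is the entire trick. Two tests holding different conn values cannot step on each other's state.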

In this case I would say that entire chunks of the Supervision tree would benefit from being runnable with such an ID, letting every test have its own system, namespaced from other tests, when necessary. If everything is isolated, teardown becomes trivial. In fact, if you start the supervision tree in test setup it would tear down as the test process goes away.
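Here is a rough sketch of what a namespaced chunk of the tree could look like, assuming a Registry is used for naming. SketchSystem, SketchRegistry and the :cache and :worker roles are all hypothetical, and plain Agents stand in for the real cache and actors:

```elixir
defmodule SketchSystem do
  # Hypothetical supervisor for one namespaced copy of the system.
  use Supervisor

  def start_link(id), do: Supervisor.start_link(__MODULE__, id)

  # Processes are registered under {id, role} so copies never collide.
  def via(id, role), do: {:via, Registry, {SketchRegistry, {id, role}}}

  @impl true
  def init(id) do
    children = [agent(id, :cache), agent(id, :worker)]
    Supervisor.init(children, strategy: :one_for_one)
  end

  # Stand-in for a real stateful component: an Agent holding a map.
  defp agent(id, role) do
    %{id: role, start: {Agent, :start_link, [fn -> %{} end, [name: via(id, role)]]}}
  end
end

{:ok, _} = Registry.start_link(keys: :unique, name: SketchRegistry)

# One copy for the running system, one for a single test.
{:ok, _main} = SketchSystem.start_link("main")
{:ok, test_sys} = SketchSystem.start_link("test-abc")

# State written in the test copy stays in the test copy.
Agent.update(SketchSystem.via("test-abc", :cache), &Map.put(&1, {:user, "my-id"}, :cached))

# Stopping one copy stops its children and leaves the other copy running.
:ok = Supervisor.stop(test_sys)
```

In an ExUnit test you would likely start the copy with start_supervised!/1 in setup instead, which has ExUnit stop it for you when the test finishes.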

This is why global-only config, static GenServer names and other related simplifying practices are often a poor choice in Elixir libraries. For many cases you may want to run the library multiple times with different configurations.
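As a small illustration of the alternative, here is a sketch of a GenServer that takes both its name and its config as start options. SketchCounter and its :start and :step options are invented for the example:

```elixir
defmodule SketchCounter do
  # Hypothetical library process; nothing here is read from global config.
  use GenServer

  # The name is optional and the rest of the options are the config.
  def start_link(opts) do
    {name, config} = Keyword.pop(opts, :name)
    GenServer.start_link(__MODULE__, config, if(name, do: [name: name], else: []))
  end

  def next(server), do: GenServer.call(server, :next)

  @impl true
  def init(config) do
    {:ok, %{value: Keyword.get(config, :start, 0), step: Keyword.get(config, :step, 1)}}
  end

  @impl true
  def handle_call(:next, _from, %{value: value, step: step} = state) do
    {:reply, value, %{state | value: value + step}}
  end
end

# Two instances with different configuration running side by side.
{:ok, a} = SketchCounter.start_link(step: 1)
{:ok, b} = SketchCounter.start_link(start: 100, step: 10, name: :tens)

SketchCounter.next(a)     # 0
SketchCounter.next(a)     # 1
SketchCounter.next(:tens) # 100
SketchCounter.next(b)     # 110
```

Had the module hard-coded name: __MODULE__ and read its settings from application config, the second start_link would have failed with an already started error and both callers would be stuck with the same settings.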

Really, it's all very Functional Programming. Anything implicit or magical, and any global value, is sort of going against the grain of FP in the name of convenience. Sometimes that's an okay trade-off. In this case we are working on testing and maintaining an increasingly complex and nuanced system and the inability to run parts of the system in isolation is causing immense pain.

Many of the components are hopelessly coupled to each other. That's the core of the system and I don't know that it's worth changing. What would be worthwhile is running multiple cores while testing.

If you have different ideas about how this should be wrangled, I'm all ears :) Let me know at lars@underjord.io or on Twitter where I'm @lawik.

I appreciate you reading this. Thank you for your attention.

- Lars Wikman