The cost that keeps costing

There's a video coming up soon. I've been down with a cold, so I haven't been on the ball with getting it up and out there, unfortunately. Hack, cough, wheeze.

There are two companies that currently stand out in supporting my work:

The first one is West Arete. They are a values-first agency doing custom software for higher education, and they tapped me to help them find Elixir developers for their transition from mostly Ruby over into Elixir. They are hiring remote, but only in the US, for practical reasons. Every US dev I've shown the job description to has raised eyebrows about the benefits and general posture of the company. They are unusual in a good way.

The second one is Bzzt. They are a Swedish startup working with electric vehicles and transportation, trying to change how last-mile deliveries happen in cities. I like this company enough that I've sent two of my good friends to work there, but they need more and I'm running out of available developer friends. If you are looking to make your bones in software, have a big impact on a small team and generally learn and grow a lot, I think this is a good place for it.

You can find more information through the links above or by going to the Underjord Jobs page. Applying means you and I talk about the opportunity and figure out if it's a good fit, and then there'll be the usual shenanigans of interviews and whatnot.

How do you know if you fixed it?

A client I'm working with has a flaky test suite. Not through some deep sin of their own; the software has just had a bumpy road. Some of it is almost definitely the fault of past me as well.

When I first encountered it, the software had a skeleton crew and the test suite didn't run. As a side dish to the project I did for them, I repaired the test suite. A big reason for this was that they were entirely too busy to check my work, and knowing this, I wanted to build some clear indicators that I had implemented functionality as agreed. The tests became a form of documentation and verification. They became my buddy in verifying that I was covering all intended cases.

I've been working off and on with the client since. Unfortunately the test suite as a whole is flaky: there are many time elements and too much global state floating around the application for the tests to really account for it in a good way.

I'm not a 100% coverage unit test kind of developer. For most of my career I've barely been any kind of test-writing developer. I prefer having tests these days, but I'm not precious about having all the tests. What is very clear is that when I'm building a system where I can't reasonably click my way around and check that it does what I intended, I very much want to be able to run tests. Also, when I'm adding or modifying existing functionality, I really want that code path to have some test coverage. In this particular application there isn't a UI for what I'm doing, and the app is fairly hard to run for development.

This is where flaky tests get costly. If I know the tests need to be run 5-6 times before passing, it gets really hard to know whether I've caused a real problem. It builds a disregard for the test suite, and that disregard was probably a reason why the suite was broken in the first place.

So how did I, in the past, contribute to this flaky behavior?
Well, I had to implement some very time-focused code, and while I tried my darnedest to control the time element to make things testable, I also likely had some hubris about exactly how much of a stateful system I could control. So I tried to control the clock.

"The most powerful magic of all is ... Chronomancy." (or the talk Why time is evil in distributed systems)

So yeah, time is the breath of the demiurge. Watch out.

The idea was decent; the devil is in the implementation and in how it by necessity interacts with a message-passing system full of actors. I'm honestly not sure whether I made it worse or if the issues I see with my old tests are roughly par for the course with the rest of what was there. I know my module for controlling time in testing is far from perfect. What I achieved was that I could test my code and get a decent sense that the intended behavior was there. But when you try to move time in a system that triggers messages, the messages don't happen instantly, and you don't know which messages would fire if you moved time faster or slower. You end up with something finicky and hard to understand. It becomes fragile in practice. I think I even managed to put some things under test that hadn't been before with this solution. I would certainly not have been able to deliver what I did with good confidence without implementing a way to test my work, and to test it in the context of the surrounding system.

This is something worth getting used to in Elixir: if you are dealing with a lot of GenServers, databases, messaging and event-driven things, spend time figuring out how to test a number of parts working together, but in isolation. Being able to run the event system in relative isolation would make it possible for each test or test suite to run an essentially clean copy of the system, and the tests would be easy to make consistent. There's a rough sketch of what I mean at the end of this post.

My current work with this client is not about the test suite. I keep going back to look at it though, because I know how good a tool it can be, and for this software system especially, automated tests are a necessary way of making sure the implementation does what was intended. I'm confident there is a path to a stable test suite for this client. I think it'll be a fair bit of effort. And I think it will pay off if they decide to do it.

This has highlighted a change in my view of tests: I care more about them now. It has also highlighted how big a problem I consider flaky tests to be; flakiness makes me dismiss a test suite entirely. It also underlines the old wisdom about slow tests; slowness becomes another reason to not work with the tests very much. It is very easy to end up in this spot, especially when your code is in a pressure cooker or has changed hands a few times. It's not a moral failing to have flaky tests, we all struggle in an imperfect world, but it can be costly not to address them.

Have you fixed a flaky test recently? Let me know at lars@underjord.io or on Twitter where I'm @lawik.

I appreciate you reading this. Thank you for your attention.

- Lars Wikman
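
PS. To make the isolation idea above a bit more concrete, here is a minimal sketch of the general approach, not the client's actual code: a clock behaviour that gets injected into a time-dependent GenServer, a fake clock a test can advance explicitly, and an ExUnit test that starts its own copy of the processes with start_supervised!/1. All module names (MyApp.Clock, MyApp.Session and so on) are made up for illustration.

```elixir
# NOTE: hypothetical sketch, not the real system. The point is that moving
# time becomes an explicit call instead of something tests race against.

defmodule MyApp.Clock do
  # Behaviour so the running system and the tests can supply different clocks.
  @callback now() :: DateTime.t()
end

defmodule MyApp.Clock.Real do
  @behaviour MyApp.Clock
  @impl true
  def now, do: DateTime.utc_now()
end

defmodule MyApp.Clock.Fake do
  # A controllable clock for tests, backed by an Agent, so a test can move
  # time forward deterministically instead of sleeping.
  use Agent
  @behaviour MyApp.Clock

  def start_link(start_time), do: Agent.start_link(fn -> start_time end, name: __MODULE__)

  @impl true
  def now, do: Agent.get(__MODULE__, & &1)

  def advance(seconds), do: Agent.update(__MODULE__, &DateTime.add(&1, seconds, :second))
end

defmodule MyApp.Session do
  # A time-dependent GenServer that asks the injected clock for the time
  # instead of calling DateTime.utc_now/0 directly.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  def put(key, value, ttl_s), do: GenServer.call(__MODULE__, {:put, key, value, ttl_s})
  def get(key), do: GenServer.call(__MODULE__, {:get, key})

  @impl true
  def init(opts), do: {:ok, %{clock: Keyword.fetch!(opts, :clock), entries: %{}}}

  @impl true
  def handle_call({:put, key, value, ttl_s}, _from, state) do
    expires_at = DateTime.add(state.clock.now(), ttl_s, :second)
    {:reply, :ok, %{state | entries: Map.put(state.entries, key, {value, expires_at})}}
  end

  def handle_call({:get, key}, _from, state) do
    reply =
      case state.entries[key] do
        nil ->
          :not_found

        {value, expires_at} ->
          if DateTime.compare(state.clock.now(), expires_at) == :lt,
            do: {:ok, value},
            else: :expired
      end

    {:reply, reply, state}
  end
end

defmodule MyApp.SessionTest do
  # Each test starts its own clean copy of the relevant processes via
  # start_supervised!/1, so no global state leaks between tests.
  use ExUnit.Case, async: false

  test "entries expire when the clock moves past the TTL" do
    start_supervised!({MyApp.Clock.Fake, ~U[2024-01-01 00:00:00Z]})
    start_supervised!({MyApp.Session, clock: MyApp.Clock.Fake})

    :ok = MyApp.Session.put(:token, "abc", 60)
    assert {:ok, "abc"} = MyApp.Session.get(:token)

    MyApp.Clock.Fake.advance(120)
    assert :expired = MyApp.Session.get(:token)
  end
end
```

Because the fake clock only moves when the test tells it to, and each test gets a fresh supervised copy of the processes it needs, there's no waiting on real time and far less room for the ordering of messages to make a test pass one run and fail the next.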