Speed is a special quality

Gotta go fast

I don't recall where I heard it, but there is a saying I'm failing to source: "Speed is a quality all on its own." This rings true. New clients respond very well to a quick turnaround on initial work. Fast software can feel nearly magical. Most modern appliances kind of suck because they take time to turn on.

One of my clients was taking a look at NervesCloud (our hosted, managed NervesHub). He was pretty impressed with what I had shown him and got going with it by himself. But he mentioned it being slow. He said it was slow overall, which I couldn't reproduce. What I could see was that one of the main views was slow: the device listing, which you'll move through often. Even with only a few devices.

This gnawed at me in the background. I didn't have a chance to look at it then and there, beyond making sure our system was performant overall. But it stuck with me. And when I was working on my own devices, I was very annoyed that it took more than a second to list my single device.

There is no substitute for needing to use your own product. This is probably why school platforms all more or less suck. If you don't need to spend a full day in your product trying to get stuff done, how are you going to know the pain? Well, I suppose you can listen to users, but I don't think that's nearly as effective, because they don't know what could be done or where the problems are.

Josh Kalderimis, my lovely co-founder, had added a loading state because the view was slow. That mitigated the sense of waiting, but it didn't speed up the work. After a full day of fiddling with my devices at home, I was pretty annoyed that this thing we run would behave this poorly.

My assumption was that the device listing query had gotten so beefy that it was slow. I added some debug functionality to a module and pasted it into a production REPL. I'd done some tracing to time a few functions I had doubts about.
The device filtering was fine. So either something was broken about LiveView Async or something else entirely was slow. We make a few other queries to populate the filters you can use to slice and dice the device list. My debug function for finding slow queries flagged one in particular: the query that fetches which Alarms are available to filter on. It took more than a second to complete.

We should certainly improve that query. But we can also wait to run it until after we've rendered the critical UI. The filter drawer doesn't even show up until the user clicks it, so we can fetch its data after the initial render.

This is the Pull Request. It isn't rocket surgery, but it made an immense improvement to the hottest path in our UI. I think we had all just accepted that it was slow. SmartRent recently deployed this. They have more than 100K devices (they're a publicly traded company, so they can't share numbers willy-nilly), and their render time also dropped to nearly nothing. The device filter query is very fast by default, and the extra queries are now deferred. I'm completely unashamed of the device listing again.

It was very easy for this to sneak in. The filter query isn't slow on a typical dev machine, because the local database doesn't have much alarm data to process. When I was fixing it, I added a 1 second sleep to the alarm query just to reproduce the behavior I was seeing.

An end note of BEAM love: we ship Recon by default. You should too. It is a very reasonable interface for basic tracing (though I would have preferred to have my own entrace). But even without Recon you can do a ton of introspection on a running BEAM.

I would love it if we had a perfectly instrumented observability setup. We have some of it, but we don't have the (costly) in-depth tooling I've used from Datadog or New Relic. So I couldn't just see the slow query in some UI. But I could trigger it fairly trivially.
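Stepping back to the fix itself: the pattern is to await only what the first render needs and start everything else in the background. The real change lives in NervesHub's Elixir/LiveView code and is not reproduced here. Instead, this is a minimal, language-neutral sketch using Python's asyncio. It assumes hypothetical stand-ins `fetch_devices` and `fetch_alarm_filters`, whose sleeps replace the real queries (the 1 second delay mirrors the sleep used to reproduce the problem locally). The names `mount_blocking` and `mount_deferred` are also hypothetical.

```python
import asyncio
import time

# Hypothetical delay standing in for the slow alarm-filter query.
ALARM_QUERY_DELAY = 1.0


async def fetch_devices() -> list:
    """Hypothetical stand-in for the fast, critical device-list query."""
    await asyncio.sleep(0.01)
    return ["device-1"]


async def fetch_alarm_filters() -> list:
    """Hypothetical stand-in for the slow query behind the filter drawer."""
    await asyncio.sleep(ALARM_QUERY_DELAY)
    return ["alarm-a", "alarm-b"]


async def mount_blocking() -> float:
    """Before: the first render waits for every query, filters included."""
    start = time.perf_counter()
    await fetch_devices()
    await fetch_alarm_filters()
    return time.perf_counter() - start


async def mount_deferred():
    """After: kick off the filter query in the background, render the list now."""
    start = time.perf_counter()
    # The slow query starts running concurrently...
    filters_task = asyncio.create_task(fetch_alarm_filters())
    # ...but the first render only waits for the device list.
    devices = await fetch_devices()
    render_time = time.perf_counter() - start
    return devices, filters_task, render_time


async def main():
    blocking_render = await mount_blocking()
    devices, filters_task, deferred_render = await mount_deferred()
    # The drawer's options arrive later, after the list is already shown.
    filters = await filters_task
    return blocking_render, deferred_render, devices, filters


blocking_render, deferred_render, devices, filters = asyncio.run(main())
print(f"blocking first render: {blocking_render:.2f}s")
print(f"deferred first render: {deferred_render:.2f}s")
```

In the blocking version, the first render takes as long as all the queries combined. In the deferred version, it takes only as long as the device query, while the filter options load in the background. In LiveView, async assigns (`assign_async/3`) offer a similar shape. This sketch does not claim that is exactly what the pull request uses.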
My debug function for slow queries, linked above, attaches telemetry handlers for Ecto for a set number of seconds. It collects the queries that run during that window, and I can slice and dice the results however I want. High-level, ad hoc introspection of a system at runtime is a beautiful thing.

What is a satisfying improvement you made recently? An itch thoroughly scratched or a problem untangled? Maybe you have a beautiful graph to show for it?

Thank you for reading. I appreciate it.