Claude Code, MCPs and the promised future


I keep getting nuggets of delight about the previous Goatmire. During the Stockholm BEAM Meetup, speaker Kristoffer Grönlund spoke about "mörk", the Markdown parser he made in/for Gleam. He called out the conference as part of what drove him towards... well, Elixir, and then he missed :D It's fine.

I'll also share, without permission because it is harmless, something an EEF board member shared with me from that same event:

Goatmire really made an impact. I spoke to one guy I haven't met before who proudly wore his Goatmire t-shirt. I asked him how it was and he was like "you think people are exaggerating when they say how great it was, but it really was that great".

I could not be more pleased.

Goatmire Elixir 2026 website

We are doing Goatmire Elixir 2026.

30th of Sep - 2nd of Oct. This year. Varberg, Sweden.

I'll be writing up the first workshops shortly to make them ready to announce.

 

Cognitive dissonance as a service

Been building a small focused-purpose web service for a client. They use LLMs fairly heavily so I took the opportunity to do what I think would work well in that regard. It has really worked. Turnaround has been quick. The solution seems perfectly maintainable. And I have very mixed feelings about it all.

I have a lot of skepticism about the hype for LLMs. The mix of Oxide & Friends' mostly reasonable people using them with good outcomes and reasonable people all around my community getting between reasonable and outlandish results with them has made me feel I need to understand it.

I'm an idealist and I prefer to take a harder, more opinionated line. I'm vegan for crying out loud. Saying no to tempting things on principle is how I get off. I don't trust the companies behind these things. I don't trust the larger system that invests in and supports them. The way they've been created is deeply unethical unless you want to take information freedom waaay further than most people are comfortable with. It is all ridiculous. But society has currently decided to largely ignore that. And the things exist and they do things. I'm very torn about using them at all.

The reason I do use and explore them is that being uncurious has only ever bitten me. "I don't care about X at all" or "your thing is irrelevant to me" has been an attitude I've sometimes adopted about particular things. I'm still selective in what I care about, but being dismissive has never paid off. People get cool shit done all the time using tool choices I disagree with. Consequently, I apply a more open mind than I feel like and try things.

Personally I've had good use out of Claude 4 and up for capturing the very spread out knowledge of Linux kernel config and subsystems and handing chunks of it to me. It very often can point me at the exact thing I need. Just chat.

I also used Zed's sidebar AI a few times with very mixed results to try and create various things I kind of wanted but didn't want to put effort into. This was prior to Claude 4.5. The best outcome was when I very step-by-step made it build out an Ash application that I scaffolded (with one call to Igniter). The app tracks important information about speakers and will probably be used to make sure we have the right info earlier for the next Goatmire.

Snap to present time. I built out this service similarly. Ash, and build the domain logic first. Make sure that is well covered in tests. This time I used Claude Code more and with Opus 4.5 it really does a lot better. Instead of having to keep it from going completely off the rails, I only occasionally needed to tell it not to shortcut access checks. But Ash as a system structure makes this stuff so obvious. You read very few lines of code that have a lot of meaning.

The Claude 4.5 models, particularly the big Opus thing but also the smaller Sonnet, are really a step up compared to past experiences. I rarely have to make them start over. And someone suggested I use Tidewave MCP (not the full app, unless I'm doing web UI, just the MCP) to get tools like `project_eval`, which lets the agent run snippets of code. That makes it way easier to do small changes in a REPL style. It really works well.

If you are following the kind of cool and delightful (user in "control", power-user type tool, quirky, fun, experimental) explosion of ClawdBot/MoltBot/OpenClaw, or if you have people trying random challenging things around you, you will hear people talk about absurd things they are able to do. Like making a rendering backend for Scenic using Skia, in Rust, without writing almost any code. Or things that aren't worth spending the time they'd take for a human, like this bash interpreter written in pure Elixir. As for ClawdBot/MoltBot/OpenClaw, I would not run it without inventing a completely new identity for myself that it can play within.

Lots of my work recently is careful, meticulous lining up of byte offsets and making sure secure boot is secure and booting (at the same time, very annoying). Or making a migration from one embedded OS to Nerves work right. All of this tends to be by hand. The bot can't unplug my USB cable, put the probe on a test point and replug to then flash using an obscure GUI. I'm special damnit.

It also depends on my client's attitude. Most importantly, it is work that doesn't just need to happen and work well enough. It needs to be right enough that we aren't painted into a corner 5 years into maintaining it. You have way more leeway to rip out and redo in a web service than on embedded devices. Data migrations are terrifying on dodgy connections, hardware that can fail, storage mediums that hate being written to.

That said. The power of Tidewave's relatively simple `project_eval` tool made me think of a `device_eval` to do hardware-in-the-loop development with an agent. I made a prototype. It is WILD that I can just type at the thing. "Check if the connected device's clock is in sync." And it does. Knowing how decent it is at binary wrangling in Elixir I could definitely ask it "What partition table format is /dev/mmcblk0?" which was a thing that came up recently.
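The partition table question is the kind of binary check that fits in a few lines. As a rough illustration (in Python rather than Elixir, and not the prototype's actual code), the sketch below classifies the first two sectors of a disk. The function name `partition_table_format`, the 512-byte default sector size and the return values are all made up for this example.

```python
def partition_table_format(first_sectors: bytes, sector_size: int = 512) -> str:
    """Guess the partition table format from the first two sectors of a disk.

    Hypothetical helper for illustration. Assumes LBA 0 and LBA 1 are passed in
    and that the sector size is 512 bytes unless stated otherwise.
    """
    if len(first_sectors) < 2 * sector_size:
        raise ValueError("need at least the first two sectors")

    # A GPT disk has a header at LBA 1 that starts with the ASCII
    # signature "EFI PART".
    if first_sectors[sector_size:sector_size + 8] == b"EFI PART":
        return "gpt"

    # An MBR ends its first sector with the boot signature 0x55 0xAA
    # at byte offsets 510 and 511.
    if first_sectors[510:512] == b"\x55\xaa":
        return "mbr"

    return "unknown"
```

On a device this would be fed something like the first 1024 bytes read from /dev/mmcblk0, which typically needs root. A GPT disk also carries a protective MBR, which is why the GPT check runs first.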

There are a lot of potential and actual downsides to using these things.

You end up relying on reviewing. You can usually skim code and see if it is reasonable, assuming the surrounding project isn't too chaotic. And then the devil is in the tests. I've found myself multiple times not looking enough at the tests. They really write better tests than me a lot of the time. They have "more patience" / feel no pain so they can be exhaustive. But I've still missed things because the process puts me in a lazy mode.

If you use one of these on the train you better be running it under SSH + tmux on another machine. They want an internet connection baybeee. Very annoying under bad internet or when Anthropic falls over. Coupling your work process to the internet in this way is insidious to me. I do use the internet a lot when I work but I usually don't have to completely switch approach because the internet blips.

They write a lot of docs that you won't read. They are very fastidious about docs and examples and stuff. And it becomes noise to me. It'll slip right through my brain unless I do an actual dedicated documentation pass.

No one knows the licensing situation. I assume the world doesn't give a fuck and will just plow on. But oof.

Anyway. I'm doing more stuff with these things. Very mixed feelings. I think several bubbles will pop but I don't think these go away. I really hope we get open models matching Opus 4.5 that can run on a civilian appliance box. So we get away from the cloud situation. I hope this will force companies to have more granular security practices and more robust failure modes on smart home, cloud services and more so that people can cordon their bots properly. Because I have a hard time seeing that this stuff won't end up everywhere. And we'll need to manage that some damn way.

I imagine you either think this is too willing to engage with the LLM stuff or not optimistic enough. I'd be curious to hear where you are on all this. I'm here on email or on Bluesky and the Fediverse, easy to reach.

Thank you for reading. I appreciate it.

 
 

This is an email from Underjord, a Swedish consultancy run by Lars Wikman.

Everything else is found at underjord.io
