Upsides in an enshittified world

It will get worse

The phone queue system is no longer twelve steps of number navigation or bad voice prompts followed by a long wait in a slow queue with frequent disconnects. It is a cyclic graph directing you towards one of two hundred wrong options using only natural language inputs. The two-hundred-and-first option is to talk to a human who can actually solve your problem. That's the last resort, and you have to argue your way there.

The first version of the system was slow because the model was slow. Now it is slow because the hardware is slow. The model has been pruned to be cheaper as time goes by, because it could stand to be really clunky and dumb. Now it will just fully mishear one word in five. Every other word if you have a cool accent.

Enshittification will take any efficiency or interesting tool we can produce and, through capitalist incentives, optimize for misery and oppression wherever there are no real options. I take it as a given that the current batch of hyped AI advancements within machine learning will make things worse. Those are the incentives, and most of the organizations driving the development are very much aligned with them.

Wherever there are massive corporations doing the bad stuff, we have cracks in the pavement where the resistance tries to repurpose it to do some good. We also have an all-out arms race up top right now, and open models are raining down as a currency with which to buy relevance in the state of the art. So we've got things we can use.

I am currently in a conversation about local-first AI experiments. The iPhone already lets you select text in images, entirely transparently, straight from their rasterized form. That's an ML trick. Pulling a figure cleanly out of a picture. ML trick. So we know the phones can pull ML tricks. They have to. That's how cameras work now, apparently.

LLMs are still absolutely beastly large, but you can run small or crunched-up ones locally depending on your hardware. The recently released Llamafile makes this a lot easier. The first L is for Large, and it has consequences both in disk space and in the memory required by your accelerator of choice (typically a GPU). If I kill enough graphically active applications I should be able to run Mistral 7B on the 3090 Ti in this machine. That's a lot more than my phone has, though.

Whisper speech-to-text, in its smaller variants, should run fine on a smartphone. VITS seems pretty lean, as it can run in realtime on CPU only. Embeddings and vector search should be perfectly feasible. There are examples where people have gotten heavily crunched versions of Llama 2 to run on-device with llama.cpp. There are options and possibilities beyond sending our stuff to the cloud for processing on massive GPUs behind the curtain. And it will grow, in interesting ways. (A few rough sketches of what this can look like follow further down.)

The current app situation is a cesspit, of course, as it is filled with ChatGPT clones and things that just ship your question to the OpenAI APIs. Curated af, of course. Apple keeps us super safe. So very safe.

Data ownership has long been the domain of enthusiasts. Self-hosters and really picky users. Local-first apps exist, but you need to pick them out carefully, and typically the story becomes rather nuanced very quickly as you need accounts and all that. And building them has been difficult. This is a lot of the local-first movement's direction: making local-first straightforward, easy, even preferable. I think we are still very much in a fight to win ground there.
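To make the Llamafile part concrete: a llamafile bundles model weights and llama.cpp into a single executable, and when you run it, it serves an HTTP API on your own machine. Here is a minimal sketch of talking to one from Python, assuming a Mistral 7B llamafile is already running and exposing its OpenAI-compatible chat endpoint on the default localhost:8080; the prompt and the "model" name are placeholders, and nothing leaves the machine.

```python
import json
import urllib.request

# Assumes a llamafile (e.g. a Mistral 7B build) is already running locally
# and serving an OpenAI-compatible endpoint on the default port 8080.
payload = {
    "model": "local-model",  # placeholder; the local server uses its bundled weights
    "messages": [
        {"role": "user", "content": "Draft a polite reply declining a meeting."}
    ],
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    answer = json.loads(resp.read())
print(answer["choices"][0]["message"]["content"])
```

Swap the prompt for whatever you would actually want a private model to chew on; the point is that the request never goes past localhost.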
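Speech-to-text with a small Whisper checkpoint is similarly undramatic. A sketch using the openai-whisper Python package and its "tiny" model; the audio file name is a made-up placeholder.

```python
import whisper  # pip install openai-whisper

# The "tiny" checkpoint is around 39M parameters and runs fine on CPU.
model = whisper.load_model("tiny")

# Transcribe a local recording; "voice_note.m4a" is a hypothetical file.
result = model.transcribe("voice_note.m4a")
print(result["text"])
```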
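And local embeddings plus vector search, for digging through your own notes without any cloud round-trip, could look roughly like this sketch using sentence-transformers and a small CPU-friendly model; the documents and the query are placeholders.

```python
from sentence_transformers import SentenceTransformer, util

# Small embedding model that runs comfortably on CPU.
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Invoice from the plumber, due at the end of the month",
    "Notes from Tuesday's call about the kitchen renovation",
    "Grandma's pasta recipe",
]
doc_embeddings = model.encode(docs, convert_to_tensor=True)

query = "what did we talk about on the phone?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine-similarity search over the document embeddings, top match only.
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=1)[0]
print(docs[hits[0]["corpus_id"]], hits[0]["score"])
```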
Local and open ML in the service of the user will be much the same. It has to start with enthusiasts and technically savvy folks, and then we build tools and the concept builds mind-share.

Have you ever built anything local-first? What would you use? Have you considered local "AI", and what do you think it could do for you? You can reply to this email or poke me on the fedi @lawik@fosstodon.org, I enjoy hearing your thoughts.

Thanks for reading. I appreciate your attention.