Why do ML on the Erlang VM?
2023-06-09
Another question might be: why do machine learning at all? I'm not big into ML/AI, though I've been poking at it more recently as the open models, such as Stable Diffusion and the more practical Whisper, caught my curiosity. I have a very decent GPU (3090 Ti) and I've poked around a bit. I don't consider ML a super exciting solution to all problems or a harbinger of artificial general intelligence. There have been some impressive things done recently, for sure. There has also been some honestly grotesque overreach with regards to rights and consent. I'm not your AI hype man.
Where I do appreciate it is in very utilitarian things. Whisper is a good example. Live captions and audio-to-text transcription are both quite labor-intensive. Considering how much audio is produced on an ongoing basis, there is no chance of making it all accessible and searchable by hand. Here this model can clearly be a good thing. It can also be used for ill, but that's mostly the nature of practical tools. Similarly, I've worked with a startup that processes invoices, receipts and such with ML models to turn the structured but very inconsistent human de facto standard of emailing PDFs to each other into structured data. Understanding, and more recently producing, data in messy formats: that's where ML seems to beat writing code.
I am genuinely excited that with the Nx project and its descendants this capability is available in Elixir. The Bumblebee project really drives home that you can use this now. I like Elixir and want to see it become more and more capable. For many simpler models it may now be easy enough to just load the model and do inference in your Elixir application instead of needing to offload it elsewhere. And also, without needing to rely on some other party offering an API.
Brief aside: I like it when I can run things myself. It is a good, reassuring option to have even when you do rely on a managed provider. It can also be quite the requirement here in the EU, with the GDPR and the Schrems rulings tearing through what can and can't be done with data. You can often get more polished results with proprietary services, but they are not a cure-all, and if you rely heavily on them you are taking on a lot of risk.
Is ML in Elixir really a good idea?
Not in Elixir itself, no. Probably not. But Nx doesn't just execute models in Elixir. It is a bit more involved; we'll get there. For now, let's talk about Python.
Python is the default for Machine Learning projects. It brings some good stuff to the table.
- High level language, expressive and productive, few low-level concerns
- Approachable language, friendly syntax, not heavy on ceremony or tricky concepts
- Ubiquitous, it is everywhere
- Fun to write expressive libraries in
Doing ML in Python also has some drawbacks. Most of these are general to Python rather than specific to ML and also recur in similar languages like Ruby. I rather like Python and wrote it professionally for a number of years. That is to say, there is some love for the language even as I write about these problems.
- Efficiency. Plain Python code can be dog slow; doing heavy lifting in the interpreter is rarely efficient.
- Concurrency. The global interpreter lock (GIL) makes it terrible at utilizing multiple cores.
- Raw performance. Crunching numbers in Python is not a good idea.
- Distributed computing. No aspect of Python makes it particularly suited to distributed computing. Few languages and runtimes are.
- Managing complexity. Python applications can get pretty messy and allow some really poor choices to be made.
Some of these are related to the language, most are related to the runtime and how it behaves. There are established solutions for many of these problems. You’ll see libraries such as numpy that do number-crunching and they drop into C/C++ to do this efficiently. There are also efforts to unlock the performance problems of the runtime in various ways.
Both concurrency and distribution are other areas where Python either drops into additional infrastructure tooling and/or C/C++ to achieve good results. Ray seems like a popular solution and it actually ends up pretty close to the Actor model offered by Erlang, along with distribution and serving concepts similar to what Nx offers.
What does Elixir offer?
Elixir covers similar ground to Python in general. It is a high-level language, it is dynamically typed, it is expressive and friendly. The runtimes are, however, very different.
Due to the Erlang virtual machine (the BEAM) Elixir can offer fantastic concurrency and consistently low latency. Erlang was built for soft realtime, distributed systems, fault tolerance and scale.
Elixir inherits and expands on a fundamentally high-level design: a functional programming language with the Actor model built in. This limits and clarifies complexity. If you want or need complexity, you generally have to opt in to it explicitly. You can build some gnarly things. But if you write the code of least resistance, you'll generally be writing pure functions operating on immutable data.
The BEAM is really bad at number crunching, though. The VM recently gained a JIT compiler, which makes it a fair bit better at certain raw number-crunching tasks, but overall that is not what it is for. This is where Nx, Axon and friends step in: they provide a functional Elixir API for building up computational graphs in a high-level language, which are then executed via XLA on either CPU or GPU as highly optimized code. Erlang has always offered NIFs and a few other escape hatches for doing demanding work in native code. This is a purpose-built abstraction on top of that capability, with ML as the focus and goal.
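To make that concrete, here is a minimal sketch of what building a computational graph with Nx looks like. It assumes the `:nx` dependency is available; the module name `MyMath` and the `softmax` function are my own illustration, not from the post. With a compiler backend like EXLA configured, the same `defn` compiles via XLA to optimized CPU or GPU code.

```elixir
defmodule MyMath do
  import Nx.Defn

  # defn builds a computational graph instead of executing
  # operation by operation on the BEAM; a backend like EXLA
  # can then compile the whole graph at once.
  defn softmax(t) do
    exp = Nx.exp(t - Nx.reduce_max(t))
    exp / Nx.sum(exp)
  end
end

MyMath.softmax(Nx.tensor([1.0, 2.0, 3.0]))
```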
Your application will already be better at concurrency and perform more consistently at runtime under the BEAM VM. When you want to start distributing and orchestrating the workloads you should find that it is trivial compared to Python. You don’t necessarily need to bring in any particular tooling for it or add infrastructure like message queues. Elixir is already very good at orchestrating work.
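As a small taste of that built-in orchestration, here is plain standard-library Elixir fanning work out across cores, no broker or queue required. The numbers and the squaring function are just a stand-in workload.

```elixir
# Run up to four units of work concurrently and collect the
# results in order. The same pattern extends across nodes
# with Task.Supervisor.
1..8
|> Task.async_stream(fn n -> n * n end, max_concurrency: 4)
|> Enum.map(fn {:ok, result} -> result end)
```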
Livebook. A tool for interactive, easily reproducible, real-time collaborative code notebooks. Jupyter notebooks have all sorts of weird drawbacks and are still a great tool. Livebook is a much cleaner design, has features that are unlikely to land in Jupyter and allows a nice, easy workflow for prototyping and exploring. You should try it out if you want to poke around with Elixir, with or without ML. It is the easiest way to try Elixir code.
The VM is built for runtime
Erlang is incredibly powerful in terms of observability. You can inspect your running system in depth and Elixir fully inherits this capability. You can work with the running system in a manner that is incredibly unusual if not entirely unique. Start and stop processes, find and identify bottlenecks and build your system for high levels of fault tolerance.
This will also translate to the ML work. Parts will be harder to introspect as they happen on GPUs or in native code but overall the way work flows through your application or cluster will be observable.
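A few examples of the kind of introspection available from an IEx shell attached to a running node, all from the standard library:

```elixir
# How many processes is the VM running right now?
length(Process.list())

# Total memory used by the VM, in bytes.
:erlang.memory(:total)

# Mailbox depth of a given process, handy when hunting bottlenecks.
Process.info(self(), :message_queue_len)
```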
What is not there yet?
Tons of things are potentially missing depending on what you want to do. It is still early days but there is also a lot of activity.
There are many cool models already implemented and drop-in ready for use in the Bumblebee project. There are already multiple backends for Nx, providing options for edge inference (TFLite), alternate general backends (TorchX, I believe) and mature XLA support, with more coming. I also look forward to the OpenXLA efforts, which seem to increasingly support Apple systems with Metal and all that. Those are in progress for Nx as well.
Importing models straight from Python has been a mixed bag from what I understand, but I think the recently announced Ortex broadens the possibilities significantly for ONNX-packaged models.
A lot of vision models are usable through evision, the OpenCV bindings project, as well.
Python was not chosen for machine learning. It was an accident of academia that Python became the tool. Some of that was its merit as a prototyping tool, certainly. But as ML grew, Python mostly remained due to inertia. People built libraries to cover the deficiencies, and so it grew. While I don't mind Python, I don't think it is a particularly good tool for this job. It struggles with a number of performance problems in production environments. Every new need in ML necessitates new libraries because you cannot tackle it inside of regular Python.
Elixir has powerful abstractions for distributing work, performing well concurrently and scaling far and wide. Out of the box. And then the performant ML stuff is built as additional libraries that hook into these abstractions cleanly. The focus on immutable data structures and functional programming reduces complexity and avoids the code becoming a big ball of mud while the Actor model provides the tools to build architecturally sound and scalable systems without necessarily bringing in a ton of additional tools and infrastructure.
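The Actor model mentioned above is not an add-on library; it is how you write stateful Elixir. A minimal example, with `Counter` as an illustrative name of my own:

```elixir
defmodule Counter do
  use GenServer

  # Client API: each counter is an isolated process holding its own state.
  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.call(pid, :increment)

  # Server callbacks: state lives here, updated one message at a time.
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}
end

{:ok, pid} = Counter.start_link(0)
Counter.increment(pid) # => 1
```

Each such process has its own mailbox and crashes in isolation, which is what the supervision and fault-tolerance story builds on.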
It is ridiculous how easily I can swing together a Phoenix web app and add some ML smarts to it. José showed some of this in his Bumblebee launch demo. I genuinely think Elixir provides strong advantages for companies that have a reason to do ML. It also puts basic ML within reach of the random web dev builder type who just wants a little special something for an app.
- User profile photo? Crop it to the detected face or remove the background and have a transparent cutout to play with. Or turn it into a sketch, a comic style portrait or whatever. U2net has many applications.
- Text being written? Use text sentiment analysis to get some understanding of what is being communicated in your app and adapt UI accordingly.
- Audio in use? Transcriptions via Whisper are quite straightforward to achieve.
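For the Whisper case, a hedged sketch of what this looks like with Bumblebee and Nx.Serving, written from memory of the Bumblebee docs. The exact function names and options have shifted between Bumblebee versions (earlier releases used `speech_to_text`), and `"recording.wav"` is a placeholder path, so verify against the current API before using this.

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

# Build a serving that batches requests, then run an audio file through it.
serving =
  Bumblebee.Audio.speech_to_text_whisper(
    model_info,
    featurizer,
    tokenizer,
    generation_config
  )

Nx.Serving.run(serving, {:file, "recording.wav"})
```

In an application you would typically put the serving under your supervision tree with `Nx.Serving` as a named process rather than running it inline like this.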
Whether you agree or disagree you can reach me over email at firstname.lastname@example.org or you can poke me on Mastodon where I am @email@example.com.
Note: Or try the videos on the YouTube channel.