Ecto & Multi-tenancy - Dynamic Repos - Part 1 - Getting started

Underjord is a tiny, wholesome team doing Elixir consulting and contract work. If you like the writing you should really try the code. See our services for more information.

Ecto is the database library we know and love from the Elixir ecosystem. It is used by default in Phoenix, the high-profile web framework. Ecto has a bunch of cool features and ideas. But this post is about a corner full of nuts and bolts, with very little of the shiny or hot stuff; it just covers some rather specific needs. Ecto documents these features in a guide and in the API reference, but that is usually not the whole picture. I'll try to cover some of the practicalities.

Fair warning: You probably do not need dynamic repos. Investigate prefixes first; implementing prefixes is significantly simpler and keeps things simple. Dynamic repos are high power but have the potential to bring a lot of complexity.

A brief outline:

  • What are Dynamic Repos?
  • Starting a repo
  • Activating a repo for use
  • Creating a repo
  • Running migrations

What are Dynamic Repos?

They provide one approach to multi-tenancy in Ecto. Multi-tenancy in the sense of being able to use multiple copies of a single database definition (a repo) in the same application.

Or more technically, they are independent instances of your repo-module that you can start with a varied set of runtime configurations.

Why though?

In my use-case, I'd like multiple customers to be able to use my app. I want to avoid storing all their data in the same database with just a customer_id to tell them apart. This has advantages such as when the customer invokes the GDPR and asks for all their data. I just dump one database, I dump one directory of object storage with their media and I export one line of customer records. Bam! GDPR export complete.

Some other benefits are simpler scoping for backups and restores. And a simpler design for access control. The main detriment is that it does add significant complexity. Carefully consider your use-case.

If you simply need multiple databases but you know which ones they will be at compile time you do not need this. If you need to connect to entirely separate databases identified or created at runtime, this could be your jam.

Starting a repo

There really isn't much to starting an Ecto repo for use as a dynamic repo. You can name it whatever you like if you want explicit naming. Or, for my needs, just set the name to nil and it will be anonymous, identified only by its pid.

The code to do this, assuming you have MyApp.Repo in your application is just:

# Get the normal config from your config files, but set the name to nil
our_repo_config =
  Application.get_env(:my_app, MyApp.Repo)
  |> Keyword.put(:name, nil)

{:ok, repo_pid} = MyApp.Repo.start_link(our_repo_config)

Activating a repo for use

So now you want to be able to actually use this repo for queries. Generally Ecto will expect a default repo to be started as MyApp.Repo, that is, the module name for the repo. So now we actually need to visit the specific API for dynamic repos, which is Ecto.Repo.get_dynamic_repo/0 and Ecto.Repo.put_dynamic_repo/1.

So we can do this:

{:ok, repo_pid} = MyApp.Repo.start_link(our_repo_config)
MyApp.Repo.put_dynamic_repo(repo_pid)


The docs state, regarding Ecto.Repo.put_dynamic_repo/1, that "from this moment on, all future queries done by the current process will run on [your dynamic repo]". I haven't dug into the details of scope here and whether sub-processes will absolutely lose track of their repo. It is stored in the process dictionary (more about that here). So I imagine subprocesses do not share it. Something to be careful with and aware of.

There is also the sibling function, get_dynamic_repo/0, which of course returns the currently set dynamic repo. It defaults to the repo default, so MyApp.Repo in our case.
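To make that concrete, here is a sketch of the process-dictionary behaviour, assuming the default MyApp.Repo is running in your supervision tree, repo_pid is the pid from earlier, and MyApp.Thing is a hypothetical schema:

```elixir
# The dynamic repo setting lives in the process dictionary,
# so every process carries its own.
MyApp.Repo.get_dynamic_repo()
#=> MyApp.Repo (the default)

MyApp.Repo.put_dynamic_repo(repo_pid)
MyApp.Repo.get_dynamic_repo()
#=> repo_pid

# A spawned process does NOT inherit the setting. It falls back to
# the default unless you pass the pid along and set it explicitly:
Task.async(fn ->
  MyApp.Repo.put_dynamic_repo(repo_pid)
  MyApp.Repo.all(MyApp.Thing)
end)
|> Task.await()
```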

Creating a repo

We've covered all of the functions provided by the dynamic repo API already, it is small and sweet. But it doesn't solve my use-case on its own.

When starting my app I want to ensure that existing customers have databases in our data store (Postgres in my case). Starting these fancy repos with their connection pools won't help a bit if I don't have a database created on the database server. In fact, there will be errors.

So we need to be able to do this dynamically too, because we want to do it at runtime. Someone registers as a customer. What do we do? Do we bring down our app, write some configuration dynamically, run a quick compilation and start it back up? Seems ... inconvenient.

So why not just use Mix? Mix isn't necessarily available on your runtime environment. If you have a two-stage Docker build for example you should end up with a production release without mix, because you don't need it. And you don't need it for this either. I heard shelling out is selling out. Definitely too catchy to be good advice but in this case it seems apt. I don't want mix to be required for running the application and I don't want shell commands for something that should be Ecto's job.

Check Ecto.Adapter.Storage and specifically the callbacks storage_up/1 and storage_down/1. I thought this was using nasty internals, but I was reassured when I brought it up that this is the intended approach. Double underscores make me think "hidden" and "internal use only" from life in Python. It looks like this:

MyApp.Repo.__adapter__().storage_up(our_repo_config)
And if the config is legit this should create your database. Nothing more to it. You can use storage_down/1 to clean up and remove your database. Use it with great care, because hell if that isn't a dangerous little function: it will drop your database. If you want to close the connection pool there are other options. This one will remove your database.
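A slightly more defensive sketch of the same call, assuming the same config as above; the return values are the ones the Ecto.Adapter.Storage behaviour documents:

```elixir
# Create the database if it doesn't exist yet; tolerate it already existing
case MyApp.Repo.__adapter__().storage_up(our_repo_config) do
  :ok -> :ok
  {:error, :already_up} -> :ok
  {:error, reason} -> raise "failed to create database: #{inspect(reason)}"
end
```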

Running migrations

Creating a database is probably only about 10% of the story of managing your DB from code. We all have our migrations. They need to be run or this is entirely pointless. Thankfully, we can. And we won't be using mix here either.

The Ecto.Migrator module takes care of us. There are some nuances. I'm not 100% sure I needed to run put_dynamic_repo beforehand, but it didn't hurt. I had some challenges with the docs for Ecto.Migrator.run missing the dynamic_repo option. Turns out it supports it fine, it was just a documentation issue. My PR for that has been merged, so it should be fixed in a future release.

migrations_path = Ecto.Migrator.migrations_path(MyApp.Repo)
Ecto.Migrator.run(MyApp.Repo, migrations_path, :up, all: true, dynamic_repo: repo_pid)

That should do it.

Note: If you run this for multiple repos you will see warnings, because it keeps loading the migration modules over and over again. I investigated potential PRs to fix this. That rabbit hole ended at "no, we shouldn't patch the Elixir code server to improve this corner case", which seems fair. José gave me a very reasonable option: just bring the migrations out of priv and into my application, like the modules they are. More on that in a later write-up.
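As a sketch of that option (the module names here are hypothetical): Ecto.Migrator.run accepts a list of {version, module} tuples as the migration source, so migrations compiled into your app work without touching priv at all.

```elixir
# Migrations as ordinary modules compiled into the release,
# passed directly to the migrator instead of a priv/ directory
migrations = [
  {20_200_101_120_000, MyApp.Migrations.CreateUsers},
  {20_200_102_120_000, MyApp.Migrations.CreatePosts}
]

Ecto.Migrator.run(MyApp.Repo, migrations, :up, all: true, dynamic_repo: repo_pid)
```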

So to recap, we can:

  • Create an anonymous instance of a repo with a different config (such as the database name)
  • Create the database at runtime
  • Run migrations at runtime

But can we test it? Sure we can. It was a bit of a pain and I'm sure there is a lot of space to optimize. But I got myself some green dots and now know that it largely works. I'll attempt to cover that in a future post. Because there are a lot of parts to all of this.

So there are a few things I still want to cover, but I wanted to get this out there because I honestly had to do a lot of digging and trial & error to get all the parts of this working.

And in the end it seems like I will be using prefixes because that puts me much closer to the batteries-included happy-path of Phoenix and Ecto and keeps me from learning too much about managing pools of connection pools.

More to come on this topic.

Update: Part 2 is available.

Update: Part 3 - Prefixes is available.

