What ID3v2 could have been

Underjord is a tiny, wholesome team doing Elixir consulting and contract work. If you like the writing you should really try the code. See our services for more information.

Speculations and specifications. If you were a Winamp user back in the day, or curate an MP3 collection currently, you might recognize the humble ID3 tag. It is what the metadata in the MP3 file is made up of. At first it was pretty limited, in the version later dubbed ID3v1. Like any good 2.0, the next version added a ton more fields and features, removed the character limits, and suddenly it was ID3v2. The latest spec is ID3v2.4 while the most commonly adopted one seems to be ID3v2.3. I recently found myself with a reason to dig into this specification. If you want more background on it you can find it at id3.org (well, it used to be at id3.org/Introduction but id3.org has been down for a while).

ID3 is my favorite kind of standard. A de facto one. MPEG, specifically our beloved MP3, didn’t have a way to embed metadata. So one was invented. Tacked on at the end of the file. With ID3v2 it goes at the start of the file. It’s a binary format. If you are used to JSON or XML for your data storage this is not so human-readable. It is pretty straightforward as binary formats go, I think, though I haven’t really parsed one before. The spec documents are mostly clear and understandable (and used to be linkable at id3.org/id3v2.3.0) but not at all mobile-friendly (I suggest browsing them via mutagen instead, they are better).

This post is not about the technical intricacies of the format. There are things I could write about how it avoids conflict with MPEG synchronisation thingies. I could cover how easy it is to implement parsing in Elixir due to the powerful binary pattern matching syntax and how smooth encoding new binaries can be with IO data. That’s not what we’re doing. This is about a simpler time, where people saw the wild possibilities of music on computers and when people cared about files, damnit. This is about some of the most interesting and entertaining things I’ve run across while reading the spec.

Note: I’m not a scholar of retro computing or someone who typically does a lot of research. This will not be heavy on the footnotes or the digging, this is about vibes man (to totally butcher together two pieces of slang, showing my range here).

Parts of an ID3 tag are called frames. Beyond Album titles, Song titles, Artist names and Genre the ID3 tag offers many more interesting ones. I have no clear idea which of these have seen production use but I think they paint a mighty picture of a very different future that could have been .. question mark?

There are generally useful things such as links, embedded images and embedded anythings really. In more recent times there have been additional specifications for Chapters and Table of Contents which are used to provide podcast episodes with their chapter information and chapter art. Fabulous stuff. Genuinely useful in my day-to-day, and if you keep an eye on what I do, the reason I dug into ID3 might lead us back to this usage. The newsletter is there after the post if you are curious for any additional notes on this.

The frame that first got to me, and really made me see the MP3 + ID3 file in a different light, was the Play counter (PCNT). This mighty little frame contains a number and it is intended to be the number of times the file has been played. According to spec it should be incremented when the file begins playing. This means that the file changes as people “consume” the media in it. In that way it gives a single song a memory, a very low-resolution story to tell. The only files I expect to change regularly these days are typically considered documents, text files or production projects (image/video/audio production). Most other files I deal with are just copied and converted around now.
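To make the "file with a memory" idea concrete, here is a minimal Python sketch of what a player would do to the counter on each play. It assumes the counter body is a big-endian unsigned integer of at least four bytes that grows by one byte when it would overflow, which is how the ID3v2.3 spec describes PCNT. The function name `increment_play_counter` is made up for illustration, and the sketch only touches the frame body, not the frame header or the rest of the tag.

```python
def increment_play_counter(counter: bytes) -> bytes:
    """Return the PCNT counter body with the play count increased by one.

    Assumption: the body is a big-endian unsigned integer, at least 4 bytes
    wide, that gains a leading byte when the current width would overflow.
    """
    value = int.from_bytes(counter, "big") + 1
    # Keep the existing width (minimum 4 bytes) unless the new value needs more.
    needed = (value.bit_length() + 7) // 8
    width = max(4, len(counter), needed)
    return value.to_bytes(width, "big")
```

A real player would also have to rewrite the frame size when the counter grows, which is part of why changing a file on every play feels so odd.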

There is something that feels absurd about your MP3 player changing your song file. On the other hand I find it much worse that fewer and fewer people stick an MP3 file in a player of any kind these days. I bet most players didn’t implement this but imagine if they did. The weird cultural touches that could flourish. Personally I’ll never consider listening to any song with a play count under 100. I only want files that have clearly achieved popularity. Or finding a legit single-digit play count copy of a popular rip. That means you got it close to the source!

I’m aware that I’m being silly but it tickles my mind to imagine that world.

The Play counter is the smaller and simpler sibling of a much wilder frame. Popularimeter (POPM). Each one stores an email string, a rating from 0-255 and a personal play counter, and a tag can hold any number of them as long as each has a different email address (up to the max frame size of 16 MB each, I believe). The use of this rating was discussed on the ID3 Wikipedia page as apparently some OSes and players use it to display a star rating for a song and someone has opinions. I think this frame seems fantastic!
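As a rough illustration of how little there is to a POPM frame, here is a Python sketch that pulls the three fields out of a frame body. It assumes the layout from the ID3v2.3 spec: a null-terminated email string, one rating byte, then an optional big-endian counter. The names `Popularimeter` and `parse_popm` and the example address are invented for this sketch, and it skips the frame header entirely.

```python
from typing import NamedTuple


class Popularimeter(NamedTuple):
    email: str
    rating: int      # 0-255
    play_count: int  # personal counter, 0 if the frame omits it


def parse_popm(body: bytes) -> Popularimeter:
    """Parse a POPM frame body (assumed layout: email, NUL, rating, counter)."""
    email_raw, separator, rest = body.partition(b"\x00")
    if not separator or not rest:
        raise ValueError("POPM body needs a terminated email and a rating byte")
    rating = rest[0]
    # Whatever follows the rating byte is the counter; empty bytes decode to 0.
    play_count = int.from_bytes(rest[1:], "big")
    return Popularimeter(email_raw.decode("latin-1"), rating, play_count)
```

For example, `parse_popm(b"someone@example.com\x00\xc4\x00\x00\x00\x2a")` yields the address, a rating of 196 and a personal play count of 42.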

Note: If you’ve heard the episode of the Regular Programming podcast where I get into this (not released at time of writing) I had forgotten about the personal playcount on the POPM frame.

Neat enough, I can rate my song. Just note: because each rating is keyed by email address, the file can have a lot of these ratings. If you keep passing along the same file you can build up a massive set of ratings. I can see who played it and how much. That could even be delightfully embarrassing. Especially for me. I get stuck on songs.

I wonder if this was imagined to be used so that when you got a sample of a song in MP3 format it could come pre-loaded with ratings from a site or something. Or was it truly just the evolving file as a vehicle for crowd-sourced wisdom? Anyway, shove ratings and Personally Identifiable Information in more files and unleash them to the world I say. There’s something glorious about it.

Now the thought that ratings might come pre-loaded from some site, baked into a sample file is not wrought from thin air and mind stuff on my part. We have another odd one that gets to business. The Commercial Frame (COMR). “This frame enables several competing offers in the same tag by bundling all needed information.”

Every offer has the following: prices specified in any number of currencies, a date for how long the price is valid, a contact URL for reaching the seller, and a “Received as” field indicating what the product is delivered as, which has a number of options such as “Standard CD with other songs”, “File over the Internet” or “Stream over the Internet”. There are a bunch more musically inclined ones but also merchandise and a useful Other. It can also provide the seller’s name and an embedded logo.

Store front as a standard. In a file. I find it rather inspired compared to everything being a SaaS.

Then we get the Ownership frame (OWNE) which I assume makes the file officially an NFT. And the Terms of Use (USER) frame which presumably makes it into a smart contract. Or maybe not. The Ownership frame provides information about purchase price, date of purchase and the seller. Nothing about the actual owner. The Terms of Use is what you’d expect, the terms under which the file may be used. There is also a way of grouping frames and signing them cryptographically, though parts of the implementation are left to the imagination.

Some honorable mentions as genuinely useful frames are Synchronized and Unsynchronized lyrics. I believe there may be a separate one for captions in an Accessibility Extension but I’m not sure.

The Event codes frame seems like fun. The event codes can be used for a number of different things: controlling lights, setting off explosives, whatever the player wants to interpret them as. There are some specific ones like start of song, bridge, end of song and a bunch of other musically related ones but you can specify a number of custom ones as well.

Since SQLite is trendy right now there is also no reason you couldn’t shove a SQLite database file into the file embedding part. 16 MB for a frame, 256 MB is the limit for the entire tag from what I gather. You can shove a lot of stuff in this thing.

I really wonder how much of this was ever used. It mostly seems sanely put together. Just hopelessly irrelevant to how things work now. It makes me really want to do some silly stuff with some MP3 files. Of course you don’t actually need an MP3 file. ID3 can work in a multitude of places. It doesn’t require media at all.

The possibilities feel endless to me. Completely inapplicable in the present day but I also desperately want to find out about current-day usages. I’ve just dipped my toe in ID3. If you know the history and the present day of it, reach out. I’d love to know more and if you’ve written on it, share your stuff.

Note that this format is also quite dense. No curly brackets here unless you put them in your text field. It is impractical at times to deal with a binary format but it does make me happy that the tag header provides more info in 10 bytes than what it would take me to idiomatically express the version of the tag in JSON.

{"version": 3}  (14 bytes)

vs.

ID3 3 0 0 0001  (10 bytes)
|   | | | |____ 4 bytes of tag size (here set to 1 byte)
|   | | |
|   | | |______ 8 bit flags, here zeroed out
|   | |
|   | |________ spec revision: 0
|   |
|   |__________ spec version: 3
|
|______________ starting indicator "ID3"
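
For the curious, reading those 10 bytes back out takes very little code. The following is a minimal Python sketch, assuming the ID3v2.3 header layout shown above and the spec's "synchsafe" size encoding, where each of the four size bytes carries only 7 bits so the data can never look like an MPEG sync signal. The function name `parse_id3v2_header` and the returned dictionary keys are made up for illustration.

```python
def parse_id3v2_header(data: bytes) -> dict:
    """Parse the 10-byte ID3v2 tag header at the start of `data`."""
    if len(data) < 10 or data[:3] != b"ID3":
        raise ValueError("not an ID3v2 tag")
    version, revision, flags = data[3], data[4], data[5]
    size_bytes = data[6:10]
    # Synchsafe integer: the top bit of every size byte must be zero.
    if any(byte & 0x80 for byte in size_bytes):
        raise ValueError("tag size is not a synchsafe integer")
    size = 0
    for byte in size_bytes:
        size = (size << 7) | byte  # 7 useful bits per byte, 28 bits total
    return {"version": version, "revision": revision, "flags": flags, "size": size}
```

Four bytes of 7 bits each give 28 bits of size, which is where the 256 MB ceiling for a whole tag comes from.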

This stuff has been fun to work with. I expect to share more about it as my implementation evolves and becomes useful. For now I needed to share the human aspect of the specification as I’ve enjoyed getting to know it and also share the speculations I concocted as I ran into these frames in my work.

If you have questions, comments or insight to share, do reach out at lars@underjord.io or find me on Twitter as @lawik.
