Is this evil?

2020-09-15

I try to be a friendly citizen of the web. I try to keep my site in good shape. It should load quickly, track nothing beyond basic server logs, and not hassle you about cookies, GDPR or my newsletter. I do have a newsletter but it won't pop up in your face here. In how I run this site, I try to stay firmly on the side of friendliness and a good experience.

I do these things because I want my site to be not just bearable but genuinely great in its own quiet way. That matters to me as much as replying to email in a reasonable timeframe, doing what I've told people I will and following up on things that require my attention. I try my best and I want that to also be true for my website. As a consequence it also happens to have a perfect Lighthouse score most of the time.

A while back I had an idea that smells a bit funny. It has the scent of some poor web ethics. But it may also be very useful. I'm not the first to discover it, but I also haven't seen a lot written about it. It seems usable for reasonable things, but it is also somewhat abusable. And it definitely breaks the principle of least astonishment (or least surprise), in that it makes your browser do things you do not expect.

I'm talking about tracking users via CSS. One of the downsides of only knowing your traffic via server logs is that they are not a very good indicator of readership, as outlined by the Plausible Analytics people: there is a lot of automated traffic on the web in the form of bots, crawlers and scrapers. So if there is a way to remove most of that automated traffic without loading any JS, is that a win?

Consider this CSS:

    body:hover {
        /* the image is only requested when the selector first matches,
           once per page view */
        background-image: url("https://underjord.io/you-was-tracked.png");
    }

There are endless ways to do it. This one has a certain elegance because it actually requires mouse interaction. You could probably cover accessibility with some use of :focus as well. CSS won't load the URL until the selector is hit, and it will only load it once for a given page view. You'd want a separate URL per page, which is trivial; query params would likely be enough. I'm convinced that this would give me better data.
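A per-page variant could look something like the following sketch. The track.png path and the page query parameter are made up for illustration, and whether :focus-within fully covers keyboard-only users is an assumption I haven't verified:

    /* one request per page view, tagged with the page it came from;
       :hover covers mouse users, :focus-within keyboard users */
    body:hover,
    body:focus-within {
        background-image: url("https://underjord.io/track.png?page=/css-tracking");
    }

The server logs would then show one hit per page view from anyone who moved the mouse or tabbed around, which most bots don't.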

I am not going to be implementing it. I think it's mostly fine to use things like Plausible or Fathom that are privacy-oriented analytics platforms because it is a lot better than what was going on before and I think the exchange between visitor and site isn't entirely clearcut. The negotiation is implicit and the browsers set the outer boundary terms. But what we do within that space is what determines our character. And I think the best way to serve my audience is to be mindful of this stuff, not stare myself blind at the numbers and keep working to maintain the trust and attention I've built up.

The ugly side is that there are things you can do to record every single link your users hover, even if they are blocking all JS. You can absolutely use this to do things I consider somewhat of an overstep, but that are probably just table stakes in the analytics space. One could also, I believe, measure how long people stay on a page by using invisible animations that trigger requests over time, exploiting the same loading behavior.
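A sketch of both ideas. The selectors and URLs are invented, and whether every browser defers loading each keyframe's image until that keyframe actually applies is something you'd want to verify yourself:

    /* log a request when this specific link is hovered */
    a[href="/pricing"]:hover {
        background-image: url("https://example.com/hover.png?link=/pricing");
    }

    /* rough dwell time: each keyframe triggers a fresh request */
    @keyframes dwell {
        0%   { background-image: url("https://example.com/ping.png?t=0"); }
        50%  { background-image: url("https://example.com/ping.png?t=30"); }
        100% { background-image: url("https://example.com/ping.png?t=60"); }
    }
    body {
        animation: dwell 60s step-end forwards;
    }

The highest t value that shows up in the logs for a visit tells you roughly how long the tab was open.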

But it is a neat hack and I hadn't really considered the possibility before. It is already somewhat known, and I haven't checked who does and doesn't do this. So I figured I'd share, because I do like a neat hack.

If you want to register your opinion in the aggregate you can just click one of the following links and you and all the bots will be recorded in the server logs:

Where do you draw your lines on privacy? Are there things sites are fine to do to understand their traffic and visitors or should even the server log be abolished?

I'm genuinely curious how you feel. Feel free to get in touch via lars@underjord.io or just find me on Twitter as @lawik.