Web Annotation: A Short Dive

Send your thoughts via Twitter or mail

Edit, August 2020: A year after writing this, I no longer think that general web annotation is how communal sense-making is going to evolve. Domain-specific corpus and dataset annotation is a different story and is evolving rapidly. My prediction of an annotation service with a few million users within the next three years (now two) will probably still come true, but I have lowered my hopes for the area. My guess as to why: truth and meaning are less and less found inside a single document, so why invest in massive annotation efforts if it's better to search outside the document for more context? Also, many annotation jobs, like claim verification, flagging and connecting similar ideas, which I have started to work on, can and will be taken over by increasingly capable natural language processing (NLP).

Last winter, I got fed up with how I do Internet, so I prototyped a browser extension that lets you create and view annotations within your social circle on any website. After I shared the alpha version with a friend, he mentioned Hypothesis – a team that has been working on the same problem for years and has already made strong headway towards open-source web annotation.

At first I was frustrated.
I had spent many days on the torture of modern web development[1] when many of the problems were already solved. It's apparently a niche problem; otherwise I'd have discovered a solution before feeling pressured to build one.

At the moment, annotation is mostly used by institutional knowledge workers (education, publishing,…). In fact, none of my friends in tech or science use annotation for work and nobody does so privately. This seems strange given that you can surf the web as usual, AND:

True, you could get something similar by stacking a few tools (maybe Twitter, Delicious, Evernote…), switching between apps, keeping track of context and references and so on… but having those interfaces anchored to the actual segment of interest instead, optionally with structured markup, changes how we interact, reason and evolve with it. Example: Fighting Fake News with typed annotations.

At least that’s what I thought when I started working on it. My question then became:

If it’s so awesome, why isn't it more common?

A shallow dive into the semantic web rabbit hole promptly surfaced a graveyard of failed attempts at web annotation. I'll pick two as examples: Google SideWiki and (Rap)Genius.

There are many more, some still active. Among them, Hypothesis stands out as the most reliable. Its open source funding structure makes it less prone to surveillance capitalism, sociopath executives, the Chinese military or other yet-to-be-discovered corporate BS. You own your data, full stop.

After I overcame the bias I had for my own project, I grew to like Hypothesis and embedded it site-wide here. The UI and social features are underdeveloped, but that’s fixable. I hope they also succeed beyond institutions.

Stanford's CNN course: Active discussions among online students in the margins
Schools are increasingly adopting annotation. One reason: LMS and interoperability have improved. 

It wouldn’t make sense to get VC funding and develop a competitor that then has to silo data for competitive advantage and network effects. I’ve had a taste of that and it’s not healthy for science, society or the web. I’d rather build a new generation of apps on top of this emerging, open ecosystem.

The optimist in me predicts a rise of those meta-applications in the next three years, with one of them having a few million users. The pessimist still wants a compelling reason why annotation has not taken off with knowledge workers so far... the tech was there over a decade ago.

Dreams about a semantic web with intelligent agents, rich context, provenance tracing, lateral search […] have been around for 30 years, but are persistently not happening. Currently, we still copy information mostly by value, not reference.

Referencing entire websites is not granular enough to reuse, test, verify and improve on an idea along a single chain of work and revision. Most entities don’t have a Wikipedia page.
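To make "by reference, at the level of a passage" a bit more concrete, here is a minimal sketch. It assumes the field names of the W3C Web Annotation Data Model (`TextQuoteSelector` with `exact`, `prefix`, `suffix`). The helper names `make_annotation` and `resolve`, the example URL and the sample text are made up for illustration, and the lookup is a naive exact string search rather than what any particular annotation service does.

```python
import json


def make_annotation(source_url, exact, prefix="", suffix="", comment=""):
    """Build a dict shaped like a W3C Web Annotation (hypothetical helper)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": {
            "source": source_url,  # the page being referenced
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,    # the quoted passage itself
                "prefix": prefix,  # text right before it, to disambiguate
                "suffix": suffix,  # text right after it
            },
        },
    }


def resolve(annotation, page_text):
    """Return (start, end) of the quoted passage in page_text, or None.

    Naive exact matching; an assumption for this sketch, since real pages
    change and would need fuzzier anchoring.
    """
    sel = annotation["target"]["selector"]
    needle = sel["prefix"] + sel["exact"] + sel["suffix"]
    pos = page_text.find(needle)
    if pos == -1:
        return None
    start = pos + len(sel["prefix"])
    return start, start + len(sel["exact"])


page = "Currently, we still copy information mostly by value, not reference."
note = make_annotation(
    "https://example.org/post",  # placeholder URL
    exact="by value, not reference",
    prefix="mostly ",
    suffix=".",
    comment="Points at the passage instead of copying the page.",
)

span = resolve(note, page)
print(json.dumps(note["target"], indent=2))
print(page[span[0]:span[1]])
```

The annotation carries only a pointer (source plus selector) and a comment, so the passage can be looked up again on the live page instead of being pasted somewhere and losing its context.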

I feel we’re at a point in this giant internet project where many among us, especially developers, are throwing up (their hands in anger) and shouting: “Refactor!”.

Getting annotation right is a fundamental piece for better knowledge exchange. To review, iterate, comment, tag, contextualize, version and search our ideas exactly where they live is how we build tested knowledge that gets richer and more meaningful over time. Fingers crossed🤞.


1. It’s a giant garbage fire of throw-away leaky abstractions and overbloated, breaking dependencies. Satan recently switched torture providers and new arrivals to hell have to debug CI/CD pipelines of B2B apps.

2. There was some backlash from site owners who did not want their sites smeared by “graffiti” from trolls, but that’s usually not a showstopper.