Web Annotation: A Short Dive

July 19, 2019. Send your thoughts via Twitter or mail.

Edit, August 2020: A year after writing this, I no longer think that general web annotation is how communal sense-making is going to evolve. Domain-specific corpus and dataset annotation, by contrast, is a different story and is evolving rapidly. My prediction of an annotation service with a few million users within the next three years (now two) will probably still come true, but I have lowered my hopes for the area. My guess as to why: truth and meaning are less and less found inside one document, so why invest in massive annotation efforts if it's better to search outside the document for more context? Also, many annotation jobs, like claim verification, flagging and connecting similar ideas, which I have started to work on, can and will be replaced by ever more capable natural language processing (NLP).

Last winter, I got fed up with how I do Internet, so I prototyped a browser extension that lets you create and view annotations within your social circle on any website. After I shared the alpha version with a friend, he mentioned Hypothesis, a team that has been working on the same problem for years and has already made strong headway towards open source web annotation.

At first I was frustrated.
I had spent many days on the torture of modern web development¹ when many of the problems were already solved. Apparently it's a niche problem; otherwise I'd have discovered a solution before feeling pressured to build one.

At the moment, annotation is mostly used by institutional knowledge workers (education, publishing, …). In fact, none of my friends in tech or science use annotation for work, and nobody does so privately. This seems strange given that you can surf the web as usual, AND:

see others' insights, revisions, reviews, critiques, perspectives when you look at some content
add value with highlights, reactions, comments without the site having to build interfaces for it
discover shared interests together with friends or whatever group
build a richly interlinked knowledge and social graph while at it
research collaboratively on steroids (keyword: structured annotation markup)
….

True, you could get something similar by stacking a few tools (maybe Twitter, Delicious, Evernote…), switching between apps, keeping track of context and references and so on, but having those interfaces anchored to the actual segment of interest instead, optionally with structured markup, changes how we interact, reason and evolve with it. Example: Fighting Fake News with typed annotations.
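To make "anchored to the actual segment, with structured markup" a bit more concrete, here is a minimal sketch of what a typed annotation could look like as data. The overall shape loosely follows the W3C Web Annotation Data Model (a target with a text quote selector, plus bodies). The `ClaimTag` vocabulary and the `makeTypedAnnotation` helper are my own assumptions for illustration, not part of any standard or of Hypothesis.

```typescript
// A quote-based anchor: the highlighted text plus optional surrounding context.
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}

// Hypothetical annotation types for claim checking (an assumption, not a standard).
type ClaimTag = "claim" | "evidence" | "rebuttal";

interface TextualBody {
  type: "TextualBody";
  value: string;
  purpose: "tagging" | "commenting";
}

interface TypedAnnotation {
  "@context": "http://www.w3.org/ns/anno.jsonld";
  type: "Annotation";
  body: TextualBody[];
  target: { source: string; selector: TextQuoteSelector };
}

// Build an annotation that tags a quoted segment and attaches a comment to it.
function makeTypedAnnotation(
  source: string,
  quote: string,
  tag: ClaimTag,
  comment: string
): TypedAnnotation {
  return {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    type: "Annotation",
    body: [
      { type: "TextualBody", value: tag, purpose: "tagging" },
      { type: "TextualBody", value: comment, purpose: "commenting" },
    ],
    target: {
      source,
      selector: { type: "TextQuoteSelector", exact: quote },
    },
  };
}

// Example usage with a placeholder URL.
const note = makeTypedAnnotation(
  "https://example.org/article",
  "the tech was there over a decade ago",
  "claim",
  "Which tech exactly? A source would help."
);
console.log(JSON.stringify(note, null, 2));
```

Because the tag is a separate body from the free-text comment, a client could filter or aggregate by type (all rebuttals on a page, say) without parsing prose.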

At least that’s what I thought when I started working on it. My question then became:

If it’s so awesome, why isn't it more common?

A shallow dive into the semantic web rabbit hole promptly surfaced a graveyard of failed attempts at web annotation. I'll pick two as examples: Google SideWiki and (Rap)Genius.

SideWiki got cancelled in 2011. I couldn’t find out why exactly². Google seemed to have all ingredients to win big.
(Rap)Genius, after getting funded by a16z six years back, wanted to annotate more than just song lyrics, but didn’t. They still pay lip service to the idea and have portfolio cases where it was used, but the project seems to be a silent failure (you can’t even download a browser extension at this point). I guess many users are wary of siloing their content inside another VC-funded media company.

There are many more, some still active. Among them, Hypothesis stands out as the most reliable. Its open source funding structure makes it less prone to surveillance capitalism, sociopath executives, the Chinese military or other yet-to-be-discovered corporate BS. You own your data, full stop.

After I overcame the bias I had for my own project, I grew to like Hypothesis and embedded it site-wide here. The UI and social features are underdeveloped, but that’s fixable. I hope they also succeed beyond institutions.

Stanford's CNN course: Active discussions among online students in the margins

Schools are increasingly adopting annotation. One reason: learning management systems (LMS) and interoperability have improved.

It wouldn’t make sense to get VC funding and develop a competitor that then has to silo data for competitive advantage and network effects. I’ve had a taste of that and it’s not healthy for science, society or the web. I’d rather build a new generation of apps on top of this emerging, open ecosystem.

The optimist in me predicts a rise of those meta-applications in the next three years, with one of them having a few million users. The pessimist still wants a compelling reason why annotation has not taken off with knowledge workers so far; the tech was there over a decade ago.

Dreams about a semantic web with intelligent agents, rich context, provenance tracing, lateral search […] have been around for 30 years, but are persistently not happening. Currently, we still copy information mostly by value, not reference.

Referencing entire websites is not granular enough to reuse, test, verify and improve on an idea along a single chain of work and revision. Most entities don’t have a Wikipedia page.
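As a rough illustration of referencing a segment rather than a whole page, the sketch below resolves a quote-based reference (the exact text plus a little surrounding context) to character offsets in a document's text. It is a simplified take on the text quote selector idea from the W3C Web Annotation Data Model. The `QuoteRef` and `resolveQuote` names are hypothetical, and a real client would also need fuzzy matching for pages that change.

```typescript
// A reference to a passage: the quoted text plus optional context to disambiguate it.
interface QuoteRef {
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface TextRange {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
}

// Find the first occurrence of `exact` whose surrounding text matches
// the given prefix and suffix. Returns null if nothing matches.
function resolveQuote(text: string, ref: QuoteRef): TextRange | null {
  if (ref.exact.length === 0) return null;
  const prefix = ref.prefix ?? "";
  const suffix = ref.suffix ?? "";
  let from = 0;
  while (true) {
    const start = text.indexOf(ref.exact, from);
    if (start === -1) return null;
    const end = start + ref.exact.length;
    const prefixOk = text.slice(0, start).endsWith(prefix);
    const suffixOk = text.slice(end).startsWith(suffix);
    if (prefixOk && suffixOk) return { start, end };
    from = start + 1; // keep looking for a later occurrence
  }
}

// Example usage: the suffix picks the second of two identical phrases.
const page = "the cat sat. the cat ran.";
const range = resolveQuote(page, { exact: "the cat", suffix: " ran" });
if (range !== null) {
  console.log(page.slice(range.start, range.end));
}
```

The point of the prefix and suffix is that the reference survives small edits elsewhere on the page, which a raw character offset would not.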

I feel we’re at a point in this giant internet project where many among us, especially developers, are throwing up (their hands in anger) and shouting: “Refactor!”.

Getting annotation right is a fundamental piece for better knowledge exchange. To review, iterate, comment, tag, contextualize, version and search our ideas exactly where they live is how we build tested knowledge that gets richer and more meaningful over time. Fingers crossed🤞.

1. It’s a giant garbage fire of throw-away leaky abstractions and bloated, breaking dependencies. Satan recently switched torture providers and new arrivals to hell have to debug CI/CD pipelines of B2B apps.

2. There was some backlash from site owners who did not want their sites smeared by “graffiti” from trolls, but that’s usually not a show stopper.