Simply Jonathan

In defence of the (documented) API

Way back in November (I realise this puts my comment in the category of insanely untimely responses, but so be it) Ruben Verborgh wrote an article called The lie of the API, which I got to via Jeremy Keith.

Let me, before I lay out my disagreement, start by saying that I agree with the basic gist of both their arguments: There is little reason why an HTML representation of a piece of content is freely available, but a JSON (or XML or YAML) representation requires an OAuthenticated API.

Although Keith disagrees that content negotiation is the way forward, his waryness seems to be of a nature that could be URL hacked away — instead of using actual content negotiation (Accept: application/json), one could simply append .json to the end of the URL, or other similar measure. It may not be the absolute cleanest, most native HTTP implementation, but I’m sure Verborgh and Keith could both live with this.

This doesn’t pave over the reliability of the API though, which I’ll argue is an, albeit maybe not explicitly communicated, major reason why sites choose to offer their data via an actual API.

The thing is, when you visit an HTML page, you do not need to know anything about the structure of the data. The browser and the developer have an agreement that HTML structured in a certain way, with CSS and JavaScript structured in their certain ways, will be displayed in a certain way (save of course for browser-inconsistencies, but that’s the general idea). That is why an HTML page can be viewed, and made sense of, by a human being, whereas a JSON representation for the far majority of people will make essentially no sense.

When you make a JSON representation of some data, the structure will not be self-explanatory, and you will be forced to choose a structure for the data. Unless you’re dealing with Platonic ideals (and are very good at achieving those ideals on the first try) this structure is bound to change. You might add a field, remove a field, or change the semantics of a field.

If you do this with an HTML document, you can go about the change any way you like – so long as the new structure of the data still conforms to the browser’s expectation of how an HTML page should be structured, you’ll still have something usable. If you do this with JSON – assuming the JSON representation is only read by a machine, which will almost always be the case – the consumer will quite possibly break, unless you’ve changed the consumer accordingly, or notified them in advance if they’re an external entity. This last case is where the (versioned) API comes into play: If one can make changes that don’t break existing implementations, by somehow working a versioning scheme into the API, that’s a bonus. HTTP Accept will not let you do this.

For representations that are only intended to be used internally, a changing, non-versioned JSON one may suffice; if one has control of the entire stack, one doesn’t need to maintain backwards compatibility to the same degree. But those sorts of APIs wouldn’t be subject to OAuth restrictions anyway.

I agree that an OAuth token shouldn’t be necessary to get a JSON representation of one’s Twitter stream, when an HTML representation is freely available – but I do think that the nature of intended-for-machines representations are so substantially different from intended-for-humans representations that some sort of agreement (and documentation) is required. If you can find an existing format that fits (Accept: application/atom+xml for a Twitter stream), by all means use it, but that’s also locking yourself into a model that may not fit your data exactly as you’d like it to – and unlike HTML, you have no way of telling the consumer what to do with your seemingly arbitrarily structured data, the way you do with CSS.

Insofar as it’s possible, you should make representations of your data that fit the user agent’s Accept header; but if you don’t commit to the structure you choose, it will be unreliable and essentially unusable for machines parsing it.

This is Simply Jonathan, a blog written by Jonathan Holst. It's mostly about technical topics (and mainly the Web at that), but an occasional post on clothing, sports, and general personal life topics can be found.

Jonathan Holst is a programmer, language enthusiast, sports fan, and appreciator of good design, living in Copenhagen, Denmark, Europe. He is also someone pretentious enough to call himself the 'author' of a blog. And talk about himself in the third person.