## The Problem with DOIs

Rhodopsin is a protein found in the eye, which mediates low-light-level vision. It is one of the 7-transmembrane domain proteins and is found in many organisms, including humans.

Rhodopsin has a number of identifiers attached to it, which allow you to get additional data about the protein. For instance, the human version is identified by the string “OPSD_HUMAN” in Uniprot. If you wish, you can go to http://www.uniprot.org/uniprot/OPSD_HUMAN and find additional information. Actually, this URI redirects to http://www.uniprot.org/uniprot/P08100.html. P08100 is an alternative (semantic-free) identifier for the same protein; P08100 is called the accession number and it is stable, as you can read in the user manual. If you don’t like the HTML presentation, you can always get the traditional structured text so beloved of bioinformatics; this is at http://www.uniprot.org/uniprot/P08100.txt. Or the Uniprot XML (that is at http://www.uniprot.org/uniprot/P08100.xml). Or http://www.uniprot.org/uniprot/P08100.rdf if you want RDF. If you just want the sequence, that is at http://www.uniprot.org/uniprot/P08100.fasta, or http://www.uniprot.org/uniprot/P08100.gff if you want the sequence features. You might be worried about changes over time, in which case you can see all versions at http://www.uniprot.org/uniprot/P08100?version=*. Or if you are worried about changes in the future, then http://www.uniprot.org/uniprot/P08100.rss?version=* is the place to be. Obviously, if you want to move outward from here to the DNA sequence, or a report about the protein family, or any of the domains, then all of that is linked from here. If you don’t want to code this for yourself, there are libraries in Perl, Python and Java which will handle these forms of data for you.
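The pattern here, one stable accession number plus a format suffix, is simple enough to sketch in a few lines of Python. This is only an illustration, assuming the URL layout described in this post; the names `uniprot_url` and `FORMATS` are our own invention rather than part of any Uniprot library, and the URLs themselves may not resolve if the Uniprot website is reorganised.

```python
# Base of the entry URLs quoted in the text above (assumed layout).
BASE = "http://www.uniprot.org/uniprot"

# Format suffixes mentioned in the text above.
FORMATS = ("html", "txt", "xml", "rdf", "fasta", "gff", "rss")


def uniprot_url(accession, fmt=None, all_versions=False):
    """Build the URL for one representation of a Uniprot entry.

    accession    -- stable accession number, e.g. "P08100"
    fmt          -- optional format suffix from FORMATS
    all_versions -- if True, ask for the full version history
    """
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    url = f"{BASE}/{accession}"
    if fmt is not None:
        url += f".{fmt}"
    if all_versions:
        url += "?version=*"
    return url


# List every representation of human rhodopsin mentioned above.
for fmt in FORMATS:
    print(uniprot_url("P08100", fmt))
```

The only thing a client needs to remember is the accession number; everything else follows from a fixed convention.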

So this might be overkill, but the point is surely clear enough. It’s very easy to get the data in a variety of formats, through stable identifiers. The history is clear, and the future as clear as it can be. The technology is simple, and straightforward for both humans and computers to access. The world of the biologist is a good place to be.

What does this have to do with DOIs? Let’s consider a selection of publications from one of us. Of course, one of the nice things about DOIs is that you can convert them into URIs. But what do they point to? Well, a variety of different things. Maybe the full HTML article. Or, perhaps an HTML abstract and a picture of the front page. Or more links. Or, bizarrely, a list of the author biographies. Or just another image of a print out of the front page of an identified digital object.

These are a selection from our conference and journal publications. Obviously, this doesn’t cover many of our conference papers, as most don’t have DOIs unless they are published by a big publisher. Or our books. These are published by big publishers, but obviously they are books which is different. I’ve also organised or been on the PC for a number of workshops. They don’t have DOIs either. All of them do have URIs.

In no case can we guarantee that what we see today will be the same as what we get tomorrow, even though DOIs are supposedly persistent. The presentation of the HTML on those pages that display HTML is wildly different; in many cases, there is no standard metadata. Given the DOI, there doesn’t appear to be a standard way to get hold of the metadata. If you poke around really hard on the DOI website, you may get to http://www.doi.org/tools.html. At this point, you probably already know about http://dx.doi.org, which allows you to resolve a DOI through HTTP. The list of links doesn’t take that long to work through, so you might eventually get to http://www.crossref.org. From here, you can perform searches, including extracting metadata for articles; obviously, you need to register, and you need an API key for this. It doesn’t always work, so if that fails, you can try http://www.pubmed.org, which returns metadata for some DOIs that CrossRef doesn’t, but doesn’t hold a DOI for every publication it lists (even those that have them), so it also fails in unpredictable ways.
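The one step that is straightforward is turning a DOI into a URI through the http://dx.doi.org resolver. A minimal Python sketch of that step follows; the function name `doi_to_uri` and the handling of a leading `doi:` prefix are our own assumptions, not anything specified by the DOI system.

```python
from urllib.parse import quote

# The HTTP resolver mentioned in the text above.
RESOLVER = "http://dx.doi.org"


def doi_to_uri(doi):
    """Turn a bare DOI into an HTTP URI that the resolver will redirect."""
    doi = doi.strip()
    # Assumption: tolerate the common "doi:" prefix as well as a bare DOI.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # DOIs may contain characters that are unsafe in a URL path;
    # percent-encode them but keep the "/" between prefix and suffix.
    return f"{RESOLVER}/{quote(doi, safe='/')}"


print(doi_to_uri("10.1126/science.1157784"))
```

Note that this says nothing about what you get back: the resolver redirects to whatever the publisher has registered, which is exactly the problem described above.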

The difference between the two situations couldn’t really be clearer. Within biology, we have an open, accessible and usable system. With DOIs, we don’t. The DOI handbook spends an awful lot of time describing the advantages of DOIs for publishers; very little is spent on the advantages for the people generating and accessing the content. It is totally unclear to us what use case DOIs are trying to address from our point of view; whatever it is, they certainly seem to fail in their purpose.

So, why do we care about this? Well, recently, we have been implementing DOIs for kblogs. Ontogenesis articles now all have DOIs. When we were originally thinking about kblogs, our investigations on how to mint new DOIs came to very little. If DOIs are hard to use, creating them is even worse: you need a Registration Authority, and setting this up within a university would be a nightmare. Compare this to the £9 credit card transaction required for a domain name (even this can be quite hard in a university setting!). In the end, we have managed to achieve this using DataCite. Ironically, they are misusing technology intended for articles to represent data; we are misusing DataCite to represent articles again. We also have to keep a hard record of our own of the DOIs we have minted because, despite the fact that all this information is stored in the DataCite database, there is no way of discovering whether a DOI points at a given URL using the DataCite API, so we have no way of doing a reverse lookup from a blogpost to discover its DOI.
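To make the reverse-lookup problem concrete, here is a sketch of the kind of local record this forces on us. It is purely illustrative and is not the code running on kblogs: the class `DoiRegistry`, its method names, and the example DOI and URL are all hypothetical.

```python
class DoiRegistry:
    """A local, two-way record of minted DOIs and the URLs they point at."""

    def __init__(self):
        self._url_by_doi = {}
        self._doi_by_url = {}

    def record(self, doi, url):
        """Remember that `doi` was minted for (or re-pointed at) `url`."""
        old_url = self._url_by_doi.get(doi)
        # If this DOI used to point elsewhere, drop the stale reverse entry,
        # but only if that entry still belongs to this DOI.
        if (
            old_url is not None
            and old_url != url
            and self._doi_by_url.get(old_url) == doi
        ):
            del self._doi_by_url[old_url]
        self._url_by_doi[doi] = url
        self._doi_by_url[url] = doi

    def url_for(self, doi):
        """Forward lookup: the part a resolver already does for us."""
        return self._url_by_doi.get(doi)

    def doi_for(self, url):
        """Reverse lookup: the part we have to keep track of ourselves."""
        return self._doi_by_url.get(url)


registry = DoiRegistry()
# Hypothetical DOI and URL, for illustration only.
registry.record("10.5555/example.1", "http://example.org/post-1")
print(registry.doi_for("http://example.org/post-1"))
```

With URIs alone none of this bookkeeping would be needed, since the blogpost's address already is its identifier.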

We’ve also created a referencing system for WordPress. This does DOI lookups for the user, currently using CrossRef, or PubMed. We are not sure yet whether we can retrieve DataCite metadata in this way also.

The irony of this is that it is all totally pointless. WordPress already creates permalinks, based on a URI. These URIs are trackback/pingback capable so can be used bi-directionally. We have added support so that URIs maintain their own version history, so that you can see all previous versions. If you do not trust us, or if we go away, then URIs are archived and versioned by the UK Web archive. Currently, we are adding features for better metadata support, which will use a simple REST style API like Uniprot. Hopefully, multiple format and subsection access will follow also.

So, why are we using DOIs at all? For the same reason as DataCite, which has as one of its aims “to increase acceptance of research data as legitimate, citable contributions to the scientific record”. We need DOIs for kblog because, although DOIs are pointless, they have become established, they are used for assigning credit, and they are used as a badge of worth. We find it unfortunate that, in the process of using DOIs, we are supporting their credentials as a badge of worth, but it seems the path of least resistance.

Update (2012-03-09): Ironically, most of our Uniprot links were broken, which somewhat undermines the strength of our argument! Uniprot IDs are guaranteed to be permanent, while their website is not.

1. #### Fuzzier Logic » Blog Archive » The Problem with DOIs says:

[...] article was jointly author by Phillip Lord and Simon [...]

2. #### Tweets that mention An Exercise in Irrelevance » Blog Archive » The Problem with DOIs -- Topsy.com says:

[...] This post was mentioned on Twitter by Phillip Lord, Nicolas Bertrand. Nicolas Bertrand said: RT @phillord: DOIs? What are they good for? http://is.gd/OVPJu5, http://is.gd/MqyRpn [...]

3. #### The Ontogenesis Knowledgeblog: Lightweight Semantic Publishing | Knowledge Blog says:

[...] The problem with DOIs, 2011. http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/. [...]

4. #### Karl Ward says:

dx.doi.org supports content negotiated requests for CrossRef DOIs:

    $ curl -LH "Accept: application/rdf+xml" "http://dx.doi.org/10.1126/science.1157784"
    $ curl -LH "Accept: text/turtle" "http://dx.doi.org/10.1126/science.1157784"

These requests return representations of RDF graphs for the metadata of a DOI.

See the announcement here:

http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html
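For readers who would rather do this from code than from the command line, the same content-negotiated request can be sketched with Python's standard library. This is a minimal sketch, assuming the service behaves as described in this comment; the function names `metadata_request` and `fetch_metadata` are made up for illustration, and the UTF-8 decoding is an assumption about the response.

```python
import urllib.request


def metadata_request(doi, media_type="text/turtle"):
    """Build (but do not send) a content-negotiated request for a DOI."""
    return urllib.request.Request(
        f"http://dx.doi.org/{doi}",
        # The Accept header asks for metadata instead of the landing page.
        headers={"Accept": media_type},
    )


def fetch_metadata(doi, media_type="text/turtle", timeout=10):
    """Send the request and return the response body as text.

    urlopen follows HTTP redirects by default, like curl's -L flag.
    Assumption: the response body is UTF-8 encoded.
    """
    request = metadata_request(doi, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request needs no network access.
request = metadata_request("10.1126/science.1157784", "application/rdf+xml")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch_metadata("10.1126/science.1157784")` would then perform the equivalent of the second curl command above, provided the resolver and registration agency still support it.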

5. #### Karl Ward says:

In fact, dx.doi.org supports the same method of content negotiation for all DOIs. Each DOI registration agency must implement this functionality. CrossRef have done and it is hoped that other RAs will follow.

6. #### Phil Lord says:

Obviously this happened after we posted. I would say that a REST API would be nicer, because then I can cite a specific form of DOI. Still I agree that this is useful, and I might use it.
But this still raises the question: if the way that you make DOIs useful is by turning them into URIs, why not use a URI?

Phil

7. #### Karl Ward says:

Phil,

I wanted to highlight the content negotiation work because I thought it might be useful to you.

Your question may be rhetorical but I must say that I don’t understand the distinction that is made between URIs / DOIs and URIs / some other identifier scheme (for example, the accession numbers you refer to in the blog post).

What does one do with a DOI if it is not presented in the form of a URI? The answer is to manually convert it into a URI using specific knowledge of the dx.doi.org service. But the same is true of accession numbers and the uniprot service, or any other identifier scheme and its accompanying service(s).

I suppose if I’m making a point, it is that identifier schemes and the services that support them are not separable. Knowledge of both components is required to make them useful. In that sense, I don’t see how DOIs are different from any other identifier scheme, regardless of how a user consumes them – as URIs or naked identifiers.

That said, it is surely better to present identifiers in a usable form, and CrossRef has now updated their DOI display guidelines to recommend the dx.doi.org form.

8. #### Adding Multiple Authors to a Post | The Knowledgeblog Process says:

[...] useful where the occasional post in a Kblog has multiple authors; for example, see this article (http://www.russet.org.uk/blog/1849) which is hosted on my Work kblog, but was authored by myself and Simon Cockell. This article was [...]

9. #### What is Greycite | The Knowledgeblog Process says:

[...] the DOI system has a number of significant issues (http://www.russet.org.uk/blog/1849), it does have one advantage; new DOIs are minted through a central authority, or rather one of [...]

10. #### Martin Eve says:

I found this post a little strange in that it doesn’t mention perhaps /the/ key aspect of DOI numbers: digital preservation. The point of a DOI is that, if the original goes down (say, you die, for example, or lose your domain rendering permalinks useless), an archival service (LOCKSS, CLOCKSS or Portico) can kick in and restore the content. The federated DOI resolver handle, however, will remain the same. DOIs are a central part of preserving scientific and scholarly content beyond the life of a person or organization.

11. #### Phillip Lord says:

suggests, we were bitten by the problem. However, I still think your argument
is incorrect; I will expand on this soon, as I have had a post in my head for
a while. The key points are this though. First, the DOI system does not
guarantee that this forwarding happens; it depends on the registration
authority. Secondly, they have a severe usability flaw, which means that
everyone’s life is being made harder for the broken URIs. And, finally, we can
still achieve something similar, without two step resolution.

12. #### An Exercise in Irrelevance » Blog Archive » Why academic publishing is like a coffee shop says:

[...] argued about these before (http://www.russet.org.uk/blog/1849), and no doubt will do [...]

13. #### An Exercise in Irrelevance » Blog Archive » Publishing With Future Internet says:

[...] I have never published with MDPI before, but I have recently reviewed a paper for them; I am very selective with reviewing these days, but the paper sounded interesting and I had never heard of Future Internet before. So, this seemed like a reasonable bet. Accordingly the paper has just been published and come complete with (http://dx.doi.org/10.3390/fi4041004); a dubious badge of honour if ever there was one (http://www.russet.org.uk/blog/1849). [...]

14. #### An Exercise in Irrelevance » Blog Archive » Archiving of Scientific Material says:

[...] and claiming the added value of DOIs, something I find dubious (http://www.russet.org.uk/blog/1849). Again, though, the same problem; Figshare archives the source. The most extreme example of this [...]