I’ve noted in the past some of the strange beliefs about DOIs (http://www.russet.org.uk/blog/1849). One of these is that DOIs provide some magic archiving capability (http://www.russet.org.uk/blog/2360). Another is that “DOIs make things citable”. This was one of the selling points for Figshare, for instance.

I’m interested to see that GitHub have now joined the party (http://github.com/blog/1840-improving-github-for-science), again using the justification that “DOIs make things citable”. I am lost in attempting to understand this.

First, GitHub have stable URIs for repositories. It’s in their business interests to keep these and if they change them they will break every single repository that has checked things out using the URI.

Second, if I have a github URI I actually know that I have a link to a repository, and it is fairly clear that I can clone from this repository. With a DOI I do not. Paper, datacite item, git repo: it is not possible to tell.

Third, with a github URI I have a URI that I can compare against other URIs and work out whether it is the same or different. If I have a DOI, I now have two identifiers, the DOI and the URI, both of which identify the same thing. Surely this makes the situation worse, not better.
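
To make the comparison point concrete, here is a minimal Python sketch of working out whether two GitHub URIs name the same repository. The function names and the normalisation rules (ignore the scheme, a trailing ".git" and letter case) are my assumptions for illustration, not anything GitHub documents as a canonical form, and the example URIs are placeholders.

```python
from urllib.parse import urlparse


def github_repo_key(uri):
    """Reduce a GitHub repository URI to a comparable (owner, repo) pair.

    Returns None when the URI does not look like a GitHub repository.
    The rules used here are assumptions made for illustration.
    """
    # Rewrite scp-style SSH remotes (git@github.com:owner/repo.git)
    # into a form that urlparse understands.
    prefix = "git@github.com:"
    if uri.startswith(prefix):
        uri = "ssh://git@github.com/" + uri[len(prefix):]
    parsed = urlparse(uri)
    host = (parsed.hostname or "").lower()
    if host not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return (owner.lower(), repo.lower())


def same_repo(a, b):
    """True only when both URIs resolve to the same (owner, repo) pair."""
    key_a, key_b = github_repo_key(a), github_repo_key(b)
    return key_a is not None and key_a == key_b
```

Given only the string, a DOI such as the placeholder https://doi.org/10.1234/example yields None here: nothing in the identifier says whether it names a repository, let alone which one.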

Am I being a little cynical in wondering why some publishers require them? Do they, perhaps, have a vested interest in making things more convoluted rather than just using standard web technology (http://www.russet.org.uk/blog/2248)?

It seems to me like a clear case of DOIs are magical fairy dust. We sprinkle them on a github repository and now it is better, when actually we have made the situation worse.

The only justification that we have is “DOIs make it citable”. Is there a better one? Answers on a post-card please.


Update

I totally missed a post by Carl Boettiger which makes some of the same points (http://www.carlboettiger.info/2013/06/03/DOI-citable.html).

On the general issue of metadata, a DOI will give some harvestable metadata, although Greycite can give much of the same metadata direct from GitHub (see for instance here). Having GitHub fix their metadata would seem to me to have been an easier win. And, of course, github URIs can be used to clone from and extract all the repository metadata using, well, git.
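
As a rough illustration of that last point, the sketch below reads the commit hash, author, date and subject of the latest commit from a local clone by calling git itself. The function names and the choice of fields are mine; the format placeholders (%H, %an, %aI, %s, %x09) are standard `git log` pretty-format codes, and a reasonably recent git is assumed for %aI.

```python
import subprocess

# Fields requested from git: commit hash, author name, author date
# (strict ISO 8601) and subject line, separated by tab characters.
LOG_FORMAT = "%H%x09%an%x09%aI%x09%s"


def parse_log_line(line):
    """Split one line of LOG_FORMAT output into a metadata dict."""
    # maxsplit=3 keeps any tab inside the subject line intact.
    commit, author, date, subject = line.rstrip("\n").split("\t", 3)
    return {"commit": commit, "author": author, "date": date, "subject": subject}


def head_metadata(repo_dir):
    """Return metadata for the latest commit of the clone in repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "-1", "--format=" + LOG_FORMAT],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_log_line(result.stdout)
```

Calling head_metadata on the directory of any clone should work the same way whether the repository came from GitHub or anywhere else.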


8 Comments

  1. Tom says:

    I imagine it’s because DOIs reliably resolve to consistent metadata as well as the content itself via content negotiation.

  2. Jakob Voß says:

    I fully agree but there is one possible benefit of using a DOI for a GitHub repository: it looks more scientific. Looking at a GitHub URL I know that there is a repository but it could be anything. With a DOI, chances are high that someone (even the author) thought that the repository is worth citing in some scientific context. I would not say this is true for all of my repositories. Sure, a special “citable” or “science” tag at GitHub would be more useful than DOIs. By the way, a DOI based on a SHA1 sum would have been useful to ensure that the DOI always points to the same content – but maybe it is not the purpose of a DOI to uniquely identify some content.

  3. Mikel Egaña Aranguren says:

    I added a Zenodo DOI to one of my GitHub repos just for laughs and you must specify which paper the DOI refers to, which I believe might be of value for some evaluators.

  4. Phillip Lord says:

    @Tom It’s possible, I suppose, although no one mentions this in any of the write-ups. Moreover, relatively few people are aware of the content negotiation of (some) DOIs. Finally, the same or better metadata is available from the github API and/or the git repo. The latter works with any repo and not just github.

  5. Phillip Lord says:

    “I fully agree but there is one possible benefit of using a DOI for a GitHub repository: it looks more scientific.”

    I agree with you here, although I prefer my phrase of “magic pixie dust”, as it is more accurate than “looks more scientific”, I think.

    “With a DOI, chances are high that someone (even the author) thought that the repository is worth citing in some scientific context. I would not say this is true for all of my repositories.”

    I would take the other option; if a URL is cited in a scientific context, then it should be worthy of that citation. This is what reviewers are for.

    “By the way, a DOI based on a SHA1 sum would have been useful to ensure that the DOI always points to the same content – but maybe it is not the purpose of a DOI to uniquely identify some content.”

    DOIs are supposed to identify the same logical content. So the layout and some of the words of a paper may change, but it’s the same paper. You can argue the same for a git repo, although that is taking it to extremes because git repos change a lot. A standard best practice of citing git repos as a URL plus the SHA1 of a commit would be much better than a DOI. Of course, the commit could disappear (having been rebased away), but you would always know that this had happened.

    It is a shame that, instead of getting sidetracked by DOIs, these issues were not discussed when discussing how to cite a git repo.

  6. Phillip Lord says:

    Confused, Mikel: do you mean you have to describe which paper the git repo is cited in? I agree that this might be of some value, although I’d rather get this data from the paper. Why say the same thing twice?

  7. Hugh Shanahan says:

    I could be wrong here but I don’t think you can get download statistics from github. Having that data is really what’s important.

  8. Phillip Lord says:

    Again, no mention of this is made in any of the documentation. Besides, if you cannot get download statistics from github, where do these stats come from? Presumably, it means downloads of the archive of the repo. So not downloads of releases. Not clones or pulls from the repo. And only downloads which come through the DOI rather than direct. It would seem the download statistics are statistics of those who have downloaded the most inconvenient and useless form of the repository.
