I am pleased to announce that, as part of my work on knowledgeblog (n.d.a), we now have two new tools, Greycite and kblog-metadata, and have extended kcite, our citation engine (n.d.b). I will give only a brief overview of their functionality here; subsequent articles will describe these tools in more detail and explain the rationale behind them.

The kcite engine, which you can see in use in this article, produces a nicely formatted bibliography generated using only identifiers for the cited articles: DOIs, PubMed IDs or arXiv IDs. One obvious absence, however, has been the ability to cite URLs directly. We have now started to address this through our two new tools.
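This section does not describe how kcite's matching works. The following is therefore a hypothetical Python sketch of how a citation engine might distinguish the identifier types listed above, with URLs added as a fourth kind. The regular expressions and the function name classify_identifier are assumptions, not kcite's actual code. For example, the arXiv pattern covers only new-style IDs.

```python
import re

# Hypothetical patterns; kcite's real matching rules may differ.
URL_RE = re.compile(r"^https?://", re.IGNORECASE)
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")           # e.g. 10.1371/journal.pone.0012345
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")   # new-style arXiv IDs only
PMID_RE = re.compile(r"^\d+$")                      # PubMed IDs are plain integers


def classify_identifier(ident: str) -> str:
    """Return 'url', 'doi', 'arxiv', 'pubmed' or 'unknown' for a citation key."""
    ident = ident.strip()
    # Order matters: a URL may contain a DOI, and PMIDs are the loosest pattern.
    if URL_RE.match(ident):
        return "url"
    if DOI_RE.match(ident):
        return "doi"
    if ARXIV_RE.match(ident):
        return "arxiv"
    if PMID_RE.match(ident):
        return "pubmed"
    return "unknown"


for ident in ["10.1371/journal.pone.0012345", "1204.5678",
              "21498827", "http://example.org/post/"]:
    print(ident, "->", classify_identifier(ident))
```

Anything classified as "url" is the case the new tools address: there is no registry to ask, so the metadata has to come from elsewhere.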

Unlike the identifiers above, URLs have no centralised resource that can deliver bibliographic metadata about them. To provide one, my colleague Lindsay Marshall (n.d.c) has developed Greycite (n.d.d), which went live earlier this week. Greycite allows you to search for bibliographic metadata about a given resource; for instance, you can view the metadata for my article on realism (n.d.e). More useful than this view, however, is that you can also retrieve the metadata programmatically: currently, we support JSON suitable for citeproc-js (n.d.f) and BibTeX (n.d.g). We can support further formats if we choose; fortunately, the metadata for a URL is, in general, very simple (date, title, and website or “container” title).
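Because URL metadata reduces to a handful of fields, serialising it into the two supported formats is straightforward. The Python sketch below is illustrative only and does not reproduce Greycite's actual output. It assumes the CSL-JSON "webpage" item type and a BibTeX @misc entry, and the class and function names (UrlMetadata, to_csl_json, to_bibtex) are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class UrlMetadata:
    """The few fields URL metadata usually comes down to (hypothetical model)."""
    url: str
    title: str
    container_title: str  # the website or blog name
    issued: date


def to_csl_json(meta: UrlMetadata, item_id: str) -> dict:
    # A CSL-JSON item of the kind citeproc-js consumes; dates use "date-parts".
    return {
        "id": item_id,
        "type": "webpage",
        "title": meta.title,
        "container-title": meta.container_title,
        "URL": meta.url,
        "issued": {"date-parts": [[meta.issued.year, meta.issued.month, meta.issued.day]]},
    }


def to_bibtex(meta: UrlMetadata, key: str) -> str:
    # @misc is the most portable BibTeX entry type for web pages.
    fields = {
        "title": meta.title,
        "howpublished": meta.container_title,
        "url": meta.url,
        "year": str(meta.issued.year),
        "month": str(meta.issued.month),
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


meta = UrlMetadata("http://example.org/realism/", "An example article",
                   "An example blog", date(2012, 5, 1))
print(json.dumps(to_csl_json(meta, "example"), indent=2))
print(to_bibtex(meta, "example"))
```

Adding another output format would mean writing one more small function over the same fields, which is why supporting further formats is cheap.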

Greycite must, however, get its metadata from somewhere. As we wanted Greycite to be both an automated and an authoritative source, we decided to take metadata only from the URL being referenced (or linked from it). Anything else would have required an authentication step to prove that the metadata was being provided by the owner of the content. I will describe this in more detail later, but we currently support COinS (n.d.h), OGP (n.d.i) and Google Scholar meta tags (n.d.j). In practice, this combination of sources allows us to provide rich references for many URLs; where it does not, we fall back gracefully.
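Taking metadata only from the page itself amounts to reading the markup it already carries. The sketch below uses Python's standard html.parser to collect OGP and Google Scholar meta tags, and falls back to the plain HTML title when neither is present. It does not handle COinS. The precedence order (Google Scholar, then OGP, then the title element) and the function name extract_metadata are assumptions, not Greycite's actual logic.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta> property/name -> content pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # OGP uses property="og:..."; Google Scholar uses name="citation_...".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                self.meta.setdefault(key, attrs["content"])  # keep first occurrence
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> dict:
    parser = MetaTagCollector()
    parser.feed(html)
    m = parser.meta
    # Assumed precedence: Google Scholar, then OGP, then the plain <title>.
    title = m.get("citation_title") or m.get("og:title") or parser.title.strip() or None
    container = m.get("citation_journal_title") or m.get("og:site_name")
    published = m.get("citation_publication_date") or m.get("citation_date")
    return {"title": title, "container-title": container, "date": published}


page = """<html><head><title>Fallback title</title>
<meta property="og:title" content="OGP title">
<meta property="og:site_name" content="Example Blog">
</head><body></body></html>"""
print(extract_metadata(page))
```

A page with no formal metadata at all still yields its title, which illustrates the graceful fallback in its simplest form.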

Unfortunately, formal metadata on the web is neither tightly controlled nor pre-defined. If you publish your articles with WordPress, whether they carry any metadata depends largely on your theme. I have started to address this with kblog-metadata (n.d.k). Again, I will describe its functionality in greater detail later, but essentially this plugin adds metadata in all three of the formats mentioned above to the document headers, and provides a good deal of flexibility about where that metadata comes from.
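kblog-metadata is a WordPress plugin, and its code is not shown here. As a rough, hypothetical illustration of what emitting all three formats for a single post involves, the Python function below (metadata_tags is an invented name) renders OGP and Google Scholar meta tags plus a COinS span. The choice of fields and the journal-style rft_val_fmt are assumptions. Note that COinS is a span element with class "Z3988", which belongs in the page body rather than the head.

```python
from html import escape
from typing import Sequence
from urllib.parse import urlencode


def metadata_tags(title: str, site: str, url: str,
                  authors: Sequence[str], published: str) -> str:
    """Render one post's metadata as OGP, Google Scholar and COinS markup."""
    tags = [
        # Open Graph Protocol
        f'<meta property="og:title" content="{escape(title)}">',
        f'<meta property="og:site_name" content="{escape(site)}">',
        f'<meta property="og:url" content="{escape(url)}">',
        '<meta property="og:type" content="article">',
        # Google Scholar citation_* tags
        f'<meta name="citation_title" content="{escape(title)}">',
        f'<meta name="citation_journal_title" content="{escape(site)}">',
        f'<meta name="citation_publication_date" content="{escape(published)}">',
    ]
    tags += [f'<meta name="citation_author" content="{escape(a)}">' for a in authors]
    # COinS: a URL-encoded OpenURL ContextObject in a span's title attribute.
    kev = urlencode([
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),  # assumed format choice
        ("rft.atitle", title),
        ("rft.jtitle", site),
        ("rft.date", published),
        ("rft_id", url),
    ] + [("rft.au", a) for a in authors])
    tags.append(f'<span class="Z3988" title="{escape(kev)}"></span>')
    return "\n".join(tags)


print(metadata_tags("An example post", "An example blog",
                    "http://example.org/post/", ["Jane Doe"], "2012/05/01"))
```

The same few values feed all three formats. That redundancy is the point: a consumer such as Greycite can read whichever format it finds first.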

Finally, I have extended kcite to query Greycite for metadata about each cited URL. The returned data is used directly for rendering, so performance should be reasonable; moreover, all data is cached in the WordPress database, limiting the outgoing network traffic from the web server for each reference.
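The caching described here follows a standard cache-through pattern: look the URL up locally first, and contact the remote service only on a miss. kcite itself is PHP running inside WordPress. The Python sketch below only illustrates the pattern, using an in-memory SQLite table in place of the WordPress database. The fetch callable stands in for a Greycite request and is hypothetical, as are the table and class names.

```python
import json
import sqlite3
from typing import Callable


class MetadataCache:
    """Cache-through store for per-URL metadata (stand-in for the WordPress DB)."""

    def __init__(self, fetch: Callable[[str], dict], db_path: str = ":memory:"):
        self.fetch = fetch  # hypothetical: retrieves metadata for a URL remotely
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS url_metadata "
            "(url TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def get(self, url: str) -> dict:
        row = self.db.execute(
            "SELECT payload FROM url_metadata WHERE url = ?", (url,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: no network traffic
        meta = self.fetch(url)  # cache miss: exactly one outgoing request
        self.db.execute(
            "INSERT INTO url_metadata (url, payload) VALUES (?, ?)",
            (url, json.dumps(meta)),
        )
        self.db.commit()
        return meta


calls = []

def fake_fetch(url):
    calls.append(url)  # record each simulated network request
    return {"URL": url, "title": "Example"}

cache = MetadataCache(fake_fetch)
cache.get("http://example.org/a/")
cache.get("http://example.org/a/")
print(len(calls))  # 1
```

Repeated renders of the same article then cost only a local database read per reference. A production version would also need a policy for refreshing stale entries, which this sketch omits.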

The work is not yet complete, and there is much more to do. However, I have been using development versions of these tools for a month or so, and the experience has been rather good. The metadata is useful during authoring, as it helps to find the correct reference. While we cannot capture metadata from every source, a surprisingly large number of sources do work. And the development of Greycite means that this metadata can be served efficiently without adding too much complexity to kcite. In short, while this is not a complete solution, these enhancements represent a substantial step towards making academic URLs formally citable, as others have recently called for (n.d.l).

Addendum

2012-05-09: I have already published an initial article (n.d.m) about kblog-metadata, which should have been referenced here.