Kblog Metadata

Previously, I described the additions that we have made to the kcite plugin [1], which now supports several different types of identifier. These include the subset of DOIs [2] that come from either CrossRef [3] or DataCite [4], as well as identifiers from arXiv [5] and PubMed [6]. However, rather embarrassingly, one type of identifier that we do not support well is the URL. This is slightly ironic, as one of the purposes behind Knowledgeblog [7] is to demonstrate that it is possible to replicate the publication experience using the web.

The main reason for this is the lack of an active source of metadata. The various identifiers that we have supported all come with a standardised source of metadata, which is not so straightforward with a generic URL. This is one of the reasons for my new plugin, kblog-metadata (http://wordpress.org/extend/plugins/kblog-metadata/). This currently consists of three pieces of functionality: kblog-headers, kblog-authors and kblog-table-of-contents.

For a long time now, I have added COinS metadata [8] to both this blog and kblog [9]. But, from my perspective, COinS is a dreadful specification. It involves embedding a NISO 1.0 Context Object [10] into a span tag. (The reference here is taken from the COinS specification [8] but, unfortunately, returns a 404 at the time of writing.) COinS uses a URL-encoded query string: in short, a microsyntax inside HTML which needs its own independent parsing. Key strings are confusing at best (rft_val_fmt and rft.auinit, for example; why both underscores and dots?). And there is a degree of randomness about things: first authors can be split into first name, last name and initials, while subsequent authors cannot. Moreover, I could not find a processor to test whether my COinS implementation was actually correct. I wanted something that was a bit easier, and also in wider use. So, while we still use COinS metadata, we have now also added meta tags as recommended by Google Scholar [11]; ironically, that recommendation appears on a page with, as far as I can see, no meta tags at all. Finally, we also have Open Graph Protocol [12] metadata. Fortunately, their website does follow their own advice. Kblog-headers includes all of these formats, as can now be seen on this page.
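To make the difference between the three formats concrete, here is a minimal sketch in Python. It is not the kblog-headers code (which lives inside WordPress), and the function names are my own invention for illustration. The tag and key names (citation_title, citation_author, citation_publication_date, og:title, og:type, og:url, ctx_ver, rft_val_fmt, rft.atitle, rft.au, rft.date, and the Z3988 class) are the ones I understand those specifications to use; exactly which fields kblog-headers emits is an assumption.

```python
from html import escape
from urllib.parse import urlencode


def _meta(attr, name, value):
    # Build one <meta> element, escaping the content for use in an attribute.
    return '<meta %s="%s" content="%s"/>' % (attr, name, escape(value, quote=True))


def scholar_meta_tags(title, authors, date):
    """Google Scholar style tags: one citation_author tag per author."""
    pairs = [("citation_title", title)]
    pairs += [("citation_author", author) for author in authors]
    pairs.append(("citation_publication_date", date))
    return [_meta("name", name, value) for name, value in pairs]


def open_graph_tags(title, url, og_type="article"):
    """Open Graph tags use the property attribute rather than name."""
    pairs = [("og:title", title), ("og:type", og_type), ("og:url", url)]
    return [_meta("property", name, value) for name, value in pairs]


def coins_span(title, authors, date):
    """COinS: a URL-encoded ContextObject packed into a span's title attribute."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.atitle", title),
        ("rft.date", date),
    ]
    fields += [("rft.au", author) for author in authors]
    query = urlencode(fields)  # the microsyntax: a query string inside HTML
    return '<span class="Z3988" title="%s"></span>' % escape(query, quote=True)


if __name__ == "__main__":
    # Hypothetical post details, used only to show the shape of the output.
    authors = ["A. Author", "B. Author"]
    for line in scholar_meta_tags("Kblog Metadata", authors, "2012/05/09"):
        print(line)
    for line in open_graph_tags("Kblog Metadata", "http://example.org/post"):
        print(line)
    print(coins_span("Kblog Metadata", authors, "2012/05/09"))
```

The meta tags can be read by any HTML parser, whereas the COinS span needs a second round of parsing to get at the query string inside the title attribute, which is the complaint made above.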

Since the inception of Kblog, one of the difficulties we have had is with multiple authors. When adding metadata, for instance, we need to ensure that all the authors are represented. We have used plugins such as co-authors-plus [13] to enable multi-author work. However, these plugins come with a lot of extra baggage, namely the requirement for all authors to have a WordPress login (either on WordPress.com or on the local installation). Essentially, aside from the first workshop [14], we have never seen anyone collaboratively edit documents on WordPress. Where multiple authors have worked together (which we have seen a lot), they have done so using Word, LaTeX, Google Docs or asciidoc, collaborating through Dropbox or email. Only the communicating author needs an account. The problem was accentuated with sites like Bio-Ontologies, where all of the articles were posted by either me or Simon Cockell [15], but were authored by neither of us. From my perspective, we need the ability to separate these two roles: posting and authoring. Kblog-authors achieves precisely this. New authors can be added either using shortcodes within the document content, or through the WordPress edit page (the GUI is a little primitive, but functional). These authors do not need WordPress accounts, and the posting account is used if no authors are explicitly given.
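The fallback rule described above is simple enough to sketch. This is an illustration in Python rather than the plugin's own code, and the function and parameter names are hypothetical: explicitly listed authors win, and the posting account is used only when none are given.

```python
def resolve_authors(explicit_authors, posting_account):
    """Return the names to advertise as authors of a post.

    explicit_authors: names given in the post (hypothetical input; in the
        plugin these would come from shortcodes or the edit page).
    posting_account: display name of the account that posted the article.
    """
    # Ignore blank entries so that an empty field does not count as an author.
    cleaned = [name.strip() for name in explicit_authors if name.strip()]
    # Posting and authoring are separate roles: the poster is only a fallback.
    return cleaned if cleaned else [posting_account]
```

With this rule, an article posted by one person on behalf of others lists only the named authors, while an ordinary single-author post needs no extra markup at all.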

Finally, I have rewritten kblog-table-of-contents, and am combining it with kblog-metadata. This provides a new shortcode [kblogtoc] which can be used to embed a table of contents showing all posts, which is ideal for searching over. For more computational use, it is also possible to get a line-separated text file (http://www.russet.org.uk/blog/?kblog-toc=txt) or approximately the same thing as HTML (http://www.russet.org.uk/blog/?kblog-toc=html), which can be copied and pasted without having to view source. A more readable and searchable form can be seen embedded in a normal WordPress page (http://www.russet.org.uk/blog/table-of-contents/). This has also enabled us to finally fix the Bio-Ontologies contents page (http://bio-ontologies.knowledgeblog.org/table-of-contents), which now shows the correct authors, with all of the posts advertising their authorship.
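As an example of the computational use, the text form could be consumed along the following lines. This is a rough sketch: the helper names are made up, I am assuming the response is UTF-8, and since the layout of each line is not described here, every non-empty line is simply treated as one opaque entry.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def toc_url(blog_url, fmt="txt"):
    """Build the table-of-contents URL using the kblog-toc query parameter."""
    return blog_url.rstrip("/") + "/?" + urlencode({"kblog-toc": fmt})


def parse_toc_text(text):
    """Split the line-separated listing into entries, dropping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_toc(blog_url):
    """Download the text listing and return its entries (needs network access)."""
    with urlopen(toc_url(blog_url, "txt")) as response:
        # Assumption: the listing is served as UTF-8 text.
        return parse_toc_text(response.read().decode("utf-8"))


if __name__ == "__main__":
    for entry in fetch_toc("http://www.russet.org.uk/blog"):
        print(entry)
```

Keeping the parsing separate from the download means the same function can be used on a saved copy of the file.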

All three of these plugins require further work. At the moment, they provide better metadata, but they do not give the author and reader enough utility to encourage people to install them, of which more in the future. Despite this, however, I think they are already proving useful, and should help to solve a long-standing problem that we have had with using WordPress in an academic environment.

Erratum

2012-05-09: Corrected typographical error which meant the kblogtoc shortcode was displaying incorrectly.

References

  1. P. Lord, "KCite Spreads its Wings", An Exercise in Irrelevance, 2012. http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/
  2. P. Lord and S. Cockell, "The Problem with DOIs", An Exercise in Irrelevance, 2011. http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/
  3. "crossref.org", 2016. http://www.crossref.org/
  4. DataCite Team, "Welcome to DataCite", 2017. http://www.datacite.org/
  5. "arXiv.org e-Print archive", arXiv.org. http://arxiv.org/
  6. pubmeddev, "Home - PubMed - NCBI", PubMed New and Noteworthy. http://www.ncbi.nlm.nih.gov/pubmed/
  7. P. Lord, S. Cockell, D.C. Swan, and R. Stevens, "Ontogenesis Knowledgeblog: Lightweight Semantic Publishing", 1st Workshop on Semantic Web Technologies for Libraries and Readers, 2011. http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/
  8. "OpenURL ContextObject in SPAN (COinS)", 2010. http://ocoins.info/
  9. P. Lord, "Knowledge Blog", Knowledge Blog, 2009. http://www.knowledgeblog.org/
  11. "Google Scholar Help". http://scholar.google.com/intl/en/scholar/inclusion.html
  13. "Co-Authors Plus", WordPress.org. http://wordpress.org/extend/plugins/co-authors-plus/
  14. P. Lord, "The Ontogenesis Tutorial", An Exercise in Irrelevance, 2010. http://www.russet.org.uk/blog/2010/01/the-ontogenesis-tutorial/
  15. S. Cockell, "Fuzzier Logic", Fuzzier Logic, 2012. http://blog.fuzzierlogic.com/