<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>An Exercise in Irrelevance</title>
	<atom:link href="http://www.russet.org.uk/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.russet.org.uk/blog</link>
	<description>Knowledge, Biology and Ontologies</description>
	<lastBuildDate>Mon, 14 May 2012 10:40:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Kcite, Greycite and Kblog-metadata</title>
		<link>http://www.russet.org.uk/blog/2012/05/kcite-greycite-and-kblog-metadata/</link>
		<comments>http://www.russet.org.uk/blog/2012/05/kcite-greycite-and-kblog-metadata/#comments</comments>
		<pubDate>Mon, 07 May 2012 12:34:48 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Communication]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=2078</guid>
		<description><![CDATA[I am pleased to announce that as part of my work on knowledgeblog, we now have two new tools&#8201;&#8212;&#8201;Greycite and kblog-metadata&#8201;&#8212;&#8201;and have extended kcite, our citation engine. I will just give a brief overview of the functionality here. Subsequent articles will describe these tools in more detail, explaining the rationale behind them. [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="2078">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Kcite%2C+Greycite+and+Kblog-metadata&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-05-07&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F05%2Fkcite-greycite-and-kblog-metadata%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>I am pleased to announce that as part of my work on knowledgeblog <span class="kcite" kcite-id="ITEM-2078-0">(<a href="http://www.knowledgeblog.org/">http://www.knowledgeblog.org/</a>)</span>, we now have two new tools&#8201;&#8212;&#8201;Greycite and kblog-metadata&#8201;&#8212;&#8201;and have extended kcite, our citation engine <span class="kcite" kcite-id="ITEM-2078-1">(<a href="http://knowledgeblog.org/kcite-plugin">http://knowledgeblog.org/kcite-plugin</a>)</span>. I will just give a brief overview of the functionality here. Subsequent articles will describe these tools in more detail, explaining the rationale behind them.</p>
<p>The kcite engine, which you can see in use in this article, produces a nicely formatted bibliography list, generated using only identifiers to these articles: DOIs, Pubmed IDs or arXiv IDs. One obvious absence from this list, however, is the ability to directly cite URLs. We have now started to address this, through our two new tools.</p>
<p>Unlike other identifiers, we lack a centralised resource capable of delivering bibliographic metadata about a URL. To enable this, my colleague, Lindsay Marshall <span class="kcite" kcite-id="ITEM-2078-2">(<a href="http://www.ncl.ac.uk/computing/staff/profile/lindsay.marshall">http://www.ncl.ac.uk/computing/staff/profile/lindsay.marshall</a>)</span>, has developed Greycite <span class="kcite" kcite-id="ITEM-2078-3">(<a href="http://greycite.knowledgeblog.org/">http://greycite.knowledgeblog.org/</a>)</span>, which went live earlier this week. Greycite allows you to search for bibliographic metadata about a given resource. So, for instance, you can <a href="http://greycite.knowledgeblog.org/?uri=http://www.russet.org.uk%2Fblog%2F2010%2F07%2Frealism-and-science%2F">view the metadata</a> for my article on realism <span class="kcite" kcite-id="ITEM-2078-4">(<a href="http://www.russet.org.uk/blog/2010/07/realism-and-science/">http://www.russet.org.uk/blog/2010/07/realism-and-science/</a>)</span>. Probably more useful than this view, however, is that you can also retrieve this metadata computationally: currently, we support JSON suitable for citeproc-js <span class="kcite" kcite-id="ITEM-2078-5">(<a href="http://bitbucket.org/fbennett/citeproc-js">http://bitbucket.org/fbennett/citeproc-js</a>)</span>, and bibtex <span class="kcite" kcite-id="ITEM-2078-6">(<a href="http://www.bibtex.org/">http://www.bibtex.org/</a>)</span>. Obviously, we can support further formats if we choose; fortunately, the metadata for a URL is, in general, very simple (date, title, website or &#8220;container&#8221; title).</p>
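<p>To illustrate how simple this metadata is, here is a minimal sketch of turning a date, title and container title into a bibtex entry. This is not Greycite's actual output format: the function name <tt>to_bibtex</tt>, the dictionary keys, the choice of <tt>@misc</tt> and the entry key are all assumptions made for illustration.</p>

```python
def to_bibtex(key, meta):
    """Render a small metadata record as a BibTeX @misc entry.

    The record layout (title, container_title, date, url) is hypothetical;
    it mirrors the simple fields described in the text, nothing more.
    """
    fields = [
        ("title", meta.get("title")),
        ("howpublished", meta.get("container_title")),
        # Dates are assumed to be ISO formatted, so the year is the first 4 chars.
        ("year", (meta.get("date") or "")[:4]),
        ("url", meta.get("url")),
    ]
    lines = ["@misc{%s," % key]
    for name, value in fields:
        if value:  # fields with no metadata are simply left out
            lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


record = {
    "title": "Greycite: Citing the Web",
    "container_title": "An Exercise in Irrelevance",
    "date": "2012-05-05",
    "url": "http://www.russet.org.uk/blog/2012/05/greycite-citing-the-web/",
}
entry = to_bibtex("greycite2012", record)
print(entry)
```

<p>Missing fields are dropped rather than filled with placeholders, which matches the idea that a reference should show only what the source itself declares.</p>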
<p>Greycite must, however, get its metadata from somewhere. As we wanted greycite to be both an automated and an authoritative source, we have decided to take metadata only from the URL being referenced (or referenced from the URL). Anything else would have required an authentication step, to prove that metadata was being provided by the owner of the content. I will describe this in more detail later; we support COinS <span class="kcite" kcite-id="ITEM-2078-7">(<a href="http://ocoins.info/">http://ocoins.info/</a>)</span>, OGP <span class="kcite" kcite-id="ITEM-2078-8">(<a href="http://ogp.me/">http://ogp.me/</a>)</span> and Google Scholar Metatags <span class="kcite" kcite-id="ITEM-2078-9">(<a href="http://scholar.google.com/intl/en/scholar/inclusion.html">http://scholar.google.com/intl/en/scholar/inclusion.html</a>)</span>. In practice, this combination of sources allows us to provide rich references to many URLs. Where this is not possible, we fall back gracefully.</p>
<p>Unfortunately, formal metadata on the web is not heavily controlled or pre-defined. If you are using WordPress to publish your articles, whether there is any metadata on your articles depends largely on your theme. I have started to address this with kblog-metadata <span class="kcite" kcite-id="ITEM-2078-10">(<a href="http://wordpress.org/extend/plugins/kblog-metadata/">http://wordpress.org/extend/plugins/kblog-metadata/</a>)</span>. Again, I will describe the functionality in greater detail later, but essentially, this plugin adds metadata in all three of the formats mentioned above in the document headers, and provides a good deal of flexibility about where that metadata comes from.</p>
<p>Finally, I have extended kcite to query for metadata from greycite for each URL cited. The data coming back is used directly for rendering, so this should have reasonable performance; moreover all data is cached in the WordPress database, limiting outgoing network traffic from the webserver for each reference.</p>
<p>Work is not complete yet, and there is much more to do. However, I have been using development versions of these tools now for a month or so, and the experience is rather good. The metadata is useful during authoring, as it can be used to find the correct reference. While we cannot capture metadata from all sources, a surprisingly large number of them do work. And the development of greycite means that this metadata can be served efficiently and without adding too much complexity to kcite. In short, while it may not be a complete solution, these enhancements represent a substantial step toward making academic URLs formally citable, as others have recently called for <span class="kcite" kcite-id="ITEM-2078-11">(<a href="http://michaelnielsen.org/blog/is-scientific-publishing-about-to-be-disrupted/">http://michaelnielsen.org/blog/is-scientific-publishing-about-to-be-disrupted/</a>)</span>.</p>
<hr /> 
<h2><a name="_addendum"></a>Addendum</h2>
<p>2012-05-09: I have already published an initial article <span class="kcite" kcite-id="ITEM-2078-12">(<a href="http://www.russet.org.uk/blog/2012/03/kblog-metadata/">http://www.russet.org.uk/blog/2012/03/kblog-metadata/</a>)</span> about kblog-metadata, which should have been referenced here.</p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 2078 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/05/kcite-greycite-and-kblog-metadata/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Greycite: Citing the Web</title>
		<link>http://www.russet.org.uk/blog/2012/05/greycite-citing-the-web/</link>
		<comments>http://www.russet.org.uk/blog/2012/05/greycite-citing-the-web/#comments</comments>
		<pubDate>Sat, 05 May 2012 07:26:08 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Communication]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=2071</guid>
		<description><![CDATA[In this article, we will describe the rationale behind our new service, Greycite, which we have developed to enable more formal citation of URLs in general, and specifically to back up the kcite citation engine. Authors: Phillip Lord and Lindsay Marshall, School of Computing Science, Newcastle University. Introduction: As has been recently announced, the kcite citation engine now [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="2071">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Greycite%3A+Citing+the+Web&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-05-05&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F05%2Fgreycite-citing-the-web%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>In this article, we will describe the rationale behind our new service, Greycite, which we have developed to enable more formal citation of URLs in general, and specifically to back up the kcite citation engine.</p>
<hr /> 
<h2><a name="_authors"></a>Authors</h2>
<p>Phillip Lord and Lindsay Marshall<br /> School of Computing Science<br /> Newcastle University</p>
<hr /> 
<h2><a name="_introduction"></a>Introduction</h2>
<p>As has been recently announced <span class="kcite" kcite-id="ITEM-2071-0">(<a href="http://www.russet.org.uk/blog/2012/05/kcite-greycite-and-kblog-metadata/">http://www.russet.org.uk/blog/2012/05/kcite-greycite-and-kblog-metadata/</a>)</span>, the kcite citation engine <span class="kcite" kcite-id="ITEM-2071-1">(<a href="http://www.russet.org.uk/blog/2011/12/kcite-the-next-generation/">http://www.russet.org.uk/blog/2011/12/kcite-the-next-generation/</a>)</span>, now supports URLs directly, as can be seen in this sentence. While it can do this trivially, by simply putting a URL in the reference, we wanted something better; where possible, we wanted URLs to be referenced in a similar manner to arXiv <span class="kcite" kcite-id="ITEM-2071-2">(<a href="http://arxiv.org/">http://arxiv.org/</a>)</span> or PubMed <span class="kcite" kcite-id="ITEM-2071-3">(<a href="http://www.ncbi.nlm.nih.gov/pubmed/">http://www.ncbi.nlm.nih.gov/pubmed/</a>)</span> IDs&#8201;&#8212;&#8201;with full bibliographic metadata where possible.</p>
<p>To achieve this, we have created the Greycite service, which captures metadata from a URL and then presents this back to kcite. In this short article, we describe the rationale behind the creation of this service.</p>
<hr /> 
<h2><a name="_discovering_the_metadata"></a>Discovering the metadata</h2>
<p>The kcite citation engine allows WordPress users to reference an article through the use of a shortcode, of the form [&zwnj;cite]10.1371/journal.pone.0012258[/cite&zwj;] which is rendered as <span class="kcite" kcite-id="ITEM-2071-4">(<a href="http://dx.doi.org/10.1371/journal.pone.0012258">http://dx.doi.org/10.1371/journal.pone.0012258</a>)</span>. The rendering uses metadata from a third party service, in this case provided by CrossRef <span class="kcite" kcite-id="ITEM-2071-5">(<a href="http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html">http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html</a>)</span>, to generate the bibliography reference. Other identifiers are handled similarly, using other services.</p>
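<p>As a rough sketch of this flow (not kcite's actual code, which is a WordPress plugin), the following finds the identifiers inside cite shortcodes and prepares a content-negotiated request for each DOI. The regular expressions and function names are our own assumptions; the <tt>Accept</tt> value is the CSL JSON media type used in DOI content negotiation. The request is only built here, not sent.</p>

```python
import re
import urllib.request

# Hypothetical patterns: a shortcode wrapper and a loose DOI shape.
CITE_RE = re.compile(r"\[cite\](.*?)\[/cite\]")
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def find_identifiers(text):
    """Return the identifiers found inside [cite]...[/cite] shortcodes."""
    return [m.strip() for m in CITE_RE.findall(text)]


def doi_request(doi):
    """Build (but do not send) a request asking the DOI resolver for CSL JSON."""
    return urllib.request.Request(
        "http://dx.doi.org/" + doi,
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


post = "See [cite]10.1371/journal.pone.0012258[/cite] for details."
ids = find_identifiers(post)
requests = [doi_request(i) for i in ids if DOI_RE.match(i)]
print(requests[0].full_url)
```

<p>Passing a request to <tt>urllib.request.urlopen</tt> would perform the lookup; other identifier types would need their own service and request shape.</p>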
<p>We wished to achieve something similar with an arbitrary URL. However, there is no centralised service where authors are required to lodge their metadata for any URL. We considered the possibility of providing such a service where content authors could lodge their metadata&#8201;&#8212;&#8201;author, date, title and so on, about a URL. However, it seems unlikely that this would succeed for two critical reasons. First, and most importantly, few authors would be likely to go to the extra effort: why would they bother and, if they did, why use our service rather than some other? Second, it would require an authentication step to ensure that metadata genuinely came from the person controlling the URL. We also considered the possibility of deliberately allowing third party addition of metadata, but this raises the question of conflicts in the metadata.</p>
<p>As a result, in practice, we feel that the only sensible course of action is to extract the metadata directly from the resolvable contents of the URL, as this ensures that we have taken metadata from what is (quite literally) the authoritative source. The significant drawback to this is that if the author does not provide this metadata, no one else is able to do so. In a sense, though, this is correct: if authors provide no metadata, then this is how their works should appear, as this is their choice. Moreover, as we have argued previously <span class="kcite" kcite-id="ITEM-2071-6">(<a href="http://www.russet.org.uk/blog/2012/04/three-steps-to-heaven/">http://www.russet.org.uk/blog/2012/04/three-steps-to-heaven/</a>)</span>, if authors or their readers are worried by this, it may provide the motivation to add bibliographic metadata to their work, which is a benefit to everyone.</p>
<p>The immediate problem here is the lack of a single standardised form of bibliographic metadata on the web; however, there are a number of systems which are currently in use, namely, COinS <span class="kcite" kcite-id="ITEM-2071-7">(<a href="http://ocoins.info/">http://ocoins.info/</a>)</span>, Open Graph Protocol <span class="kcite" kcite-id="ITEM-2071-8">(<a href="http://ogp.me/">http://ogp.me/</a>)</span> and Google Scholar tags <span class="kcite" kcite-id="ITEM-2071-9">(<a href="http://scholar.google.com/intl/en/scholar/inclusion.html">http://scholar.google.com/intl/en/scholar/inclusion.html</a>)</span>. We have also considered a fourth option, RSS/Atom feeds, which, perhaps ironically, are structured enough to provide bibliographic metadata. At the moment, we do not have accurate statistics on the prevalence of each of these types of metadata&#8201;&#8212;&#8201;of course, we could crawl the web to gather these statistics, but we are not really interested in the web in general, but in the academic sector of it, which is hard to determine <strong>a priori</strong>. However, our initial experiences suggest the following:</p>
<ul> 
<li> COinS metadata is not widespread. We suspect that this follows from our experience that the specification is hard to find and incomprehensible when you do <span class="kcite" kcite-id="ITEM-2071-10">(<a href="http://www.russet.org.uk/blog/2012/03/kblog-metadata/">http://www.russet.org.uk/blog/2012/03/kblog-metadata/</a>)</span>. </li>
<li> Google Scholar tags are much more widespread, although there is some variation (the use of <tt>name</tt> vs <tt>property</tt> for instance, or multiple authors represented in a single tag vs each author on their own). </li>
<li> OGP appears reasonably widespread, including in articles which are not academic (or not solely so) but likely to be cited, such as BBC News, or anything hosted on WordPress.com. </li>
<li> RSS/Atom feeds worked fairly well, but normally only contain metadata for recent articles; we tried to track RSS feeds, but this resulted in thousands of URLs very quickly. </li>
</ul>
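<p>To give a feel for what extracting one of these formats involves, here is a minimal sketch of decoding the <tt>title</tt> attribute of a COinS span, which is a URL-encoded list of key/value pairs. This is not the parser used by greycite; the function name <tt>parse_coins</tt> is hypothetical, and the sample value is a shortened form of the COinS span that kblog-metadata inserts into this article.</p>

```python
import html
from urllib.parse import parse_qs


def parse_coins(title_attr):
    """Decode a COinS title attribute into a dict of rft.* fields.

    Values are lists, because a key such as rft.au may appear more than once.
    """
    # The attribute is HTML-escaped in the page, then URL-encoded inside that.
    pairs = parse_qs(html.unescape(title_attr))
    return {
        key[len("rft."):]: values
        for key, values in pairs.items()
        if key.startswith("rft.")
    }


title_attr = (
    "ctx_ver=Z39.88-2004"
    "&amp;rft.title=Greycite%3A+Citing+the+Web"
    "&amp;rft.source=An+Exercise+in+Irrelevance"
    "&amp;rft.date=2012-05-05"
    "&amp;rft.au=Phillip+Lord"
)
meta = parse_coins(title_attr)
print(meta["title"][0], meta["date"][0])
```

<p>OGP and Google Scholar tags need a different approach, since they live in <tt>meta</tt> elements in the document head rather than in a single encoded attribute.</p>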
<p>Over time, we should be able to get clearer statistics as to real usage of these systems, based on the data in greycite.</p>
<hr /> 
<h2><a name="_greycite_as_a_service"></a>Greycite as a service</h2>
<p>Greycite is currently packaged as a service, rather than embedded within WordPress, which would also have been possible. The reasons for this were several. First, gathering metadata involves a reasonable amount of parsing, and putting this all into a WordPress plugin seemed unnecessarily heavy. This is particularly so, given that server load is already an issue with kcite, and adding further to this did not seem sensible.</p>
<p>Second, we wanted to maintain a database of the metadata gathered from around the web. This allows us to deal with problems of resources changing or disappearing. We want the user to be able to cite a URL and for this citation to not break if the URL disappears and becomes <tt>404</tt>. We also wish to be able to cite a URL at a specific date, and have the citation show the metadata for that time. Placing this load on the individual wordpress database backend does not really make sense. Moreover, with greycite, there is a reasonable likelihood that others will have cited a particular article, thereby sharing the load.</p>
<p>Third, Greycite is also useful outside of WordPress. For instance, Greycite also provides bibtex, so it can be used with a bibliographic manager. This is very useful at authoring time, as we can use this metadata to search over a list of relevant URLs and then select between them.</p>
<p>Finally, we wanted to be able to add additional functionality, which may require upgrading the database periodically, which is harder to do within a plugin. For example, we have already added links through to the UK Web Archive <span class="kcite" kcite-id="ITEM-2071-11">(<a href="http://www.webarchive.org.uk/ukwa/">http://www.webarchive.org.uk/ukwa/</a>)</span>, for those resources which are archived. We will add the Internet Archive <span class="kcite" kcite-id="ITEM-2071-12">(<a href="http://www.archive.org/">http://www.archive.org/</a>)</span>, and Web Cite <span class="kcite" kcite-id="ITEM-2071-13">(<a href="http://www.webcitation.org/">http://www.webcitation.org/</a>)</span> in time also. This means that not only should citations remain displayed correctly if resources disappear or change, it should still be possible to get to their contents in many cases.</p>
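<p>The second reason above, citing a URL as it was at a specific date, can be sketched with a small in-memory store. This is only an illustration under our own assumptions, not how the Greycite database is organised: the class name <tt>SnapshotStore</tt>, its methods and the example URL are all hypothetical, and dates are assumed to be ISO formatted strings so that they sort chronologically.</p>

```python
import bisect


class SnapshotStore:
    """Keep dated metadata snapshots per URL and answer 'as of' queries."""

    def __init__(self):
        self._dates = {}  # url -> sorted list of ISO date strings
        self._meta = {}   # (url, date) -> metadata dict

    def add(self, url, date, meta):
        """Record the metadata seen for a URL on a given date."""
        dates = self._dates.setdefault(url, [])
        if (url, date) not in self._meta:
            bisect.insort(dates, date)
        self._meta[(url, date)] = meta

    def at(self, url, date):
        """Return the latest snapshot taken on or before date, or None."""
        dates = self._dates.get(url, [])
        i = bisect.bisect_right(dates, date)
        if i == 0:
            return None
        return self._meta[(url, dates[i - 1])]


store = SnapshotStore()
url = "http://example.org/post"  # hypothetical URL
store.add(url, "2012-03-01", {"title": "Draft title"})
store.add(url, "2012-05-05", {"title": "Final title"})
print(store.at(url, "2012-04-01"))
```

<p>Because snapshots are kept rather than overwritten, a citation made in April continues to show the April metadata even after the page changes or disappears.</p>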
<hr /> 
<h2><a name="_the_article_as_a_linked_data"></a>The article as linked data</h2>
<p>The existence of Greycite allows us to turn a blog post into a linked data, academic article. The reader of an article sees, as well as the content directly generated by the author, data gathered from all the outgoing links. The reference list, therefore, ceases to be a mechanism for finding secondary sources, and becomes a usability tool; readers can understand what sources are being relied on, without having to remember URLs or click through to them. Likewise, the authors can use the linked data environment outside of a web browser to help enable authoring. Metadata that is useful to readers is, unsurprisingly, also useful to authors (who tend to be the first person to read an article anyway!).</p>
<hr /> 
<h2><a name="_discussion"></a>Discussion</h2>
<p>With Greycite, we were interested in adding more formal citation to the web in general, and more specifically supporting kcite <span class="kcite" kcite-id="ITEM-2071-1">(<a href="http://www.russet.org.uk/blog/2011/12/kcite-the-next-generation/">http://www.russet.org.uk/blog/2011/12/kcite-the-next-generation/</a>)</span>. We believe that we have achieved this in part with a relatively light-weight service. Greycite is useful for article display, and for authoring.</p>
<p>In addition, we start to address the issues of link breakage, by building on the back of existing archiving services. Articles will still be able to display article metadata if an article disappears. Future versions of kcite will also redirect links to the nearest web archive when this happens. We have done this without recourse to secondary identifiers such as a DOI or PURL, which we believe represents a better user experience. Building on the back of existing web archives also addresses a critical scalability issue; the Greycite database needs only to store bibliographic metadata, which is likely to remain tractable. From a legal perspective, we also side-step issues of copyright, as gathering metadata alone is likely to be covered by fair dealing clauses.</p>
<p>By depending only on metadata present in the URL itself, we can guarantee that metadata is authoritative (not, of course, that it is &#8220;correct&#8221;, as in reflects the author's intentions, but it does match what they said). It also means that we do not control the metadata; it has not been entered into greycite; it is out there, available on the web, free for anyone to gather. We wish to be part of the semantic web, not a walled garden within it.</p>
<p>Finally, we have started to build a linked data environment for academic publishing. Bibliographic metadata is, of course, only the start. It is not a suitable way to present all kinds of information; for instance, Chemicalize <span class="kcite" kcite-id="ITEM-2071-14">(<a href="http://www.chemicalize.org/">http://www.chemicalize.org/</a>)</span> provide a nice plugin which transforms chemical names into something richer. But by harnessing the power of the web, and building on existing resources, we should be able to build a rich and full featured environment for presenting scientific knowledge.</p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 2071 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/05/greycite-citing-the-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Semantic Linking Studentship</title>
		<link>http://www.russet.org.uk/blog/2012/04/semantic-linking-studentship/</link>
		<comments>http://www.russet.org.uk/blog/2012/04/semantic-linking-studentship/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 13:41:26 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=2057</guid>
		<description><![CDATA[I have a PhD studentship available for anyone wishing to work on using the Semantic Web and linked data to improve the process of scientific publishing. I want to expand on the work that we have done with Kcite , which links between different articles, and consider how we would link to and from both [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="2057">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Semantic+Linking+Studentship&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-04-18&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F04%2Fsemantic-linking-studentship%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>I have a PhD studentship available for anyone wishing to work on using the Semantic Web and linked data to improve the process of scientific publishing.</p>
<p>I want to expand on the work that we have done with Kcite <span class="kcite" kcite-id="ITEM-2057-0">(<a href="http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/">http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/</a>)</span>, which links between different articles, and consider how we would link to and from both raw data and ontological resources. We will do this in a practical, real-world environment: we will be extending WordPress server-side; all the tools that we generate will be released into the &#8220;wild&#8221; as we go; we will be active in supporting users so that we can incorporate feedback. We will be targeting the academic blogosphere, in addition to working with the content on <a href="http://knowledgeblog.org">http://knowledgeblog.org</a>.</p>
<p>If you are interested, please feel free to email me directly. The full details of the advert are below.</p>
<hr /> 
<h2><a name="_advert"></a>Advert</h2>
<p><a href="http://www.ncl.ac.uk/postgraduate/funding/search/list/cs024">http://www.ncl.ac.uk/postgraduate/funding/search/list/cs024</a></p>
<p>The linked data initiative seeks to increase the machine computability of the web, but it is hard for authors to generate linked-data. We will investigate ways of publishing scientific knowledge where authors, readers and computational agents all gain advantage from additional semantics and machine computability. We will investigate representation of graph data and deep linking to ontological resources.</p>
<p>This project will feed into Knowledgeblog (<a href="http://knowledgeblog.org">http://knowledgeblog.org</a>) which is both a high traffic (100k+ page reads) academic site in its own right, as well as releasing its software for third party use by the academic blogosphere. Combined with exemplars using real data where possible, this will provide two valuable routes to evaluate and assess the representations in a real-world environment.</p>
<h3><a name="_value_of_the_award_of_eligibility"></a>Value of the Award and Eligibility</h3>
<p>Depending on how you meet the EPSRC criteria (<a href="http://www.epsrc.ac.uk/funding/students/pages/eligibility.aspx">http://www.epsrc.ac.uk/funding/students/pages/eligibility.aspx</a>), you may be entitled to a full or partial award. A full award covers tuition fees at the UK/EU rate and an annual stipend of £14,790 (2012/13). A partial award covers fees at the UK/EU rate only. The studentship is not available for candidates from outside of the EU.</p>
<h3><a name="_person_specification"></a>Person Specification</h3>
<p>You should have either a First class honours degree in Computing Science, Mathematics, or other relevant science or engineering subject, or a 2.1 in Computing Science, Mathematics or other relevant science or engineering subject and a distinction level Masters degree in a related subject. Equivalent experience will also be considered.</p>
<h3><a name="_how_to_apply"></a>How to Apply</h3>
<p>Apply through the University’s online postgraduate application form; insert the reference CS024 and select ‘PhD COMP’, with programme code 8050F, as programme of study. Mandatory fields need to be completed and a covering letter, CV and (if English is not your first language) a copy of your English language qualifications attached. The letter must state the title of studentship, quote reference CS024 and describe how your research interests fit with the topic of the research project outlined (max. 2 pages). If you already have published research papers, a list of these providing bibliographic details should be included in the letter.</p>
<p>You should also send your covering letter and CV to the Postgraduate Secretary at <a href="mailto:cs.pg@ncl.ac.uk">cs.pg@ncl.ac.uk</a>.</p>
<h3><a name="_further_information"></a>Further Information</h3>
<p>For further details, please contact Phillip Lord (<a href="mailto:phillip.lord@newcastle.ac.uk">phillip.lord@newcastle.ac.uk</a>), 0191 222 7827</p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 2057 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/04/semantic-linking-studentship/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Three Steps to Heaven</title>
		<link>http://www.russet.org.uk/blog/2012/04/three-steps-to-heaven/</link>
		<comments>http://www.russet.org.uk/blog/2012/04/three-steps-to-heaven/#comments</comments>
		<pubDate>Thu, 12 Apr 2012 13:34:37 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=2054</guid>
		<description><![CDATA[Phillip Lord, Simon Cockell and Robert Stevens. School of Computing Science, Newcastle University, Newcastle-upon-Tyne, UK; Bioinformatics Support Unit, Newcastle University, Newcastle-upon-Tyne, UK; School of Computer Science, University of Manchester, UK. phillip.lord@newcastle.ac.uk Semantic publishing offers the promise of computable papers, enriched visualisation and a realisation of the linked data ideal. In reality, however, the publication process contrives to prevent richer semantics while culminating in a ‘lumpen’ [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="2054">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Three+Steps+to+Heaven&amp;rft.source=SePublica+2012&amp;rft.date=2012-04-12&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F04%2Fthree-steps-to-heaven%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><div class="titlepage">
<p>Phillip Lord, Simon Cockell and Robert Stevens<br />School of Computing
Science, Newcastle University,<br />Newcastle-upon-Tyne,
UK<br />Bioinformatics Support Unit, Newcastle
University,<br />Newcastle-upon-Tyne, UK<br />School of Computer Science,
University of Manchester,
UK<br /><a href="mailto:phillip.lord@newcastle.ac.uk">phillip.lord@newcastle.ac.uk</a></p>
</div>
<div class="abstract">
<p> Semantic publishing offers the promise of
computable papers, enriched visualisation and a realisation of the linked data
ideal. In reality, however, the publication process contrives to prevent
richer semantics while culminating in a ‘lumpen’ PDF. In this paper, we
discuss a web-first approach to publication, and describe a three-tiered
approach which integrates with the existing authoring tooling. Critically,
although it adds limited semantics, it does provide value to all the
participants in the process: the author, the reader and the
machine. </p>
<p><b class="bfseries">License:</b> This work is licensed under a
Creative Commons Attribution 3.0 Unported License.
<a href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</a>. It is also available
at <a href="http://www.russet.org.uk/blog/2012/04/three-steps-to-heaven/">http://www.russet.org.uk/blog/2012/04/three-steps-to-heaven/</a>. It was written for <a href="http://sepublica.mywikipaper.org/drupal/">SePublica 2012</a>. </p>
</div>
<h1 id="sec:introduction">1
Introduction</h1>
<p>The publishing of both data and narratives on those data is changing
  radically. Linked Open Data and related semantic technologies allow for
  semantic publishing of data. We still need, however, to publish the
  narratives on that data and that style of publishing is in the process of
  change; one of those changes is the incorporation of semantics
  <span class="kcite" kcite-id="ITEM-2054-0">(<a href="http://dx.doi.org/10.1109/MIS.2006.62">http://dx.doi.org/10.1109/MIS.2006.62</a>)</span><span class="kcite" kcite-id="ITEM-2054-1">(<a href="http://dx.doi.org/10.1087/2009202">http://dx.doi.org/10.1087/2009202</a>)</span><span class="kcite" kcite-id="ITEM-2054-2">(<a href="http://dx.doi.org/10.1371/journal.pcbi.1000361">http://dx.doi.org/10.1371/journal.pcbi.1000361</a>)</span>.
  The idea of semantic publishing is an attractive one for those who wish to
  consume papers electronically; it should enhance the richness of the
  computational component of
  papers <span class="kcite" kcite-id="ITEM-2054-1">(<a href="http://dx.doi.org/10.1087/2009202">http://dx.doi.org/10.1087/2009202</a>)</span>.
  It promises a realisation of the vision of a next generation of the web,
  with papers becoming a critical part of a linked data
  environment <span class="kcite" kcite-id="ITEM-2054-0">(<a href="http://dx.doi.org/10.1109/MIS.2006.62">http://dx.doi.org/10.1109/MIS.2006.62</a>)</span>,<span class="kcite" kcite-id="ITEM-2054-3">(<a href="http://dx.doi.org/10.4018/jswis.2009081901">http://dx.doi.org/10.4018/jswis.2009081901</a>)</span>,
  where the results and narratives become one. </p>
<p>The reality, however, is
  somewhat different. There are significant barriers to the acceptance of
  semantic publishing as a standard mechanism for academic publishing. The web
  was invented around 1990 as a light-weight mechanism for publication of
  documents. It has subsequently had a massive impact on society in general.
  It has, however, barely touched most scientific publishing; while most
  journals have a website, the publication process still revolves around the
  generation of papers, moving from Microsoft Word or
  L<sup style="font-variant:small-caps;
  margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase;
  margin-left:-0.2em">e</sub>X <span class="kcite" kcite-id="ITEM-2054-4">(<a href="http://www.latex-project.org">http://www.latex-project.org</a>)</span>,
  through to a final PDF which looks, feels and is something designed to be
  printed onto paper (this includes conferences dedicated to the web and the
  use of web technologies). Adding semantics
  into this environment is difficult or impossible; the content of the PDF has
  to be exposed and semantic content retro-fitted or, in all likelihood, a
  complex process of author and publisher interaction has to be devised and
  followed. If semantic data publishing and semantic publishing of academic
  narratives are to work together, then academic publishing needs to
  change. </p>
<p>In this paper, we describe our attempts to take a commodity
  publication environment, and modify it to bring in some of the formality
  required from academic publishing. We illustrate this with three
  exemplars—different kinds of knowledge that we wish to enhance. In the
  process, we add a small amount of semantics to the finished articles. Our
  key constraint is the desire to add value for all the human participants.
  Both authors and readers should see and recognise additional value, with the
  semantics a useful or necessary byproduct of the process, rather than the
  primary motivation. We characterise this process as our “three steps to
  heaven”, namely: </p>
<ul class="itemize">
<li>
<p>make life better for the machine to </p>
</li>
<li>
<p>make life better for the author to </p>
</li>
<li>
<p>make life better for the reader </p>
</li>
</ul>
<p>While requiring additional value for all of these participants is
  hard, and places significant limitations on the level of semantics that can
  be achieved, we believe that it does increase the likelihood that content
  will be generated in the first place, and represents an attempt to enable
  semantic publishing in a real-world workflow. </p>
<h1 id="sec:knowledgeblog">2 Knowledgeblog</h1>
<p>The <a href="http://knowledgeblog.org">knowledgeblog</a> project stemmed
from the desire for a book describing the many aspects of ontology
development, from the underlying formal semantics, to the practical technology
layer and, finally, through to the knowledge domain <span class="kcite" kcite-id="ITEM-2054-5">(<a href="http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/">http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/</a>)</span>.
However, we have found the traditional book publishing process frustrating and
unrewarding. While scientific authoring is difficult in its own right, our own
experience suggests that the <em>publishing</em> process is extremely
hard work. This is particularly so for multi-author collected works which are
often harder for the editor than writing a book “solo”. Finally, the expense
and hard copy nature of academic books means that, again in our experience,
few people read them. </p>
<p>This contrasts starkly with the web-first
publication process that has become known as blogging. With any of a number of
ready made platforms, it is possible for authors with little or no technical
skill, to publish content to the web with ease. For knowledgeblog (“kblog”),
we have taken one blogging engine, WordPress <span class="kcite" kcite-id="ITEM-2054-6">(<a href="http://www.wordpress.org">http://www.wordpress.org</a>)</span>, running on low-end hardware, and
used it to develop a multi-author resource describing the use of ontologies in
the life sciences (our main field of expertise). There are also kblogs on
bioinformatics <span class="kcite" kcite-id="ITEM-2054-7">(<a href="http://bioinformatics.knowledgeblog.org">http://bioinformatics.knowledgeblog.org</a>)</span> and the Taverna
workflow environment <span class="kcite" kcite-id="ITEM-2054-8">(<a href="http://taverna.knowledgeblog.org">http://taverna.knowledgeblog.org</a>)</span><span class="kcite" kcite-id="ITEM-2054-9">(<a href="http://dx.doi.org/10.1093/nar/gkl320">http://dx.doi.org/10.1093/nar/gkl320</a>)</span>.
We have previously described how we addressed some of the social aspects,
including attribution, reviewing and immutability of
articles <span class="kcite" kcite-id="ITEM-2054-5">(<a href="http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/">http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/</a>)</span>. </p>
<p>As
well as delivering content, we are also using this framework to
investigate <em>semantic academic publishing</em>, investigating how we can
enhance the machine interpretability of the final paper, while living within
the key constraint of making life (slightly) better for machine, author and
reader without adding complexity for the human participants. </p>
<p>Scientific
authors are relatively conservative. Most of them have well-established
toolsets and workflows which they are relatively unwilling to change. For
instance, within the kblog project, we have used workshops to start the
process of content generation. For our initial meeting, we gave little
guidance on authoring process to authors, as a result of which most attempted
to use WordPress directly for authoring. The WordPress editing environment is,
however, web-based, and was originally designed for editing short,
non-technical articles. It appeared to not work well for most
scientists. </p>
<p>The requirements that authors have for such ‘scientific’
articles are manifold. Many wish to be able to author while offline
(particularly on trains or planes). Almost all scientific papers are
multi-author, and some degree of collaboration is required. Many scientists in
the life sciences wish to author in Word because grant bodies and journals
often produce templates as Word documents. Many wish to use
L<sup style="font-variant:small-caps;
margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase;
margin-left:-0.2em">e</sub>X, because its idiomatic approach to programming
documents is unreplicable with anything else. Fortunately, it is possible to
induce WordPress to accept content from many different authoring tools,
including Word and L<sup style="font-variant:small-caps;
margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase;
margin-left:-0.2em">e</sub>X <span class="kcite" kcite-id="ITEM-2054-5">(<a href="http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/">http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/</a>)</span>. </p>
<p>As
a result, during the kblog project, we have seen many different workflows in
use, often highly idiosyncratic in nature. These
include: </p>
<dl class="description">
<dt>Word/Email:</dt>
<dd>
<p>Many authors write using MS Word and collaborate by emailing files around. This method has a low barrier to entry, but requires significant social processes to prevent conflicting versions, particularly as the number of authors increases. </p>
</dd>
<dt>Word/Dropbox:</dt>
<dd>
<p>For the taverna kblog <span class="kcite" kcite-id="ITEM-2054-8">(<a href="http://taverna.knowledgeblog.org">http://taverna.knowledgeblog.org</a>)</span>, authors wrote in
    Word and collaborated with Dropbox <span class="kcite" kcite-id="ITEM-2054-10">(<a href="http://www.dropbox.com">http://www.dropbox.com</a>)</span>. This method works reasonably well where many authors are involved; Dropbox detects conflicts, although it cannot prevent or merge them. </p>
</dd>
<dt>Asciidoc/Dropbox:</dt>
<dd>
<p>Used by the authors of this paper. Asciidoc <span class="kcite" kcite-id="ITEM-2054-11">(<a href="http://www.methods.co.nz/asciidoc">http://www.methods.co.nz/asciidoc</a>)</span> is relatively
    simple, somewhat programmable and accessible. Unlike
    L<sup style="font-variant:small-caps;
    margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase;
    margin-left:-0.2em">e</sub>X which can be induced to produce HTML with
    effort, asciidoc is designed to do so. </p>
</dd>
</dl>
<p>Of these three approaches, the Word/Dropbox combination is probably the most generally used. </p>
<p>From the reader’s perspective, a decision that we have made within knowledgeblog is to be “HTML-first”. The initial reasons for this were entirely practical; supporting multiple toolsets is hard, particularly if any degree of consistency is to be maintained; the generation of the HTML is at least partly controlled by the middleware – WordPress in kblog’s case. As well as enabling consistency of presentation, it also, potentially, allows us to add additional knowledge; it makes semantic publication a possibility. However, we are aware that knowledgeblog currently scores rather badly on what we describe as the “bath-tub test”; while exporting to PDF or printing out is possible, the presentation is not as “neat” as would be ideal. In this regard (and we hope only in this regard), the knowledgeblog experience is limited. However, increasingly, readers are both happy to interact with material on the web and capable of doing so, without printouts. </p>
<p>From this background and aim, we have drawn the following requirements: </p>
<ol class="enumerate">
<li>
<p>The author can, as much as possible, remain within familiar authoring environments; </p>
</li>
<li>
<p>The representation of the published work should remain extensible to, for instance, semantic enhancements; </p>
</li>
<li>
<p>The author and reader should be able to have the amount of “formal” academic publishing they need; </p>
</li>
<li>
<p>Support for semantic publishing should be gradual and offer advantages for author and reader at all stages. </p>
</li>
</ol>
<p>We describe how we have achieved this with three exemplars, two of
  which are relatively general in use, and one more specific to biology. In
  each case, we have taken a slightly different approach, but have fulfilled
  our primary aim of making life better for machine, author and reader. </p>
<h1 id="sec:repr-math">3 Representing Mathematics</h1>
<p>The representation of mathematics is a common need in academic literature.
Mathematical notation has grown from a requirement for a syntax which is
highly expressive and relatively easy to write. It presents specific
challenges because of its complexity, the difficulty of authoring and the
difficulty of rendering, away from the chalk board that is its natural
home. </p>
<p>Support for mathematics has had a significant impact on academic
publishing. It was, for example, the original motivation behind the
development of T<sub style="text-transform:uppercase;
margin-left:-0.2em">e</sub>X <span class="kcite" kcite-id="ITEM-2054-12">(<a href="http://en.wikipedia.org/wiki/TeX">http://en.wikipedia.org/wiki/TeX</a>)</span>, and it is still one
of the main reasons why authors wish to use it or its derivatives. This is to
such an extent that much mathematics rendering on the web is driven by a
T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X engine
somewhere in the process. So MediaWiki (and therefore Wikipedia), Drupal and,
of course, WordPress follow this route. The latter provides plugin support for
T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X markup
using the <tt class="ttfamily">wp-latex</tt> plugin <span class="kcite" kcite-id="ITEM-2054-13">(<a href="http://wordpress.org/extend/plugins/wp-latex/">http://wordpress.org/extend/plugins/wp-latex/</a>)</span>. Within
kblog, we have developed a new plugin
called <tt class="ttfamily">mathjax-latex</tt> <span class="kcite" kcite-id="ITEM-2054-14">(<a href="http://wordpress.org/extend/plugins/mathjax-latex/">http://wordpress.org/extend/plugins/mathjax-latex/</a>)</span>. From
the kblog author’s perspective these two offer a similar interface –
differences are, therefore, described later. </p>
<p>Authors write their
mathematics directly as T<sub style="text-transform:uppercase;
margin-left:-0.2em">e</sub>X using one of the four markup syntaxes. The most
explicit (and therefore least likely to happen accidentally) is through the
use of “shortcodes” <span class="kcite" kcite-id="ITEM-2054-15">(<a href="http://codex.wordpress.org/Shortcode">http://codex.wordpress.org/Shortcode</a>)</span>. </p>
<p>These are an
HTML-like markup originating from some forum/bulletin board systems. In this
form an equation would be entered as <tt>&#91;latex&#93;e=mc^2&#91;/latex&#93;</tt>, which
would be rendered as “\(e=mc^2\)”. It is also possible to use three
other syntaxes which are closer to math-mode in
T<sub style="text-transform:uppercase;
margin-left:-0.2em">e</sub>X: <tt>$&zwj;$e=mc^2$&zwj;$</tt>, <tt>&#36;latex e=mc^2&#36;</tt>,
or <tt>\&zwj;[e=mc^2\&zwj;]</tt>. </p>
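<p>As an illustration only, the four syntaxes can be recognised with a handful of regular expressions. The Python sketch below is not the code of either plugin; the pattern names and the <tt>find_tex</tt> function are hypothetical, and it ignores edge cases such as escaped dollar signs. </p>
<pre>
```python
import re

# Hypothetical patterns for the four author-facing syntaxes described above;
# the matching rules used by the real plugins may differ.
MATH_PATTERNS = [
    ("shortcode", re.compile(r"\[latex\](.+?)\[/latex\]", re.S)),
    ("dollar-latex", re.compile(r"\$latex\s+(.+?)\$", re.S)),
    ("double-dollar", re.compile(r"\$\$(.+?)\$\$", re.S)),
    ("bracket", re.compile(r"\\\[(.+?)\\\]", re.S)),
]


def find_tex(text):
    """Return (syntax_name, tex_source) pairs found in text, in document order."""
    found = []
    for name, pattern in MATH_PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), name, match.group(1).strip()))
    found.sort()
    return [(name, tex) for _, name, tex in found]
```
</pre>
<p>Whichever syntax the author chose, the extracted source is the same string of T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X, which is what allows a single renderer to serve all four. </p>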
<p>From the authorial perspective, we have added
significant value, as it is possible to use a variety of syntaxes, which are
independent of the authoring engine. For example, a
T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X-loving
mathematician working with a Word-using biologist can still set their
equations using T<sub style="text-transform:uppercase;
margin-left:-0.2em">e</sub>X syntax. Although Word will not render these at
authoring time, in practice this causes few problems for such authors,
who are experienced at reading T<sub style="text-transform:uppercase;
margin-left:-0.2em">e</sub>X. Within a L<sup style="font-variant:small-caps;
margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase;
margin-left:-0.2em">e</sub>X workflow equations will be renderable both
locally with source compiled to PDF, and published to WordPress. </p>
<p>There
is also a W3C recommendation, MathML, for the representation and presentation
of mathematics. The kblog environment also supports this. In this case, the
equivalent source appears as follows: </p>
<pre>
&lt;math&gt;
  &lt;mrow&gt;
    &lt;mi&gt;E&lt;/mi&gt;
    &lt;mo&gt;=&lt;/mo&gt;
    &lt;mrow&gt;
      &lt;mi&gt;m&lt;/mi&gt;
      &lt;msup&gt;
        &lt;mi&gt;c&lt;/mi&gt;
        &lt;mn&gt;2&lt;/mn&gt;
      &lt;/msup&gt;
    &lt;/mrow&gt;
  &lt;/mrow&gt;
&lt;/math&gt;
</pre>
<p>One problem with the MathML representation is obvious: it is very
  long-winded. A second issue, however, is that it is hard to integrate with
  existing workflows; most of the publication workflows we have seen in use
  will, on recognising an angle bracket, turn it into the equivalent HTML
  entity. For some workflows (L<sup style="font-variant:small-caps;
                                           margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase;
                                                                                   margin-left:-0.2em">e</sub>X,
  asciidoc) it is <em>possible</em>, although not easy, to prevent this within
  the native syntax. </p>
<p>It is also possible to convert from Word’s native
  OMML (“equation editor”) XML representation to MathML, although this does
  not integrate with Word’s native blog publication workflow. Ironically, it
  is because MathML shares an XML based syntax with the final presentation
  format (HTML) that the problem arises. The shortcode syntax, for example,
  passes straight-through most of the publication frameworks to be consumed by
  the middleware. From a pragmatic point of view, therefore, supporting
  shortcodes and T<sub style="text-transform:uppercase;
                              margin-left:-0.2em">e</sub>X-like syntaxes has
  considerable advantages. </p>
<p>For the reader, the use of <tt class="ttfamily">mathjax-latex</tt> has
significant advantages. The default mechanism within WordPress uses a
math-mode like syntax <tt>$&zwj;latex e=mc^2&zwj;$</tt>. This is rendered using a
T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X engine
into an image which is then incorporated and linked using normal HTML
capabilities. This representation is opaque and non-semantic; it has
significant limitations for the reader. The images are not scalable – zooming
in causes severe pixelation; the background to the mathematics is coloured
inside the image, so does not necessarily reflect the local
style. </p>
<p>Kblog, however, uses the MathJax
library <span class="kcite" kcite-id="ITEM-2054-16">(<a href="http://www.mathjax.org">http://www.mathjax.org</a>)</span>;
this has a number of significant advantages for the reader. First, where the
browser supports them, MathJax uses webfonts to render the equations; these are
scalable, attractive and standardized. Where they are not available, MathJax
can fall-back to bitmapped fonts. The reader can also access additional
functionality: clicking on an equation will raise a zoomed in popup; while the
context menu allows access to a textual representation either as
T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X or MathML
irrespective of the form that the author used. This can be cut and pasted for
further use. Kblog uses the MathJax
library <span class="kcite" kcite-id="ITEM-2054-16">(<a href="http://www.mathjax.org">http://www.mathjax.org</a>)</span>
to render the underlying T<sub style="text-transform:uppercase;
margin-left:-0.2em">e</sub>X directly on the client. </p>
<p>Our use of MathJax
provides no significant disadvantages to the middleware layers. It is
implemented in JavaScript and runs in most environments. Although the library
is fairly large (&gt;100MB), it is available on a CDN so need not stress
server storage space. Most of this space comes from the bit-mapped fonts which
are only downloaded on-demand, so should not stress web clients either. It
also obviates the need for a T<sub style="text-transform:uppercase;
margin-left:-0.2em">e</sub>X installation
which <tt class="ttfamily">wp-latex</tt> may require (although this plugin can
use an external server also). </p>
<p>At face
value, <tt class="ttfamily">mathjax-latex</tt> necessarily adds very little
semantics to the maths embedded within documents. The maths could be
represented as <tt>$&zwj;$E=mc^2$&zwj;$</tt>, <tt>\&zwj;(E=mc^2\&zwj;)</tt> or </p>
<pre>
&lt;math&gt;
  &lt;mrow&gt;
    &lt;mi&gt;E&lt;/mi&gt;
    &lt;mo&gt;=&lt;/mo&gt;
    &lt;mrow&gt;
      &lt;mi&gt;m&lt;/mi&gt;
      &lt;msup&gt;
        &lt;mi&gt;c&lt;/mi&gt;
        &lt;mn&gt;2&lt;/mn&gt;
      &lt;/msup&gt;
    &lt;/mrow&gt;
  &lt;/mrow&gt;
&lt;/math&gt;
</pre>
<p>So, we have a heterogeneous representation for identical knowledge. However, in practice, the situation is much better than this. The author of the work created these equations and has then read them, transformed by MathJax into a rendered form. If MathJax has failed to translate them correctly, in line with the author’s intention, or if it has had side effects on the text in addition to setting the intended equations (if the T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X-style markup appears accidentally elsewhere in the document), the author is likely to have seen this and fixed the problem. Someone wishing, for example, to extract all the mathematics as MathML from these documents computationally, therefore, knows: </p>
<ul class="itemize">
<li>
<p>that the document contains maths as it imports
    MathJax </p>
</li>
<li>
<p>that MathJax is capable of identifying this maths
      correctly </p>
</li>
<li>
<p>that equations can be transformed to MathML
      using MathJax (This
    is assuming MathJax works correctly in general. The authors and readers
    are checking the rendered representation. It is possible that an equation
    would render correctly on screen, but be rendered to MathML inaccurately). </p>
</li>
</ul>
<p>So, while our publication environment does not result directly in a lower level of semantic heterogeneity, it does provide the data and the tools to enable the computational agent to make this transformation. While this is imperfect, it should help a bit. In short, we provide a practical mechanism to identify text containing mathematics and a mechanism to transform this to a single, standardised representation. </p>
<h1 id="sec:repr-refer">4 Representing References</h1>
<p>Unlike mathematics, there is no standard mechanism for reference and
 in-text citation, but there are a large number of tools for authors such as
 BibTeX, Mendeley <span class="kcite" kcite-id="ITEM-2054-17">(<a href="http://www.mendeley.org">http://www.mendeley.org</a>)</span> or
 EndNote. As a result of this, the integration with existing toolsets is of
 primary importance, while the representation of the in-text citations is not,
 as it should be handled by the tool layer anyway. </p>
<p>Within kblog, we
 have developed a plugin called kcite <span class="kcite" kcite-id="ITEM-2054-18">(<a href="http://wordpress.org/extend/plugins/kcite/">http://wordpress.org/extend/plugins/kcite/</a>)</span>. For the
 author, citations are inserted using the
 syntax: <tt>[&zwj;cite]10.1371/journal.pone.0012258[&zwj;/cite]</tt>. The
 identifier used here is a DOI, or digital object identifier, which is widely
 used within the publishing and library industries. Currently, kcite supports
 DOIs minted by either CrossRef <span class="kcite" kcite-id="ITEM-2054-19">(<a href="http://www.crossref.org">http://www.crossref.org</a>)</span> or DataCite <span class="kcite" kcite-id="ITEM-2054-20">(<a href="http://www.datacite.org">http://www.datacite.org</a>)</span> (in practice, this means that we
 support the majority of DOIs). We also support identifiers from PubMed <span class="kcite" kcite-id="ITEM-2054-21">(<a href="http://www.pubmed.org">http://www.pubmed.org</a>)</span> which covers most biomedical
 publications and arXiv <span class="kcite" kcite-id="ITEM-2054-22">(<a href="http://www.arxiv.org">http://www.arxiv.org</a>)</span>, the
 physics (and other domains!) preprints archive, and we now have a system to
 support arbitrary URLs. Currently, authors are required to specify the type of
 identifier where it is not a DOI. </p>
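<p>The citation syntax is simple enough to parse in a few lines. The following Python sketch is not kcite's implementation (kcite is a WordPress plugin); it assumes the <tt>source</tt> attribute that appears in the Emacs example later in this paper, treats a missing attribute as a DOI, and uses whatever attribute value the author supplies without checking it. </p>
<pre>
```python
import re

# Matches [cite]ID[/cite] and [cite source='NAME']ID[/cite].
# Group 1 is the optional source name, group 2 the identifier.
CITE_RE = re.compile(r"\[cite(?:\s+source=['\"]([^'\"]+)['\"])?\]([^\[\]]+)\[/cite\]")


def parse_citations(text):
    """Return a list of (source, identifier) pairs, one per cite shortcode."""
    citations = []
    for match in CITE_RE.finditer(text):
        # Assumption: an unlabelled identifier is a DOI.
        source = match.group(1) or "doi"
        citations.append((source, match.group(2).strip()))
    return citations
```
</pre>
<p>Because only the identifier and its type leave the author's document, this pair is all that a reference manager needs to emit. </p>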
<p>We have picked this “shortcode”
 format for similar reasons as described for maths; it is relatively
 unambiguous, it is not XML based, so passes through the HTML generation layer
 of most authoring tools unchanged and is explicitly supported in WordPress,
 bypassing the need for regular expressions and later parsing. On its own, it
 would be a little unwieldy from the perspective of the author. In
 practice, however, it is relatively easy to integrate this with many
 reference managers. For example, tools such as Zotero <span class="kcite" kcite-id="ITEM-2054-23">(<a href="http://www.zotero.org">http://www.zotero.org</a>)</span> and Mendeley use the Citation Style
 Language, and so can output kcite-compliant citations with the following
 slightly elided code: </p>
<pre>
&lt;citation&gt;
  &lt;layout prefix="[&zwj;cite]" suffix="[&zwj;/cite]"
          delimiter="[&zwj;/cite] [&zwj;cite]"&gt;
    &lt;text variable="DOI"/&gt;
  &lt;/layout&gt;
&lt;/citation&gt;
</pre>
<p>We do not yet support L<sup style="font-variant:small-caps;
                                            margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase;
                                                                                    margin-left:-0.2em">e</sub>X/BibTeX
  citations, although we see no reason why a similar style file should not be
  supported (citations in this representation of the article were, rather painfully,
  converted by hand). We do, however, support BibTeX-formatted files: the first
  author’s preferred editing/citation environment is based around these with
  Emacs, RefTeX, and asciidoc. While this is undoubtedly a rather niche
  authoring environment, the (slightly elided) code for supporting this
  demonstrates the relative ease with which tool chains can be induced to
  support kcite: </p>
<pre>
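;; Wrap RefTeX's citation formatter: when the override flag is set, emit a
;; kcite shortcode instead of the default citation.
(defadvice reftex-format-citation (around phil-asciidoc-around activate)
  (if phil-reftex-citation-override
      (setq ad-return-value (phil-reftex-format-citation entry format))
    ad-do-it))

;; Build the shortcode from the entry's DOI, wrapped in asciidoc's inline
;; passthrough macro so that asciidoc leaves the brackets alone.
(defun phil-reftex-format-citation (entry format)
  (let ((doi (reftex-get-bib-field "doi" entry)))
    (format "pass:[&zwj;[&zwj;cite source='doi'\\]%s[&zwj;/cite\\]]" doi)))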
</pre>
<p>The key decision with kcite from the authorial perspective is to ignore the
  reference list itself and focus only on in-text citations, using public
  identifiers to references. This simplifies the tool integration process
  enormously, as this is the only data that needs to pass from the author’s
  bibliographic database onward. The advantage for authors here is
  two-fold. First, they are not required to populate their reference metadata for
  themselves, and this metadata will update if it changes. Secondly, the
  identifiers are checked; if they are wrong, the authors will see this
  straightforwardly as the entire reference will be wrong. Adding DOIs or
  other identifiers moves from being a burden for the author to being a
  specific advantage. </p>
<p>While supporting multiple forms of reference
  identifier (CrossRef DOI, DataCite DOI, arXiv and PubMed ID) provides a
  clear advantage to the author, it comes at considerable cost. While it is
  possible to get metadata about papers from all of these sources, there is
  little commonality between them. Moreover, resolving this metadata requires
  one outgoing HTTP request per reference (in practice, it is often more; DOI
  requests, for instance, use <tt>303</tt> redirects), which browser security
  might or might not allow. </p>
<p>So, while the presentation of mathematics
  is performed largely on the client, for reference lists the kcite plugin
  performs metadata resolution and data integration on the server. A caching
  functionality is provided, storing this metadata in the WordPress database.
  The bibliographic metadata is finally transferred to the client encoded as
  JSON, using asynchronous call-backs to the server. </p>
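<p>The resolve-once-then-cache behaviour described here can be sketched as follows. This is a Python illustration under stated assumptions, not kcite's code: <tt>resolve_fn</tt> is a hypothetical stand-in for the per-source metadata lookups, and an in-memory dictionary takes the place of the WordPress database. </p>
<pre>
```python
import json


class CachingResolver:
    """Resolve identifiers through a supplied function, caching each result.

    resolve_fn(source, identifier) is a hypothetical callable that returns a
    dictionary of bibliographic metadata for one reference.
    """

    def __init__(self, resolve_fn):
        self._resolve_fn = resolve_fn
        self._cache = {}

    def metadata(self, source, identifier):
        key = (source, identifier)
        if key not in self._cache:
            # Only the first request for an identifier reaches the remote source.
            self._cache[key] = self._resolve_fn(source, identifier)
        return self._cache[key]

    def bibliography_json(self, citations):
        """Return the metadata for all (source, identifier) pairs as one JSON string."""
        return json.dumps([self.metadata(s, i) for s, i in citations])
```
</pre>
<p>Keeping the lookups behind one function is what lets the differences between the metadata sources stay on the server, while the client only ever sees a single JSON structure. </p>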
<p>Finally, this JSON
  is rendered using the citeproc-js library on the client. In our experience,
  this performs well, adding to the readers’ experience; in-text citations are
  initially shown as hyperlinks; rendering is rapid, even on aging hardware,
  and finally in-text citations are linked both to the bibliography and
  directly through to the external source. Currently, the format of the
  reference list is fixed; however, citeproc-js is a generalised reference
  processor, driven using CSL <span class="kcite" kcite-id="ITEM-2054-24">(<a href="http://citationstyles.org/">http://citationstyles.org/</a>)</span>. This makes it
  straight-forward to change citation format, at the option of the reader,
  rather than the author or publisher. Both the in-text citation and
  bibliography support outgoing links direct to the underlying
  resources (where the identifier allows &#8212; PubMed IDs redirect to PubMed). As these links have
  been used to gather metadata, they are likely to be correct. While these
  advantages are relatively small currently, we believe that the use of
  JavaScript rendering over linked references can be used to add further
  reader value in future. </p>
<p>For the computational agent wishing to
  consume bibliographic information, we have added significant value compared
  to the pre-formatted HTML reference list. First, all the information
  required to render the citation is present in the in-text citation next to
  the text that the authors intended. A computational agent can, therefore,
  ignore the bibliography list itself entirely. These primary identifiers are,
  again, likely to be correct because the authors now need them to be correct
  for their own benefit. </p>
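<p>As a sketch of how such an agent might proceed, the Python below collects the link targets of the in-text citations. It assumes the markup visible in this rendering of the article, in which each citation is a <tt>span</tt> of class <tt>kcite</tt> wrapping a link; the class and function names are hypothetical. </p>
<pre>
```python
from html.parser import HTMLParser


class KciteLinkCollector(HTMLParser):
    """Collect the href of every link nested inside a span of class kcite."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # span nesting depth, counted from the kcite span
        self.identifiers = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            classes = (attrs.get("class") or "").split()
            if self._depth or "kcite" in classes:
                self._depth += 1
        elif tag == "a" and self._depth and attrs.get("href"):
            self.identifiers.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1


def extract_citations(html):
    """Return the citation link targets found in html, in document order."""
    collector = KciteLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.identifiers
```
</pre>
<p>Links outside a citation span are ignored, so the result contains only the identifiers that the authors chose to cite. </p>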
<p>Should the computational agent wish, the
  (denormalised) bibliographic data used to render the bibliography is
  actually available, present in the underlying HTML as a JSON string. This is
  represented in a homogeneous format, although, of course, it represents our
  (kcite’s) interpretation of the primary data. </p>
<p>A final, and subtle,
  advantage of kcite is that the authors can only use public metadata, and not
  their own. If they use the correct primary identifier, and still get an
  incorrect reference, it follows that the public metadata must be
  incorrect (or, we acknowledge, that kcite is broken!). Authors and readers
  therefore must ask the metadata providers to fix their metadata to the
  benefit of all. This form of data linking, therefore, can even help those
  who are not using it. </p>
<h2 id="sec:microarray-data">4.1 Microarray
  Data</h2>
<p>Many publications require that papers discussing microarray experiments
  lodge their data in a publicly available resource such as ArrayExpress
  <span class="kcite" kcite-id="ITEM-2054-25">(<a href="http://dx.doi.org/10.1093/nar/gkg091">http://dx.doi.org/10.1093/nar/gkg091</a>)</span>. Having done this, authors place an ArrayExpress
  identifier, which has the form <tt>E-MEXP-1551</tt>, in the paper. Currently, adding this
  identifier to a publication, as with adding the raw data to the repository,
  is of no direct advantage to the author, other than fulfilment of the
  publication requirement. Similarly, there is no existing support within most
  authoring environments for adding this form of reference. </p>
<p>For the
  knowledgeblog-arrayexpress plugin <span class="kcite" kcite-id="ITEM-2054-26">(<a href="http://knowledgeblog.org/knowledgeblog-arrayexpress">http://knowledgeblog.org/knowledgeblog-arrayexpress</a>)</span>,
  therefore, we have again used a shortcode representation, but allowed the
  author to automatically fill metadata, direct from ArrayExpress. So a tag
  such as:<tt>[&zwj;aexp id="E-MEXP-1551"]species[&zwj;/aexp]</tt>
  will be replaced with <i class="itshape">Saccharomyces cerevisiae</i>,
  while:<tt>[&zwj;aexp
  id="E-MEXP-1551"]releasedate[&zwj;/aexp]</tt> will be replaced by
  “2010-02-24”. While the advantage here is small, it is significant.
  Hyperlinks to ArrayExpress are automatic, authors no longer need to look up
  detailed metadata. For metadata which authors are likely to know anyway
  (such as Species), the automatic lookup operates as a check that their
  ArrayExpress ID is correct. As with references (see Section <a href="#sec:repr-refer">4</a>), the
  use of an identifier becomes an advantage rather than a burden to the
  authors. </p>
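<p>The substitution can be sketched in Python as below. This is not the plugin's code: <tt>lookup</tt> is a hypothetical callable standing in for the query to ArrayExpress, and the pattern only covers the form of tag shown above. </p>
<pre>
```python
import re

# Group 1 is the ArrayExpress identifier, group 2 the requested field.
AEXP_RE = re.compile(r'\[aexp\s+id="([^"]+)"\]([^\[\]]+)\[/aexp\]')


def expand_aexp(text, lookup):
    """Replace each aexp shortcode with the value of lookup(identifier, field)."""

    def replace(match):
        field = match.group(2).strip().lower()
        return str(lookup(match.group(1), field))

    return AEXP_RE.sub(replace, text)
```
</pre>
<p>With a lookup that returns the values quoted above, a sentence containing the <tt>species</tt> tag for <tt>E-MEXP-1551</tt> would come back with <i class="itshape">Saccharomyces cerevisiae</i> in its place. </p>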
<p>Currently, there is less significant advantage for the
  reader, although there is some value in the
  added correctness stemming from the ArrayExpress identifier. However,
  knowledgeblog-arrayexpress is currently under-developed, and the added
  semantics that is now present could be used more extensively. The
  unambiguous knowledge that:<tt>[&zwj;aexp
  id="E-MEXP-1551"]species[&zwj;/aexp]</tt> represents a species would
  allow us, for example, to link to the NCBI taxonomy database <span class="kcite" kcite-id="ITEM-2054-27">(<a href="http://www.ncbi.nlm.nih.gov/Taxonomy/">http://www.ncbi.nlm.nih.gov/Taxonomy/</a>)</span>.</p>
<p>Likewise,
  the advantage for the computational agent from knowledgeblog-arrayexpress is
  currently limited; the identifiers are clearly marked up, and as the authors
  now care about them, they are likely to be correct. Again, however,
  knowledgeblog-arrayexpress is currently under-developed for the
  computational agent. The knowledge that is extracted from ArrayExpress could
  be presented within the HTML generated by knowledgeblog-arrayexpress,
  whether or not it is displayed to the reader, for essentially no cost. By
  having an underlying shortcode representation, if we choose to add this
  functionality to knowledgeblog-arrayexpress, any posts written using it
  would automatically update their HTML. For the text-mining bioinformatician,
  even the ability to unambiguously determine that a paper described or used a
  data set relating to a specific species using standardised nomenclature (the
  standard nomenclature was only invented in 1753 and is still not used
  universally) would be a considerable boon. </p>
<h1 id="disc">5 Discussion</h1>
<p>Our approach to semantic enrichment of articles is a measured and
  evolutionary approach. We are investigating how we can increase the amount
  of knowledge in academic articles presented in a computationally accessible
  form. However, we are doing so in an environment which does not require all
  the different aspects of authoring and publishing to be overturned.
  Moreover, we have followed a strong principle of semantic enhancement which
  offers advantages to both reader and author immediately. So, adding
  references as a DOI, or other identifier, ‘automagically’ produces an
  in-text citation and a nicely formatted reference list: that the reference list
  is no longer present in the article, but is a visualisation over linked
  data; that the article itself has become a first class citizen of this
  linked data environment is a happy by-product. </p>
<p>This approach,
  however, also has disadvantages. There are a number of semantic enhancements
  which we could make straight-forwardly to the knowledgeblog environment that
  we have not; the principles that we have adopted require significant
  compromise. We offer here two examples. </p>
<p>First, there has been
  significant work by others on CiTO <span class="kcite" kcite-id="ITEM-2054-28">(<a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S6">http://dx.doi.org/10.1186/2041-1480-1-S1-S6</a>)</span> –
  an ontology which helps to describe the relationship between the citations
  and a paper. Kcite lays the ground-work for an easy and straight-forward
  addition of CiTO tags surrounding each in-text citation. Doing so would
  enable increased machine understandability of a reference list. Potentially,
  we could use this to the advantage of the reader also: we could distinguish
  between reviews and primary research papers; highlight the authors’ previous
  work; emphasise older papers which are being refuted. However, to do this
  requires additional semantics from the author. Although these CiTO semantic
  enhancements would be easy to insert directly using the shortcode syntax,
  most authors will want to use their existing reference manager which will
  not support this form of semantics; even if it does, the author themselves
  gain little advantage from adding these semantics. There are advantages for
  the reader, but in this case not for both author and reader. As a result, we
  will probably add such support to kcite; but, if we are honest, we find it
  unlikely that, when acting as content authors, we will find the time to add
  these additional semantics. </p>
<p>Second, our presentation of mathematics
  could be modified to automatically generate MathML from any included
  T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X markup.
  The transformation could be performed on the server, using MathJax; MathML
  would still be rendered on the client to webfonts. This would mean that any
  embedded maths would be discoverable because of the existence of MathML,
  which is a considerable advantage. However, neither the reader nor the
  author gains any advantage from doing this, while paying the cost of the
  slower load times and higher server load that would result from running
  JavaScript on the server. Moreover, they would pay this cost regardless of
  whether their content were actually being consumed computationally. As the
  situation now stands, the computational user needs to identify the inclusion of
  MathJax in the web page, and then transform the page using this library,
  none of which is standard. This is clearly a serious compromise, but we feel
  a necessary one. </p>
<p>Our support for microarrays offers the possibility
  of the most specific and increased level of semantics of all of our plugins.
  Knowledge about a species or a microarray experimental design can be
  precisely represented. However, almost by definition, this form of knowledge
  is fairly niche and only likely to be of relevance to a small community.
  We do note, though, that the knowledgeblog process based around commodity
  technology does offer a publishing process that can be adapted, extended and
  specialised in this way relatively easily. Ultimately the many small
  communities that make up the long-tail of scientific publishing add up to
  one large one. </p>
<h1 id="a0000000021">6 Conclusion</h1>
<p>Semantic publishing is a desirable goal, but goals need to be realistic and
  achievable. To move towards semantic publishing in kblog, we have tried to
  put in place an approach that gives benefit to readers, authors and
  computational interpretation. As a result, at this stage, we have light
  semantic publishing, but with small but definite benefits for
  all. </p>
<p>Semantics give meaning to entities. In kblog, we have sought
  benefit by “saying” within the kblog environment that entity <em>x</em> is either <b class="bfseries">maths</b>, a <b class="bfseries">citation</b> or a <b class="bfseries">microarray</b> data entity reference. This is sufficient for the kblog infra-structure to &#8220;know what to do&#8221; with the entity in question. Knowing that some publishable entity is a &#8220;lump&#8221; of maths tells the infra-structure how to handle that entity: the reader has benefit from it looking like maths; the author has benefit by not having to do very much; and the infra-structure knows what to do. In addition, this approach leaves in hooks for doing more later. </p>
<p>It is not necessarily easy to find compelling examples that give advantages for all steps. Adding in CiTO attributes to citations, for instance, has obvious advantages for the reader, but not the author. However, advantages may be indirect; richer reader semantics may give more readers and thus more citations, the thing authors appreciate as much as the act of publishing itself. It is, however, difficult to imagine how such advantages can be conveyed to the author at the point of writing. It is easy to see the advantages of semantic publishing for readers; as a community we need to pay attention to advantages to the authors. Without these &#8220;carrots&#8221;, we will only have &#8220;sticks&#8221; and authors, particularly technically skilled ones, are highly adept at working around sticks. </p>
<div>
</div>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 2054 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/04/three-steps-to-heaven/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Discovering the Registration Agency</title>
		<link>http://www.russet.org.uk/blog/2012/04/discovering-the-registration-agency/</link>
		<comments>http://www.russet.org.uk/blog/2012/04/discovering-the-registration-agency/#comments</comments>
		<pubDate>Thu, 05 Apr 2012 01:47:02 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=2044</guid>
		<description><![CDATA[In my previous articles, I have talked about general problems with DOIs , about architectural issues with capturing metadata , and finally, about specific problem DOIs . I have also described part of the difficulty is that it is hard to determine the registration agency associated with a specific DOI&#8201;&#8212;&#8201;there are actually different kinds of [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="2044">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Discovering+the+Registration+Agency&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-04-05&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F04%2Fdiscovering-the-registration-agency%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>In my previous articles, I have talked about general problems with DOIs <span class="kcite" kcite-id="ITEM-2044-0">(<a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/</a>)</span>, about architectural issues with capturing metadata <span class="kcite" kcite-id="ITEM-2044-1">(<a href="http://www.russet.org.uk/blog/2012/03/dois-and-content-negotiation/">http://www.russet.org.uk/blog/2012/03/dois-and-content-negotiation/</a>)</span>, and finally, about specific problem DOIs <span class="kcite" kcite-id="ITEM-2044-2">(<a href="http://www.russet.org.uk/blog/2012/03/a-problem-doi">http://www.russet.org.uk/blog/2012/03/a-problem-doi</a>)</span>.</p>
<p>I have also described how part of the difficulty is that it is hard to determine the registration agency associated with a specific DOI; there are actually different kinds of DOI and they respond in different ways.</p>
<p>I have, however, finally found a way to discover who is responsible for a given DOI. One of my own papers <span class="kcite" kcite-id="ITEM-2044-3">(<a href="http://www.jbiomedsem.com/content/1/S1/S7">http://www.jbiomedsem.com/content/1/S1/S7</a>)</span> declares its DOI to be <tt>10.1186/2041-1480-1-S1-S7</tt>. Unfortunately, referring to this paper using Kcite <span class="kcite" kcite-id="ITEM-2044-4">(<a href="http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/">http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/</a>)</span> shows, at the time of writing, an error message <span class="kcite" kcite-id="ITEM-2044-5">(<a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S7">http://dx.doi.org/10.1186/2041-1480-1-S1-S7</a>)</span>, and the DOI itself does not resolve. As this may later be fixed, I record here what the error message looks like:</p>
<pre style="padding:0.5em; color:gray;">The DOI you requested --

10.1186/2041-1480-1-S1-S7

-- cannot be found in the Handle System.

Possible reasons for the error are:

    the DOI has not been created
    the DOI is cited incorrectly in your source
    the DOI does not resolve due to a system problem</pre>
<p>On filling in the &#8220;this DOI does not work&#8221; form, the error page redirects to another URL at <a href="http://notfound.doi.org/DoiError/servlet">http://notfound.doi.org/DoiError/servlet</a>, which says:</p>
<pre style="padding:0.5em; color:gray;">The DOI and comments (if provided) have been logged by CrossRef and forwarded
to the publisher to correct the problem. Possible reasons for the error are:

    the DOI has been created but has not been registered by the publisher
    (this could be an error or it could be a timing issue and the DOI will be
    registered in the next few days)
    the DOI is cited incorrectly in the source
    the DOI does not resolve due to a system problem

Maintaining the integrity of DOIs is very important to CrossRef and we
appreciate your help.</pre>
<p>This clearly suggests to me that this DOI has (or should have) been registered with CrossRef. Unfortunately, this only works with DOIs that do not resolve in the first place. Directly accessing the link returns &#8220;nothing to see here&#8221;.</p>
<p>A little bit of poking around, and I have discovered a few other problematic DOIs, all from the same &#8220;special issue&#8221;.</p>
<ul> 
<li> <span class="kcite" kcite-id="ITEM-2044-6">(<a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S1">http://dx.doi.org/10.1186/2041-1480-1-S1-S1</a>)</span> </li>
<li> <span class="kcite" kcite-id="ITEM-2044-7">(<a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S2">http://dx.doi.org/10.1186/2041-1480-1-S1-S2</a>)</span> </li>
<li> <span class="kcite" kcite-id="ITEM-2044-8">(<a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S4">http://dx.doi.org/10.1186/2041-1480-1-S1-S4</a>)</span> </li>
<li> <span class="kcite" kcite-id="ITEM-2044-9">(<a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S5">http://dx.doi.org/10.1186/2041-1480-1-S1-S5</a>)</span> </li>
</ul>
<p>I might take this personally, as this includes another paper of mine, although, strangely, two of the DOIs from the same issue do work, and these also include one of mine.</p>
<ul> 
<li> <span class="kcite" kcite-id="ITEM-2044-10">(<a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S3">http://dx.doi.org/10.1186/2041-1480-1-S1-S3</a>)</span> </li>
<li> <span class="kcite" kcite-id="ITEM-2044-11">(<a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S6">http://dx.doi.org/10.1186/2041-1480-1-S1-S6</a>)</span> </li>
</ul>
<p>All of this demonstrates an advantage of our Kcite tool <span class="kcite" kcite-id="ITEM-2044-12">(<a href="http://knowledgeblog.org/kcite-plugin">http://knowledgeblog.org/kcite-plugin</a>)</span>. By actually using primary identifiers as part of the authoring process, I have discovered five DOIs, several of which have been on my website for a long time, that are broken. Possibly, the Journal of Biomedical Semantics should take a leaf out of my book. From their web page:</p>
<pre style="padding:0.5em; color:gray;">Journal of Biomedical Semantics 2010, 1(Suppl 1):S7 doi:10.1186/2041-1480-1-S1-S7

The electronic version of this article is the complete one and can be found
online at: http://www.jbiomedsem.com/content/1/S1/S7</pre>
<p>At the time of writing, the DOI is not displayed as <a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S7">http://dx.doi.org/10.1186/2041-1480-1-S1-S7</a>, although this is a hard requirement from CrossRef&#8217;s display guidelines <span class="kcite" kcite-id="ITEM-2044-13">(<a href="http://www.crossref.org/02publishers/doi_display_guidelines.html">http://www.crossref.org/02publishers/doi_display_guidelines.html</a>)</span>. Ironically, CrossRef says &#8220;CrossRef DOIs should always be displayed as permanent URLs in the online environment.&#8221; This seems to miss the requirement that, in the online environment, DOIs should be hyperlinked, so the Journal of Biomedical Semantics cannot be faulted there. It is a shame, though. If they were hyperlinked, a web crawler would by now have discovered the 404.</p>
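<p>The check that such a crawler would perform is small. The following is a minimal Python sketch, not the mechanism Kcite uses: it builds the resolver URL for each article in the special issue and asks whether it resolves. The function names are my own, and it assumes that the resolver and the publisher both answer <tt>HEAD</tt> requests, which not every server does.</p>

```python
import urllib.error
import urllib.request

RESOLVER = "http://dx.doi.org/"

# DOIs of the seven special issue articles discussed in this post.
DOIS = ["10.1186/2041-1480-1-S1-S%d" % n for n in range(1, 8)]


def doi_url(doi):
    """Return the resolver URL for a DOI."""
    return RESOLVER + doi


def resolves(doi, timeout=10):
    """Return True if following the resolver URL ends without an error.

    urlopen follows redirects and raises HTTPError for 4xx and 5xx
    responses; HTTPError and URLError are both subclasses of OSError.
    """
    request = urllib.request.Request(doi_url(doi), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except OSError:
        return False


def report(dois=DOIS):
    """Print one line per DOI, flagging those that fail to resolve."""
    for doi in dois:
        print(doi_url(doi), "ok" if resolves(doi) else "BROKEN")
```

<p>Calling <tt>report()</tt> needs network access, and the result will depend on the state of the DOIs on the day it is run.</p>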
<hr /> 
<h2><a name="_update_14_05_2012"></a>Update (14/05/2012)</h2>
<p>Since I wrote this post, three out of four of the erroneous DOIs that I reported have been fixed. One of them (<a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S2">http://dx.doi.org/10.1186/2041-1480-1-S1-S2</a>) is still broken. As I am using kcite to refer to these DOIs, the references have now automatically updated.</p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 2044 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/04/discovering-the-registration-agency/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Semantics-Free Ontologies</title>
		<link>http://www.russet.org.uk/blog/2012/04/semantics-free-ontologies-2/</link>
		<comments>http://www.russet.org.uk/blog/2012/04/semantics-free-ontologies-2/#comments</comments>
		<pubDate>Wed, 04 Apr 2012 22:03:15 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Ontology]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=2040</guid>
		<description><![CDATA[In this article, I consider the problems of semantics-free identifiers in OWL and suggest another (possible) solution to the problem. The problems of identifiers and their semantics are not new. I have written about these problems previously in the context of: blog permalinks ; and with conversion between OBO format and Manchester syntax . The [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="2040">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Semantics-Free+Ontologies&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-04-05&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F04%2Fsemantics-free-ontologies-2%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>In this article, I consider the problems of semantics-free identifiers in OWL and suggest another (possible) solution to the problem.</p>
<p>The problems of identifiers and their semantics are not new. I have written about these problems previously in the context of: blog permalinks <span class="kcite" kcite-id="ITEM-2040-0">(<a href="http://www.russet.org.uk/blog/2011/05/permalink-semantics/">http://www.russet.org.uk/blog/2011/05/permalink-semantics/</a>)</span>; and with conversion between OBO format and Manchester syntax <span class="kcite" kcite-id="ITEM-2040-1">(<a href="http://www.russet.org.uk/blog/2009/09/obo-format-and-manchester-syntax/">http://www.russet.org.uk/blog/2009/09/obo-format-and-manchester-syntax/</a>)</span>. The basic issue is one of choosing your compromise. Identifiers with semantics in them (which this blog uses although I wish it did not) are considerably more human readable, but are not resilient to change, as the semantics in the identifiers can become out of date with respect to the content they describe. Semantics-free identifiers avoid this problem, but at the cost of readability. Neither compromise is entirely satisfactory; we need a more pragmatic approach <span class="kcite" kcite-id="ITEM-2040-2">(<a href="http://robertdavidstevens.wordpress.com/2011/05/26/unicorns-in-my-ontology">http://robertdavidstevens.wordpress.com/2011/05/26/unicorns-in-my-ontology</a>)</span>.</p>
<p>Recently, I was looking at the move of the OBI ontology <span class="kcite" kcite-id="ITEM-2040-3">(<a href="http://dx.doi.org/10.1186/2041-1480-1-S1-S7">http://dx.doi.org/10.1186/2041-1480-1-S1-S7</a>)</span> from BFO 1.0 to BFO 2.0. I have commented extensively on BFO before <span class="kcite" kcite-id="ITEM-2040-4">(<a href="http://dx.doi.org/10.1371/journal.pone.0012258">http://dx.doi.org/10.1371/journal.pone.0012258</a>)</span>, <span class="kcite" kcite-id="ITEM-2040-5">(<a href="http://www.russet.org.uk/blog/2010/07/realism-and-science/">http://www.russet.org.uk/blog/2010/07/realism-and-science/</a>)</span> <span class="kcite" kcite-id="ITEM-2040-6">(<a href="http://www.russet.org.uk/blog/2010/09/the-status-quo-farewell-tour-on-realism/">http://www.russet.org.uk/blog/2010/09/the-status-quo-farewell-tour-on-realism/</a>)</span>, and I was interested in what changes have been made for BFO 2.0.</p>
<p>Unfortunately, it is not that easy to work out. While diffs have never been the most human readable of output, the OBI diffs raise this to a new level. Consider this change:</p>
<pre style="padding:0.5em; color:gray;">svn diff -r 3424:3425 https://obi.svn.sourceforge.net/svnroot/obi/trunk/src/ontology/branches/obi.owl

@@ -204,7 +197,7 @@
     &lt;owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000107"&gt;
         &lt;rdfs:label&gt;provides_service_consumer_with&lt;/rdfs:label&gt;
         &lt;rdfs:domain rdf:resource="http://purl.obolibrary.org/obo/OBI_0001173"/&gt;
-        &lt;rdfs:subPropertyOf rdf:resource="http://www.obofoundry.org/ro/ro.owl#has_part"/&gt;
+        &lt;rdfs:subPropertyOf rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/&gt;
     &lt;/owl:ObjectProperty&gt;</pre>
<p>Also available <a href="http://obi.svn.sourceforge.net/viewvc/obi/trunk/src/ontology/branches/obi.owl?r1=3425&amp;r2=3424&amp;pathrev=3425">here</a> for those without access to a local subversion. The resource previously known as <tt>has_part</tt> has become the rather more obscure <tt>BFO_0000051</tt>. In short, BFO has become semantics-free.</p>
<p>In general, I think that this is a good thing. The use of semantics in the identifiers for this blog is generally not helpful, although I have never carried through my year-old threat <span class="kcite" kcite-id="ITEM-2040-0">(<a href="http://www.russet.org.uk/blog/2011/05/permalink-semantics/">http://www.russet.org.uk/blog/2011/05/permalink-semantics/</a>)</span> to change the identifier scheme as I am not sure older links will be maintained. But the total unreadability of the OBI diff demonstrates a problem. One answer is that we should not be reading OWL source in the first place, but using tools. These tools exist <span class="kcite" kcite-id="ITEM-2040-7">(<a href="http://www.ebi.ac.uk/efo/bubastis/">http://www.ebi.ac.uk/efo/bubastis/</a>)</span>, in fact, but they are not a replacement for a diff, but a supplement to it. Source code must be in a readable syntax because line-orientated syntax is the lowest common denominator; semantic diffs are nice, but next we would need an OWL aware versioning tool, as versioning depends on diffing. Then OWL aware regexp search and replace tools for when syntactic alterations were needed. Eventually, we would end up replacing an entire software stack and, no doubt, doing it badly, since tools such as versioning software have a long heritage and are now very functional (and incredibly complex!).</p>
<p>My previous, minimal suggestion was to use a denormalisation, by adding a new comment character. So</p>
<pre style="padding:0.5em; color:gray;">ObjectProperty http://purl.obolibrary.org/obo/BFO_0000051</pre>
<p>would become</p>
<pre style="padding:0.5em; color:gray;">ObjectProperty http://purl.obolibrary.org/obo/BFO_0000051[has_part]</pre>
<p>The denormalisation here, presenting the same information as an opaque string and as a text string, fulfils both requirements. However, it would require significant effort to keep the two in sync.</p>
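<p>Part of that effort could be mechanised. As a rough Python sketch of what a sync check might look like (the <tt>LABELS</tt> table and the function name are hypothetical; in practice the labels would have to be read from the ontology that defines the identifiers):</p>

```python
import re

# Hypothetical label table; in practice this would come from the ontology
# that defines the identifiers.
LABELS = {
    "http://purl.obolibrary.org/obo/BFO_0000051": "has_part",
}

# Matches IRI[label], the denormalised form suggested above.
DENORMALISED = re.compile(r"(http://\S+?)\[(\w+)\]")


def stale_labels(source, labels=LABELS):
    """Return (iri, written, expected) for every label that is out of date.

    An IRI that is missing from the label table is reported with an
    expected value of None.
    """
    problems = []
    for iri, written in DENORMALISED.findall(source):
        expected = labels.get(iri)
        if expected != written:
            problems.append((iri, written, expected))
    return problems


line = "ObjectProperty http://purl.obolibrary.org/obo/BFO_0000051[has_part]"
print(stale_labels(line))
```

<p>A check like this only detects drift; somebody still has to fix every occurrence by hand.</p>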
<p>My new idea would be to use a similar idea to a Colour Lookup Table <span class="kcite" kcite-id="ITEM-2040-8">(<a href="http://en.wikipedia.org/wiki/Colour_look-up_table">http://en.wikipedia.org/wiki/Colour_look-up_table</a>)</span>. These are used to define a palette of colours selected from a much larger colour space. We could use a similar approach here. Essentially the idea is to put semantics free IDs at the top of the file, then meaningful ones in the middle. The idea is also similar to the use of abbreviations for namespaces in XML; for instance,</p>
<pre style="padding:0.5em; color:gray;">&lt;owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000107"&gt;</pre>
<p>the <tt>rdf:</tt> prefix actually refers to &#8220;http://www.w3.org/1999/02/22-rdf-syntax-ns#&#8221;. The letters <tt>rdf</tt> could be replaced by anything at all, so long as we update the namespace declaration without changing semantics.</p>
<p>In Manchester syntax, we could address this with an addition of an alias keyword. So:</p>
<pre style="padding:0.5em; color:gray;">ObjectProperty http://purl.obolibrary.org/obo/OBI_0000107
   Annotations: rdfs:label="provides_service_consumer_with"
   Domain: http://purl.obolibrary.org/obo/OBI_0001173
   SubPropertyOf: http://purl.obolibrary.org/obo/BFO_0000051</pre>
<p>would become</p>
<pre style="padding:0.5em; color:gray;">Prefix: obo: http://purl.obolibrary.org/obo/
Alias: obo:OBI_0000107 "provides_service_consumer_with"
Alias: obo:OBI_0001173 "service"
Alias: obo:BFO_0000051 "has_part"

ObjectProperty provides_service_consumer_with
   Annotations: rdfs:label="provides_service_consumer_with"
   Domain: service
   SubPropertyOf: has_part</pre>
<p>In this case, because we are defining a term and attaching a label we get the same string twice, but there is no formal link between the two. With this system in place, moving the identifiers for BFO would have required an update to only the Alias table at the top. Now an obvious place for the strings to come from would be the source ontology (so &#8220;has_part&#8221; would come from RO <span class="kcite" kcite-id="ITEM-2040-9">(<a href="http://dx.doi.org/10.1186/gb-2005-6-5-r46">http://dx.doi.org/10.1186/gb-2005-6-5-r46</a>)</span>, or now BFO); this would, in fact, serve as a useful check. If I reference an external ontology and its labels do not match with my Alias definitions, I may wish to check to see whether the concepts I have imported still have the semantics that I intended.</p>
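<p>To make the idea concrete, here is a small Python sketch of how a tool might read such an Alias table and run the check against the source ontology. The <tt>Alias:</tt> keyword is only my proposal and no existing parser understands it; the function names and the <tt>source_labels</tt> mapping are likewise assumptions made for illustration.</p>

```python
PREFIXES = {"obo": "http://purl.obolibrary.org/obo/"}


def read_aliases(lines, prefixes=PREFIXES):
    """Map each alias to a full IRI.

    Expects lines such as: Alias: obo:BFO_0000051 "has_part"
    Lines that do not start with the Alias keyword are ignored.
    """
    aliases = {}
    for line in lines:
        if not line.startswith("Alias:"):
            continue
        _, curie, quoted = line.split(None, 2)
        prefix, local = curie.split(":", 1)
        aliases[quoted.strip().strip('"')] = prefixes[prefix] + local
    return aliases


def mismatches(aliases, source_labels):
    """Return the aliases whose name differs from the source ontology label.

    An IRI with no label in source_labels is also reported.
    """
    return {
        name: iri
        for name, iri in aliases.items()
        if source_labels.get(iri) != name
    }


header = [
    "Prefix: obo: http://purl.obolibrary.org/obo/",
    'Alias: obo:OBI_0000107 "provides_service_consumer_with"',
    'Alias: obo:OBI_0001173 "service"',
    'Alias: obo:BFO_0000051 "has_part"',
]
aliases = read_aliases(header)
print(aliases["has_part"])
```

<p>With a table like this, the move from RO to BFO identifiers touches one line of the header, and <tt>mismatches</tt> flags any alias whose meaning may have drifted.</p>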
<p>The same approach could be directly translated into the XML representation without change, I believe, with the use of XML entities which are defined at the start of an XML document. Of course, this is entirely horrible, and changing the OWL schema would make more sense. Extending Manchester syntax is straight-forward as I think I have shown here. Likewise, for OBO format. And the practical upshot would be a significant increase in the readability of many ontologies without eschewing the good practice of semantics free identifiers.</p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 2040 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/04/semantics-free-ontologies-2/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Kblog Metadata</title>
		<link>http://www.russet.org.uk/blog/2012/03/kblog-metadata/</link>
		<comments>http://www.russet.org.uk/blog/2012/03/kblog-metadata/#comments</comments>
		<pubDate>Sat, 31 Mar 2012 21:44:22 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=2025</guid>
		<description><![CDATA[Previously, I described the additions that we have made to the kcite plugin , which now supports multiple different types of identifiers. This includes the subset of DOIs that come from either CrossRef or DataCite , arXiv or Pubmed . However, rather embarrasingly, one of the identifiers that we do not support well are URLs. [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="2025">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Kblog+Metadata&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-03-31&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F03%2Fkblog-metadata%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>Previously, I described the additions that we have made to the kcite plugin <span class="kcite" kcite-id="ITEM-2025-0">(<a href="http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/">http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/</a>)</span>, which now supports multiple different types of identifiers. This includes the subset of DOIs <span class="kcite" kcite-id="ITEM-2025-1">(<a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/</a>)</span> that come from either CrossRef <span class="kcite" kcite-id="ITEM-2025-2">(<a href="http://www.crossref.org">http://www.crossref.org</a>)</span> or DataCite <span class="kcite" kcite-id="ITEM-2025-3">(<a href="http://www.datacite.org">http://www.datacite.org</a>)</span>, arXiv <span class="kcite" kcite-id="ITEM-2025-4">(<a href="http://arxiv.org/">http://arxiv.org/</a>)</span> or Pubmed <span class="kcite" kcite-id="ITEM-2025-5">(<a href="http://www.ncbi.nlm.nih.gov/pubmed/">http://www.ncbi.nlm.nih.gov/pubmed/</a>)</span>. However, rather embarrassingly, one of the identifiers that we do not support well is the URL. This is slightly ironic, as one of the purposes behind <span class="kcite" kcite-id="ITEM-2025-6">(<a href="http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/">http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/</a>)</span> is to demonstrate that it is possible to replicate the publication experience using the web.</p>
<p>The main reason for this is the lack of an active source of metadata. The various identifiers that we have supported all come with a standardised source of metadata, which is not so straightforward with a generic URL. This is one of the reasons for my new plugin, kblog-metadata (<a href="http://wordpress.org/extend/plugins/kblog-metadata/">http://wordpress.org/extend/plugins/kblog-metadata/</a>). This currently consists of three pieces of functionality: kblog-headers, kblog-authors and kblog-table-of-contents.</p>
<p>For a long time now, I have added COinS metadata <span class="kcite" kcite-id="ITEM-2025-7">(<a href="http://ocoins.info/">http://ocoins.info/</a>)</span> to both this blog and kblog <span class="kcite" kcite-id="ITEM-2025-8">(<a href="http://www.knowledgeblog.org/">http://www.knowledgeblog.org/</a>)</span>. But, from my perspective, COinS is a dreadful specification. It involves embedding a NISO 1.0 Context Object <span class="kcite" kcite-id="ITEM-2025-9">(<a href="http://www.niso.org/standards/standard_detail.cfm?std_id=783">http://www.niso.org/standards/standard_detail.cfm?std_id=783</a>)</span> into a span tag. The reference here is from the COinS specification <span class="kcite" kcite-id="ITEM-2025-7">(<a href="http://ocoins.info/">http://ocoins.info/</a>)</span>, but is, unfortunately, 404 at the time of writing. It uses a URL encoded query string: in short, a microsyntax inside HTML which needs its own independent parsing. Key strings are confusing at best (<tt>rft_val_fmt</tt> and <tt>rft.auinit</tt> for example; why both underscores and dots?). And there is a degree of randomness about things: first authors can be split into first name, last name, initials, while subsequent authors cannot. Moreover, I could not find a processor to test whether my COinS implementation was actually correct. I wanted something that was a bit easier, and also in wider use. So, while we still use COinS metadata, we have now also added <tt>meta</tt> tags as recommended by Google Scholar <span class="kcite" kcite-id="ITEM-2025-10">(<a href="http://scholar.google.com/intl/en/scholar/inclusion.html">http://scholar.google.com/intl/en/scholar/inclusion.html</a>)</span>; ironically, on a page with, as far as I can see, no <tt>meta</tt> tags at all. Finally, we also have Open Graph Protocol <span class="kcite" kcite-id="ITEM-2025-11">(<a href="http://ogp.me/">http://ogp.me/</a>)</span>.
Fortunately their website does use their own advice. Kblog-headers includes all of these formats, as can now be seen on this page.</p>
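<p>The point about COinS needing its own independent parsing can be illustrated with a minimal sketch of what a consumer has to do. The function name <tt>coins_parse_title</tt> is hypothetical and is not part of kblog-metadata, and the sketch assumes that the <tt>title</tt> attribute of the span has already been extracted from the HTML. PHP&#8217;s built-in <tt>parse_str</tt> is not used, since it rewrites dots in key names to underscores and keeps only the last value of a repeated key such as <tt>rft.au</tt>.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td>
<pre><tt>&lt;?php
// Hypothetical helper, not part of kblog-metadata: decode the title
// attribute of a COinS span into a map of key =&gt; list of values.
function coins_parse_title( $title ){
    // undo the HTML escaping first, so that each escaped ampersand
    // becomes a plain one
    $query = html_entity_decode( $title, ENT_QUOTES, 'UTF-8' );
    $fields = array();
    foreach( explode( '&amp;', $query ) as $pair ){
        if( $pair === '' ){
            continue;
        }
        $parts = explode( '=', $pair, 2 );
        $key = urldecode( $parts[ 0 ] );
        $value = isset( $parts[ 1 ] ) ? urldecode( $parts[ 1 ] ) : '';
        // keys such as rft.au may legitimately repeat, so collect a list
        $fields[ $key ][] = $value;
    }
    return $fields;
}

// a shortened form of the COinS title attribute used on this blog
$title = 'ctx_ver=Z39.88-2004&amp;amp;rft.title=A+problem+DOI'
       . '&amp;amp;rft.date=2012-03-21&amp;amp;rft.au=Phillip+Lord';
$fields = coins_parse_title( $title );
echo $fields[ 'rft.title' ][ 0 ], "\n";
echo $fields[ 'rft.au' ][ 0 ], "\n";</tt></pre>
</td>
</tr>
</table>
<p>Keeping every value in a list is a design choice for the sketch: it preserves repeated authors at the cost of a slightly clumsier lookup for single-valued keys.</p>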
<p>Since the inception of Kblog, one of the difficulties we have had is with multiple authors. When adding metadata, for instance, we need to ensure that all the authors are represented. We have used plugins such as co-authors-plus <span class="kcite" kcite-id="ITEM-2025-12">(<a href="http://wordpress.org/extend/plugins/co-authors-plus/">http://wordpress.org/extend/plugins/co-authors-plus/</a>)</span> to enable multi-author work. However, these plugins come with a lot of extra baggage, namely the requirement for all authors to have a WordPress login (either WordPress.com, or on the local installation). Essentially, aside from the first workshop <span class="kcite" kcite-id="ITEM-2025-13">(<a href="http://www.russet.org.uk/blog/2010/01/the-ontogenesis-tutorial/">http://www.russet.org.uk/blog/2010/01/the-ontogenesis-tutorial/</a>)</span>, we have never seen anyone collaboratively edit documents on WordPress. Where multiple authors have worked together (which we have seen a lot) they have done so using Word, LaTeX, Google Docs or asciidoc, collaborating with Dropbox or email. Only the communicating author needs an account. The problem was accentuated with sites like <a href="http://bio-ontologies.knowledgeblog.org/">Bio-Ontologies</a>, where all of the articles were <strong>posted</strong> by either myself or Simon Cockell <span class="kcite" kcite-id="ITEM-2025-14">(<a href="http://blog.fuzzierlogic.com/">http://blog.fuzzierlogic.com/</a>)</span>, but were <strong>authored</strong> by neither. From my perspective, we need the ability to separate these two roles&#8201;&#8212;&#8201;posting and authoring. Kblog-authors achieves precisely this. New authors can be added either using short codes within the document content, or through the WordPress edit page (the GUI is a little primitive, but functional). These authors do not need WordPress accounts, with the posting account being used if no authors are explicitly given.</p>
<p>Finally, I have rewritten kblog-table-of-contents, and am combining it with kblog-metadata. This provides a new shortcode &#91;kblogtoc&#93; which can be used to embed a table of contents showing all posts&#8201;&#8212;&#8201;ideal for searching over. For more computational use it is also possible to get a line-separated text file (<a href="http://www.russet.org.uk/blog/?kblog-toc=txt">http://www.russet.org.uk/blog/?kblog-toc=txt</a>) or approximately the same thing as HTML (<a href="http://www.russet.org.uk/blog/?kblog-toc=html">http://www.russet.org.uk/blog/?kblog-toc=html</a>), which can be cut and pasted without having to view source. A more readable and searchable form can be seen embedded in a normal WordPress page (<a href="http://www.russet.org.uk/blog/table-of-contents/">http://www.russet.org.uk/blog/table-of-contents/</a>). This has also enabled us to finally fix the Bio-Ontologies contents page (<a href="http://bio-ontologies.knowledgeblog.org/table-of-contents">http://bio-ontologies.knowledgeblog.org/table-of-contents</a>) which now shows the correct authors, with all of the posts advertising their authorship.</p>
<p>All three of these plugins require further work. At the moment, they provide better metadata, but they do not give the author and reader enough utility to encourage people to install them, of which more in the future. Despite this, however, I think they are already proving useful, and should help to solve a long-standing problem that we have had within WordPress for an academic environment.</p>
<p><strong>Erratum</strong></p>
<p>2012-05-09: Corrected typographical error which meant the kblogtoc shortcode was displaying incorrectly.</p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 2025 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/03/kblog-metadata/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A problem DOI</title>
		<link>http://www.russet.org.uk/blog/2012/03/a-problem-doi/</link>
		<comments>http://www.russet.org.uk/blog/2012/03/a-problem-doi/#comments</comments>
		<pubDate>Wed, 21 Mar 2012 14:25:09 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=2012</guid>
		<description><![CDATA[In my previous article, I discussed my ongoing struggles with DOIs and their metadata. The article discussed the difficulties with implementing content negotiation for kcite; in particular, getting metadata for a given DOI and understanding that metadata once it had been fetched. Here, I discuss these two issues again! Accessing the Metadata I [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="2012">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=A+problem+DOI&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-03-21&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F03%2Fa-problem-doi%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>In my previous article, I discussed my ongoing struggles with DOIs and their metadata <span class="kcite" kcite-id="ITEM-2012-0">(<a href="http://www.russet.org.uk/blog/2012/03/dois-and-content-negotiation/">http://www.russet.org.uk/blog/2012/03/dois-and-content-negotiation/</a>)</span>. The article discussed the difficulties with implementing content negotiation for kcite <span class="kcite" kcite-id="ITEM-2012-1">(<a href="http://knowledgeblog.org/kcite-plugin">http://knowledgeblog.org/kcite-plugin</a>)</span>; in particular, getting metadata for a given DOI and understanding that metadata once it had been fetched. Here, I discuss these two issues again!</p>
<hr /> 
<h2><a name="_accessing_the_metadata"></a>Accessing the Metadata</h2>
<p>I have previously described the implementation of Content Negotiation for DOIs in Kcite. In the examples given, I used <tt>libcurl</tt> which has the flexibility to perform content negotiation. One difficulty with this approach is that <tt>libcurl</tt> is not a standard requirement for WordPress, at least on Ubuntu. So, adding this requirement forces users to take an additional step when installing the plugin.</p>
<p>Following on from the previous article, I have decided to update all of the resolution mechanisms in Kcite to use <tt>libcurl</tt>, as it made little sense to have several different mechanisms in place. Ironically, in the process, I found that the direct use of <tt>libcurl</tt> was unnecessary anyway. The reason for this is that WordPress has its own HTTP API <span class="kcite" kcite-id="ITEM-2012-2">(<a href="http://codex.wordpress.org/HTTP_API">http://codex.wordpress.org/HTTP_API</a>)</span>, and it is possible to use this instead of <tt>libcurl</tt>. Underneath, it uses one of several transport mechanisms, including <tt>libcurl</tt> which is probably the fastest, but also pure PHP solutions which make life a bit easier.</p>
<p>Most of my criticisms of content negotiation, however, still remain. Unlike <tt>libcurl</tt>, the WordPress API automatically follows the <tt>303 See Other</tt> redirect messages which <a href="http://dx.doi.org">http://dx.doi.org</a> returns. However, content negotiation is not directly supported, so I still need to set the HTTP headers manually. The format of these did not appear to be documented, but I discovered them through hacking the WordPress core of my test installation. The core of my code for Kcite now looks as follows:</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>      $url = "http://dx.doi.org/{$cite-&gt;identifier}";

      $params = array(
                      'headers' =&gt;
                      array( 'Accept' =&gt;
                             "application/x-datacite+xml;q=0.9, application/citeproc+json;q=1.0"),
                      );

      $wpresponse = wp_remote_get( $url, $params );

      if( is_wp_error( $wpresponse ) ){
          return $cite;
      }

      $response = wp_remote_retrieve_body( $wpresponse );
      $status = wp_remote_retrieve_response_code( $wpresponse );
      $headers = wp_remote_retrieve_headers( $wpresponse );
      $contenttype = $headers["content-type"];

      // it's probably not a DOI at all. Need to check some more here.
      if( $status == 404 ){
          // kcite code
      }

      if( $contenttype == "application/citeproc+json" ){
          // crossref DOI
      }

      if( $contenttype == "application/x-datacite+xml" ){
          //datacite DOI
      }</tt></pre>
</td>
</tr>
</table>
<hr /> 
<h2><a name="_which_registration_agency"></a>Which registration agency</h2>
<p>Having updated Kcite to use the HTTP API for all of its metadata resolution methods, I thought it would be wise to check all of my test cases to see that they were working correctly. I found one DOI which was and, in fact, always has been behaving incorrectly, returning the wrong metadata.</p>
<p>The reason for this was a slightly dubious piece of logic in Kcite, put in place when we had less understanding of DOIs. Where we could not find metadata from CrossRef about a DOI, we were free text searching Pubmed for the DOI, and using Pubmed metadata instead. Unfortunately, the free text search of Pubmed is not fuzzy, and it was this that was resulting in erroneous metadata. I have now removed this code from Kcite, which may result in some DOIs that appeared to work previously now failing.</p>
<p>Once again, we see the difficulty in being unable to determine the registration agency for a given DOI. I can only conclude that the DOI in question was not allocated by either CrossRef or DataCite, as neither returned metadata for this DOI; a conclusion on negative data, however, is not a strong one. It could also be (and has been) the case that either the CrossRef or the DataCite metadata service is not working properly.</p>
<p>I tried investigating the DOI Registration Agency <span class="kcite" kcite-id="ITEM-2012-3">(<a href="http://www.doi.org/registration_agencies.html">http://www.doi.org/registration_agencies.html</a>)</span> page from the International DOI Foundation (IDF). None of the agencies listed seemed obvious candidates. It appears that the IDF&#8217;s web page is incorrect.</p>
<p>The DOI in question? This is <a href="http://dx.doi.org/10.1000/182">http://dx.doi.org/10.1000/182</a>. This DOI is the identifier for the &#8220;DOI Handbook&#8221;. And the registration agency missing from the IDF&#8217;s page? That would be the International DOI Foundation.</p>
<hr /> 
<h2><a name="_demonstration"></a>Demonstration</h2>
<p>At the time of posting, this blog is using a released version of Kcite, hence a link to the DOI <span class="kcite" kcite-id="ITEM-2012-4">(<a href="http://dx.doi.org/10.1000/182">http://dx.doi.org/10.1000/182</a>)</span> actually appears as a link to a Pubmed paper <span class="kcite" kcite-id="ITEM-2012-5">(<a href="http://www.ncbi.nlm.nih.gov/pubmed/21785971">http://www.ncbi.nlm.nih.gov/pubmed/21785971</a>)</span>. The current (unreleased) development version of Kcite now reports the absence of metadata for this DOI which this post will (or may already) display once I update.</p>
<p>The underlying HTML source, it should be noted, still contains the correct link (to the DOI handbook), so a computational agent consuming this page would still detect the author&#8217;s original intention, even with the error from Kcite.</p>
<hr /> 
<h2><a name="_update"></a>Update</h2>
<p>Fixed broken links!</p>
<p>14/05/2012: My blog has been updated, so 10.1000/182 now shows up as metadata missing, which is correct behaviour from kcite.</p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 2012 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/03/a-problem-doi/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>DOIs and Content Negotiation</title>
		<link>http://www.russet.org.uk/blog/2012/03/dois-and-content-negotiation/</link>
		<comments>http://www.russet.org.uk/blog/2012/03/dois-and-content-negotiation/#comments</comments>
		<pubDate>Sat, 10 Mar 2012 10:47:49 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=2006</guid>
		<description><![CDATA[With the release of Kcite 1.5, we now support multiple forms of citation. There have also been some changes to the implementation layer, however, that I will describe in this article. I have previously written critically about DOIs and their problems. One of my criticisms is the inability to access metadata about [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="2006">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=DOIs+and+Content+Negotiation&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-03-10&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F03%2Fdois-and-content-negotiation%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>With the release of Kcite 1.5 <span class="kcite" kcite-id="ITEM-2006-0">(<a href="http://wordpress.org/extend/plugins/kcite/">http://wordpress.org/extend/plugins/kcite/</a>)</span>, we now support multiple forms of citation <span class="kcite" kcite-id="ITEM-2006-1">(<a href="http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/">http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/</a>)</span>. There have also been some changes to the implementation layer, however, that I will describe in this article. I have previously written critically about DOIs and their problems <span class="kcite" kcite-id="ITEM-2006-2">(<a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/</a>)</span>. One of my criticisms is the inability to access metadata about a DOI in a standardised way. In this article, I will consider the addition of content negotiation and whether this improves the situation. From this, I will draw a number of <a href="#conclusions">conclusions</a> about the DOI system.</p>
<hr /> 
<h2><a name="_background"></a>Background</h2>
<p>DOIs offer a single point of entry mechanism for referring to a paper. A DOI such as &#8220;10.1371/journal.pone.0012258&#8221; refers to one of my papers <span class="kcite" kcite-id="ITEM-2006-3">(<a href="http://dx.doi.org/10.1371/journal.pone.0012258">http://dx.doi.org/10.1371/journal.pone.0012258</a>)</span>. It can be transformed into a URL by the addition of <a href="http://dx.doi.org">http://dx.doi.org</a> to the front, giving <a href="http://dx.doi.org/10.1371/journal.pone.0012258">http://dx.doi.org/10.1371/journal.pone.0012258</a>. The DOI proxy service takes this URL and redirects the user to the &#8220;real&#8221; URL which contains the content in question. DOIs themselves are assigned by a registration agency. The majority of DOIs that refer to academic papers have been assigned by CrossRef <span class="kcite" kcite-id="ITEM-2006-4">(<a href="http://www.crossref.org">http://www.crossref.org</a>)</span>. However, they are not the only registration agency&#8201;&#8212;&#8201;DataCite provides a similar service for, intuitively enough, data sets <span class="kcite" kcite-id="ITEM-2006-5">(<a href="http://www.datacite.org">http://www.datacite.org</a>)</span>. The actual content&#8201;&#8212;&#8201;the papers or the data sets&#8201;&#8212;&#8201;is stored elsewhere. Both DataCite and CrossRef simply forward the user of these URLs to the publisher or data repository.</p>
<p>Our previous article <span class="kcite" kcite-id="ITEM-2006-2">(<a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/</a>)</span> discussed a number of problems including the difficulty in accessing metadata about a given DOI. As well as being an issue of general concern, it is also a specific problem for the development of Kcite <span class="kcite" kcite-id="ITEM-2006-6">(<a href="http://knowledgeblog.org/kcite-plugin">http://knowledgeblog.org/kcite-plugin</a>)</span>. This WordPress plugin generates reference lists from identifiers, including DOIs; it is active on this article. To do this, it captures metadata about each reference from a variety of different metadata servers.</p>
<p>CrossRef have recently announced the addition of Content Negotiation to their list of services <span class="kcite" kcite-id="ITEM-2006-7">(<a href="http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html">http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html</a>)</span>. This provides a mechanism to access metadata about a DOI, at least for those DOIs where CrossRef is the registration agency. This mechanism became more attractive with the announcement that it is now also supported by DataCite <span class="kcite" kcite-id="ITEM-2006-8">(<a href="http://www.crossref.org/CrossTech/2011/10/datacite_supporting_content_ne.html">http://www.crossref.org/CrossTech/2011/10/datacite_supporting_content_ne.html</a>)</span>. Finally, partly following a request of mine, CrossRef also now releases its metadata in JSON <span class="kcite" kcite-id="ITEM-2006-9">(<a href="http://www.crossref.org/CrossTech/2011/11/turning_dois_into_formatted_ci.html">http://www.crossref.org/CrossTech/2011/11/turning_dois_into_formatted_ci.html</a>)</span> ready for Citeproc-js <span class="kcite" kcite-id="ITEM-2006-10">(<a href="http://bitbucket.org/fbennett/citeproc-js">http://bitbucket.org/fbennett/citeproc-js</a>)</span>. This format is used internally by Kcite, which previously had to produce it by parsing CrossRef unixref XML. Retrieving JSON directly has obvious advantages.</p>
<hr /> 
<h2><a name="_accessing_the_metadata"></a>Accessing the metadata</h2>
<p>Here, I describe the implementation of content negotiation for Kcite. The complete source of Kcite is available from Mercurial although not all of the changes described here were checked in <span class="kcite" kcite-id="ITEM-2006-11">(<a href="http://code.google.com/p/knowledgeblog/source/browse/trunk/plugins/kcite/kcite.php">http://code.google.com/p/knowledgeblog/source/browse/trunk/plugins/kcite/kcite.php</a>)</span>.</p>
<p>My original implementation for gathering CrossRef metadata used the <tt>file_get_contents</tt> method in PHP. Despite its name, this also works with URLs, providing a simple and straight-forward implementation path.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>   $url = "http://www.crossref.org/openurl/?noredirect=true&amp;pid="
            .$crossref."&amp;format=unixref&amp;id=doi:".$cite-&gt;identifier;
   $xml = file_get_contents($url, 0);</tt></pre>
</td>
</tr>
</table>
<p>There are a number of issues with this implementation, not least the lack of any significant error handling. Moreover, <tt>file_get_contents</tt> is not very adaptable; it performs a simple HTTP GET request. So, I decided to use PHP <tt>libcurl</tt> <span class="kcite" kcite-id="ITEM-2006-12">(<a href="http://php.net/manual/en/book.curl.php">http://php.net/manual/en/book.curl.php</a>)</span>. The translation from <tt>file_get_contents</tt> is reasonably straight-forward.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>$url = "http://dx.doi.org/{$cite-&gt;identifier}";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url );
$response = curl_exec ($ch);

curl_close($ch);</tt></pre>
</td>
</tr>
</table>
<p>Initially this failed to work. I normally build and test my code on Ubuntu and, unfortunately, PHP libcurl is not installed with any of WordPress, PHP or Apache. A search and an <tt>aptitude install</tt> solved this problem. Then strange things happened. It turns out that the default behaviour of <tt>libcurl</tt> is to embed the retrieved content into the output&#8201;&#8212;&#8201;that is, the outgoing web page. So, I needed to add an option to the <tt>libcurl</tt> calls.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>      $url = "http://dx.doi.org/{$cite-&gt;identifier}";

      // get the metadata with negotiation
      $ch = curl_init();
      curl_setopt ($ch, CURLOPT_URL, $url );
      curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true );</tt></pre>
</td>
</tr>
</table>
<p>The code was still not working, though: nothing appeared to be returned. Debugging a black box is never easy, so I needed more information before going further. I added code to dump verbose curl information to a log file.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>      // debug
      $fh = fopen('/tmp/curl.log', 'w');
      curl_setopt($ch, CURLOPT_STDERR, $fh );
      curl_setopt($ch, CURLOPT_VERBOSE, true );</tt></pre>
</td>
</tr>
</table>
<p>A quick perusal of the HTTP requests shows the problem. By default, a call to <a href="http://dx.doi.org">http://dx.doi.org</a> returns a <tt>303 See Other</tt> response, and <tt>libcurl</tt> does not follow this unless told to. Another <tt>libcurl</tt> option is required to fix this.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>      $ch = curl_init();
      curl_setopt ($ch, CURLOPT_URL, $url );
      curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true );
      curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true );</tt></pre>
</td>
</tr>
</table>
<p>Finally, we need to use content negotiation. The PHP <tt>libcurl</tt> library does not support this directly, so we need to set the HTTP headers for ourselves.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>        curl_setopt ($ch, CURLOPT_HTTPHEADER,
                   array (
                          "Accept: application/citeproc+json;q=1.0"
                           ));</tt></pre>
</td>
</tr>
</table>
<p>And I now had a solution. Kcite needed reworking, but mostly this involved removing the XML parsing layer. All was looking good, except that, while looking through my regression tests, I found that DataCite support had been broken. I was, at that time, accessing DataCite using a different interface.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>    $url = "http://data.datacite.org/application/x-datacite+xml/"
         . $cite-&gt;identifier;</tt></pre>
</td>
</tr>
</table>
<p>The difficulty was that previously I was accessing CrossRef directly to resolve DOIs. Asking CrossRef about a DataCite DOI resulted in an unknown DOI response. Kcite responded to this response by trying DataCite next; unfortunately, there is no way that I know of to distinguish syntactically between a DataCite and a CrossRef DOI. With the new method, the content negotiated call to <a href="http://dx.doi.org">http://dx.doi.org</a> succeeds, although DataCite does not know of the requested <tt>citeproc+json</tt> MIME type, so returns HTML. So, again, we need to extend our DOI resolution, checking for the returned content type.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>      $response = curl_exec ($ch);
      $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
      $contenttype = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

      // it's probably not a DOI at all. Need to check some more here.
      if( $status == 404 ){
          curl_close($ch);
          return $cite;
      }

      curl_close ($ch);

      if( $contenttype == "application/citeproc+json" ){
          // crossref DOI
          //kcite specific logic follows.
      }</tt></pre>
</td>
</tr>
</table>
<p>I should now be able to achieve a single call to resolve a DOI by modifying the headers once again. Here we request <tt>citeproc+json</tt> if possible or <tt>x-datacite+xml</tt> if it is not.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>    curl_setopt ($ch, CURLOPT_HTTPHEADER,
               array (
                     "Accept: application/citeproc+json;q=1.0, application/x-datacite+xml;q=0.9"
                          ));
</tt></pre>
</td>
</tr>
</table>
<p>Unfortunately this fails also. While CrossRef returns <tt>citeproc+json</tt>, DataCite still returns HTML. Discussions with Karl Ward from CrossRef cleared up the problem. The content negotiation implementation of both CrossRef and DataCite was imperfect. DataCite&#8217;s implementation always tried to return the first content type; but it doesn&#8217;t know about <tt>citeproc+json</tt>, hence the HTML. Meanwhile CrossRef returns only the highest q value, rather than all types. Ironically, the problem was solved by doing this:</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>
    curl_setopt ($ch, CURLOPT_HTTPHEADER,
               array (
                     "Accept: application/x-datacite+xml;q=0.9, application/citeproc+json;q=1.0"
                          ));
</tt></pre>
</td>
</tr>
</table>
<p>CrossRef now returns JSON (because it has the highest q value), while DataCite returns XML because it comes first. The final, complete and functioning method now appears as follows:</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>      $url = "http://dx.doi.org/{$cite-&gt;identifier}";

      // get the metadata with negotiation
      $ch = curl_init();
      curl_setopt ($ch, CURLOPT_URL, $url );
      curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true );
      curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true );

      // the order here is important, as the content negotiation of both DataCite and CrossRef is broken.
      // CrossRef returns only the highest q value match, but does check the other content
      // types, so should return JSON. DataCite considers only the first
      // content type, which should be XML.
      curl_setopt ($ch, CURLOPT_HTTPHEADER,
                   array (
                          "Accept: application/x-datacite+xml;q=0.9, application/citeproc+json;q=1.0"
                          ));

      // debug
      //$fh = fopen('/tmp/curl.log', 'w');
      //curl_setopt($ch, CURLOPT_STDERR, $fh );
      //curl_setopt($ch, CURLOPT_VERBOSE, true );

      $response = curl_exec ($ch);
      $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
      $contenttype = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

      // it's probably not a DOI at all. Need to check some more here.
      if( $status == 404 ){
          curl_close($ch);
          return $cite;
      }

      curl_close ($ch);

      if( $contenttype == "application/citeproc+json" ){
           // crossref DOI
           // kcite application logic
      }

      if( $contenttype == "application/x-datacite+xml" ){
          //datacite DOI
          // kcite application logic
      }</tt></pre>
</td>
</tr>
</table>
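<p>The two server behaviours described above can be made concrete with a short sketch. This is not the code of either registration agency; the function names and the simplified parsing (no wildcards, no parameters other than <tt>q</tt>) are assumptions made purely for illustration. <tt>negotiate_by_q</tt> picks the supported type with the highest q value, as the HTTP specification intends, while <tt>negotiate_first_listed</tt> mimics a server that only looks at the first type in the header.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td>
<pre><tt>&lt;?php
// Parse an Accept header into a map of MIME type =&gt; q value.
// Simplified: ignores wildcards and parameters other than q.
function parse_accept( $header ){
    $types = array();
    foreach( explode( ',', $header ) as $item ){
        $parts = array_map( 'trim', explode( ';', $item ) );
        $type = $parts[ 0 ];
        if( $type === '' ){
            continue;
        }
        $q = 1.0; // the default when no q parameter is given
        foreach( array_slice( $parts, 1 ) as $param ){
            if( strpos( $param, 'q=' ) === 0 ){
                $q = (float) substr( $param, 2 );
            }
        }
        $types[ $type ] = $q;
    }
    return $types;
}

// Intended behaviour: the best supported type by q value, else the default.
function negotiate_by_q( $header, $supported, $default ){
    $best = $default;
    $bestq = 0.0;
    foreach( parse_accept( $header ) as $type =&gt; $q ){
        if( in_array( $type, $supported ) &amp;&amp; $q &gt; $bestq ){
            $best = $type;
            $bestq = $q;
        }
    }
    return $best;
}

// First-listed behaviour: only the first type in the header counts.
function negotiate_first_listed( $header, $supported, $default ){
    $types = array_keys( parse_accept( $header ) );
    if( count( $types ) &gt; 0 &amp;&amp; in_array( $types[ 0 ], $supported ) ){
        return $types[ 0 ];
    }
    return $default;
}

// a server that, like DataCite here, does not know citeproc+json
$supported = array( "application/x-datacite+xml" );
$header = "application/citeproc+json;q=1.0, application/x-datacite+xml;q=0.9";
echo negotiate_by_q( $header, $supported, "text/html" ), "\n";
echo negotiate_first_listed( $header, $supported, "text/html" ), "\n";</tt></pre>
</td>
</tr>
</table>
<p>With the JSON type listed first, the first-listed strategy falls back to the default, which corresponds to the HTML response seen earlier; reversing the order of the header lets it find the XML type.</p>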
<hr /> 
<h2><a name="_using_the_metadata"></a>Using the metadata</h2>
<p>Although we now have a single point of entry for accessing the metadata about a DOI, the metadata itself is still not standardised. Although CrossRef has returned metadata in (nearly!) the form that we are going to use, DataCite has returned XML conforming to their own schema. We still need to parse this XML. Fortunately, this is relatively easy in PHP, using the <tt>SimpleXMLElement</tt> class and xpath. The full code is available, so here I just show the sections involving xpath, for example, to retrieve the publisher and the title.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>    $journalN = $article-&gt;xpath( "//publisher");
    $titleN = $article-&gt;xpath( "//title" );</tt></pre>
</td>
</tr>
</table>
<p>Initial testing suggested this works, sometimes. Unfortunately, I discovered that this failed for some DataCite DOIs. More solicitous debugging shows the problem; DataCite returns more than one form of XML. At first sight, the xpath should work, since the relevant elements are still in the same place. However, the default namespaces have changed&#8201;&#8212;&#8201;DataCite kernel 2.0 XML does not have a default namespace, while 2.1 and 2.2 do, which breaks the xpath. The situation is resolved by searching for namespaces, then parameterising the xpath queries.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt>       $namespaceN = $article-&gt;getNamespaces();
       $kn = "";
       if( $namespaceN[ "" ] == "http://datacite.org/schema/kernel-2.2" ){
           $kn = "kn:";
           $article-&gt;registerXpathNamespace( "kn", "http://datacite.org/schema/kernel-2.2" );
       }

       if( $namespaceN[ "" ] == null ){
           // kernel 2.0 -- no namespace
           // so do nothing.
       }

      $journalN = $article-&gt;xpath( "//${kn}publisher");
      $titleN = $article-&gt;xpath( "//${kn}title" );</tt></pre>
</td>
</tr>
</table>
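<p>The check above names the kernel 2.2 namespace explicitly. A slightly more general variant, sketched here under the assumption that the elements of interest always live in the default namespace of the document when one is declared, registers whatever default namespace is found. The helper name <tt>datacite_xpath</tt> and the sample XML are hypothetical and are not taken from Kcite.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td>
<pre><tt>&lt;?php
// Hypothetical helper: run an xpath query for a named element, whether or
// not the DataCite XML declares a default namespace.
function datacite_xpath( SimpleXMLElement $article, $element ){
    $namespaces = $article-&gt;getNamespaces();
    $prefix = "";
    if( isset( $namespaces[ "" ] ) ){
        // a default namespace is declared (kernel 2.1 and 2.2)
        $article-&gt;registerXPathNamespace( "kn", $namespaces[ "" ] );
        $prefix = "kn:";
    }
    // with no default namespace (kernel 2.0) the prefix stays empty
    return $article-&gt;xpath( "//{$prefix}{$element}" );
}

// made-up sample documents, for illustration only
$with = new SimpleXMLElement(
    '&lt;resource xmlns="http://datacite.org/schema/kernel-2.2"&gt;'
    . '&lt;titles&gt;&lt;title&gt;Sample&lt;/title&gt;&lt;/titles&gt;&lt;/resource&gt;' );
$without = new SimpleXMLElement(
    '&lt;resource&gt;&lt;titles&gt;&lt;title&gt;Sample&lt;/title&gt;&lt;/titles&gt;&lt;/resource&gt;' );

$titleN = datacite_xpath( $with, "title" );
echo (string) $titleN[ 0 ], "\n";
$titleN = datacite_xpath( $without, "title" );
echo (string) $titleN[ 0 ], "\n";</tt></pre>
</td>
</tr>
</table>
<p>Using <tt>isset</tt> also avoids a notice when the XML has no default namespace at all.</p>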
<p>I now have a system capable of gathering bibliographic metadata from a DOI.</p>
<hr /> 
<h2><a name="_discussion"></a>Discussion</h2>
<p>In our original post <span class="kcite" kcite-id="ITEM-2006-2">(<a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/</a>)</span>, we compared the situation with bioinformatics identifiers to DOIs. A Uniprot ID, for instance, such as <a href="http://www.uniprot.org/uniprot/P08100">http://www.uniprot.org/uniprot/P08100</a>, resolves to a protein record while <a href="http://www.uniprot.org/uniprot/P08100.fasta">http://www.uniprot.org/uniprot/P08100.fasta</a> returns the equivalent protein sequence. Content negotiation offers the possibility of achieving something similar with DOIs, at least with respect to the metadata if not the actual content.</p>
<p>My experience in practice shows that content negotiation does work and is useful; however, I am unconvinced that it is an ideal solution. From a theoretical standpoint, the use of <tt>Accept</tt> headers seems nice. But in practice, it is painful because it is not commonly used. PHP does not support it, while even PHP with <tt>libcurl</tt> support requires me to set headers by hand, as there are no standard methods for doing so. The same is true of <tt>curl</tt> on the command line, as shown in this example from CrossRef <span class="kcite" kcite-id="ITEM-2006-7">(<a href="http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html">http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html</a>)</span>, which retrieves RDF metadata.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" style="margin:0.2em 0;"> 
<tr>
<td style="padding:0.5em;">
<pre style="margin:0; padding:0;">curl -D - -L -H   "Accept: application/rdf+xml" "http://dx.doi.org/10.1126/science.1157784"</pre>
</td>
</tr>
</table>
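<p>For comparison, a minimal Python sketch of the same request, using only the standard library module <tt>urllib.request</tt>. The function names <tt>metadata_request</tt> and <tt>fetch_metadata</tt> are hypothetical, and the sketch assumes the resolver behaves as in the <tt>curl</tt> example above. As with PHP, the <tt>Accept</tt> header has to be set by hand.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" style="margin:0.2em 0;">
<tr>
<td style="padding:0.5em;">
<pre style="margin:0; padding:0;">import urllib.request


def metadata_request(doi, media_type):
    # Build (but do not send) a GET request asking the resolver
    # for a specific representation of the DOI.
    request = urllib.request.Request("http://dx.doi.org/" + doi)
    request.add_header("Accept", media_type)
    return request


def fetch_metadata(doi, media_type):
    # urlopen follows the redirects issued by the resolver;
    # this call needs network access.
    with urllib.request.urlopen(metadata_request(doi, media_type)) as response:
        return response.read()


request = metadata_request("10.1126/science.1157784", "application/rdf+xml")
print(request.full_url, request.get_header("Accept"))</pre>
</td>
</tr>
</table>
<p>Only <tt>fetch_metadata</tt> touches the network; building the request separately makes the hand-set header easy to inspect.</p>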
<p>I would expect a similar experience within Perl, Python or Java, the tools of choice for a bioinformatician. I cannot email people a link to the metadata for a paper; I have no idea how you could access the RDF if you were using a desktop browser, or on a phone. From a personal perspective, I much prefer the approach offered by DataCite, which uses URLs of the form <a href="http://data.datacite.org/application/x-datacite+xml/10.5524/100005">http://data.datacite.org/application/x-datacite+xml/10.5524/100005</a>; this one is genomic data about Emperor Penguins <span class="kcite" kcite-id="ITEM-2006-13">(<a href="http://dx.doi.org/10.5524/100005">http://dx.doi.org/10.5524/100005</a>)</span>. Content negotiation is hard work because, although it is standard, being part of the HTTP specification, it is not common. The fact that neither DataCite nor CrossRef got their implementation right suggests to me that these are not my problems alone.</p>
<p>Of course, the DataCite approach is limited to DataCite DOIs, so <a href="http://data.datacite.org/application/x-datacite+xml/10.1371/journal.pone.0012258">http://data.datacite.org/application/x-datacite+xml/10.1371/journal.pone.0012258</a> returns a failure message. However, this mechanism implemented at <a href="http://dx.doi.org">http://dx.doi.org</a> would add a valuable additional interface. It would also be very easy to implement, with a simple call to the content-negotiated stack; a form of the PHP described in this post would perform the task well.</p>
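<p>To make the suggestion concrete, here is a small Python sketch of the translation such a front end would need: it takes a DataCite-style path and turns it into the equivalent content-negotiated request. The function names and the default resolver address are assumptions for illustration only; no resolver is known to offer this interface.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" style="margin:0.2em 0;">
<tr>
<td style="padding:0.5em;">
<pre style="margin:0; padding:0;">def split_metadata_path(path):
    # "/application/x-datacite+xml/10.5524/100005" splits into a media
    # type (two segments) and a DOI (the rest, which may contain slashes).
    parts = path.strip("/").split("/", 2)
    if len(parts) != 3:
        raise ValueError("expected /type/subtype/DOI, got " + path)
    return parts[0] + "/" + parts[1], parts[2]


def as_negotiated_request(path, resolver="http://dx.doi.org/"):
    # Return the URL to fetch and the headers to send with it.
    media_type, doi = split_metadata_path(path)
    return resolver + doi, {"Accept": media_type}


print(as_negotiated_request("/application/x-datacite+xml/10.5524/100005"))</pre>
</td>
</tr>
</table>
<p>The resulting URL and header could be handed to any HTTP client, so the plain GET interface becomes a thin wrapper around the existing negotiation.</p>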
<p>My original criticisms of DOIs included the enormous variety of entities that DOIs actually resolve to: the article in HTML or PDF, an abstract and a picture, author biographies, or an image of a printout of the front page <span class="kcite" kcite-id="ITEM-2006-2">(<a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/</a>)</span>. Unfortunately, the experience is replicated at the metadata level. With two registration agencies, I have to deal with four different types of schema, although I am grateful to CrossRef for adding support for the one that I wanted. If I can manage an admittedly half-hearted job of integrating this data by black-box resolution of a set of DOIs, it would be nice if the International DOI Foundation could do the job for me. Failing this, a single point of entry to the documentation for the different registration agencies would help.</p>
<p>Finally, the fact that DOIs provide a single, unified identifier at the metadata level turns out to be a disadvantage. There is, in reality, no such thing as a DOI; there are multiple different types of DOI. KCite supports two of them: CrossRef DOIs and DataCite DOIs. But there are eight registration agencies <span class="kcite" kcite-id="ITEM-2006-14">(<a href="http://www.doi.org/registration_agencies.html">http://www.doi.org/registration_agencies.html</a>)</span>. It is, therefore, not possible to know beforehand which content types, if any, will be returned.</p>
<p>The more general problem is that, for a given DOI, there is to my knowledge no way of knowing which registration agency is responsible, at least not at the level of a <a href="http://dx.doi.org">http://dx.doi.org</a> URI (at the Handle level there must be, or the system would not work). For the average user, therefore, there is no way of knowing who is responsible for a given DOI. Strictly, this is true for a URL also. But if <a href="http://www.uniprot.org/uniprot/OPSD_HUMAN">http://www.uniprot.org/uniprot/OPSD_HUMAN</a> fails to resolve as I think it should do, there are a number of steps I can take. I can email <a href="mailto:webmaster@uniprot.org">webmaster@uniprot.org</a>. I can browse from <a href="http://www.uniprot.org">http://www.uniprot.org</a> looking for a contact. I can type <tt>whois uniprot.org</tt>. For a DOI, I have none of these tools (or rather everything points to the International DOI Foundation).</p>
<p>This problem was exemplified a few days after completing the work on KCite described here. I noticed that PDB has DOIs for its records, which should have worked with KCite. However, they were failing to resolve. Consider this (elided) output from <tt>curl</tt>.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" style="margin:0.2em 0;"> 
<tr>
<td style="padding:0.5em;">
<pre style="margin:0; padding:0;">&gt; curl -D - "http://dx.doi.org/10.2210/pdb3cap/pdb"
HTTP/1.1 303 See Other
Server: Apache-Coyote/1.1
Location: ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/ca/pdb3cap.ent.gz

&gt; curl -D - -L -H  "Accept: application/citeproc+json" "http://dx.doi.org/10.2210/pdb3cap/pdb"
HTTP/1.1 303 See Other
Server: Apache-Coyote/1.1
Location: http://data.crossref.org/10.2210%2Fpdb3cap%2Fpdb

HTTP/1.1 404 Not Found
Date: Mon, 27 Feb 2012 13:58:39 GMT

Unknown DOI</pre>
</td>
</tr>
</table>
<p>The DOI resolves but the metadata does not. What was more confusing was this result which shows that some PDB DOIs <strong>did</strong> resolve.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" style="margin:0.2em 0;"> 
<tr>
<td style="padding:0.5em;">
<pre style="margin:0; padding:0;">&gt; curl -D - -L -H   "Accept: application/rdf+xml" "http://dx.doi.org/10.2210/rcsb_pdb/mom_2012_2"
HTTP/1.1 303 See Other
Server: Apache-Coyote/1.1
Location: http://data.crossref.org/10.2210%2Frcsb_pdb%2Fmom_2012_2

HTTP/1.1 200 OK
Date: Mon, 27 Feb 2012 14:01:18 GMT
Content-Type: application/rdf+xml</pre>
</td>
</tr>
</table>
<p>In this case, it is possible to guess who the registration agency was (CrossRef) from the location of the RDF metadata, but this is undocumented and may not work for all registration agencies. Too much guesswork or specific knowledge of the DOI is involved. Thankfully, in this case, Karl Ward of CrossRef fixed the problem rapidly and now I can cite both the crystal structure of Opsin <span class="kcite" kcite-id="ITEM-2006-15">(<a href="http://dx.doi.org/10.2210/pdb3cap/pdb">http://dx.doi.org/10.2210/pdb3cap/pdb</a>)</span> and the Aminoglycoside Antibiotics <span class="kcite" kcite-id="ITEM-2006-16">(<a href="http://dx.doi.org/10.2210/rcsb_pdb/mom_2012_2">http://dx.doi.org/10.2210/rcsb_pdb/mom_2012_2</a>)</span>.</p>
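<p>The guesswork described above can be written down as a short Python sketch. It is only a heuristic under stated assumptions: the host table is drawn from the hosts that appear in this post, the mapping from host to agency is undocumented, and the function name <tt>guess_agency</tt> is hypothetical.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" style="margin:0.2em 0;">
<tr>
<td style="padding:0.5em;">
<pre style="margin:0; padding:0;">from urllib.parse import urlparse

# Assumption: these metadata hosts identify the agency. Nothing
# documents this, and other agencies may use other hosts.
KNOWN_HOSTS = {
    "data.crossref.org": "CrossRef",
    "data.datacite.org": "DataCite",
}


def guess_agency(location):
    # Take the Location header of the redirect and look up its host;
    # None means the agency cannot be guessed from this redirect.
    host = urlparse(location).hostname
    return KNOWN_HOSTS.get(host)


print(guess_agency("http://data.crossref.org/10.2210%2Fpdb3cap%2Fpdb"))</pre>
</td>
</tr>
</table>
<p>A redirect straight to the content, such as the FTP location returned for the plain PDB request, gives no answer at all, which is exactly the weakness of guessing.</p>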
<hr /> 
<h2><a name="_conclusions"></a>Conclusions</h2>
<p><a name="conclusions"></a> DOIs are and remain problematic. The addition of content negotiation at first sight appears to be a considerable improvement, but its usage is more complex than it should be. I offer here three suggestions based on my experience:</p>
<ul> 
<li> An alternative based on simple HTTP GET URIs should be provided </li>
<li> A standardised metadata schema for all DOIs, or at least a single point of entry to the documentation for all DOIs. </li>
<li> For a given DOI, there must be a standard mechanism to discover which registration agency is responsible. Without this, it is hard to discover which documentation and which schema applies. </li>
</ul>
<p>Despite this, KCite actively uses content negotiation; with it, I have dropped the number of HTTP requests I need to make to resolve the metadata for a DOI and this is a good thing. It is good to see the system getting more usable; I hope that this trend continues.</p>
<p><a name="end"></a></p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 2006 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/03/dois-and-content-negotiation/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>KCite Spreads its Wings</title>
		<link>http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/</link>
		<comments>http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/#comments</comments>
		<pubDate>Fri, 17 Feb 2012 21:17:41 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1976</guid>
		<description><![CDATA[Today, I was pleased to release version 1.5 of kcite. Following my tradition of being unable to get the WordPress plugin release process to work correctly, I released 1.5.1 shortly afterwards; it is the same thing, but with the correct metadata. I&#8217;m quite pleased with this release. There have been some underlying changes to [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1976">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=KCite+Spreads+its+Wings&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-02-17&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F02%2Fkcite-spreads-its-wings%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>Today, I was pleased to release version 1.5 of <a href="http://wordpress.org/extend/plugins/kcite/">kcite</a>. Following my tradition of being unable to get the WordPress plugin release process to work correctly, I released 1.5.1 shortly afterwards; it is the same thing, but with the correct metadata.</p>
<p>I&#8217;m quite pleased with this release. There have been some underlying changes to the technology which I will describe in another post, but for now I want to focus on what I feel represents a substantial improvement on previous versions in terms of functionality. The previous <a href="http://www.russet.org.uk/blog/2011/12/kcite-the-next-generation/">release</a> added support for client-side rendering, which made things look nicer and will add more functionality into the future. However, from an authoring perspective this did not provide much advantage.</p>
<p>For the 1.5 release, I wanted to add new forms of identifier. Kcite started with the ability to cite digital object identifiers for papers, as this reference to one of my papers shows <span class="kcite" kcite-id="ITEM-1976-0">(<a href="http://dx.doi.org/10.1371/journal.pone.0012258">http://dx.doi.org/10.1371/journal.pone.0012258</a>)</span>. As a bioinformatician, references to PubMed also seemed like a good idea <span class="kcite" kcite-id="ITEM-1976-1">(<a href="http://www.ncbi.nlm.nih.gov/pubmed/21414991">http://www.ncbi.nlm.nih.gov/pubmed/21414991</a>)</span>, even if, in most cases, a DOI could be used instead.</p>
<p>However, for this release, I wanted to expand into two new areas. Kcite has come from the <a href="http://www.knowledgeblog.org">kblog</a> project where we have been trying to improve the formality of publication using a blog engine, so that it can become a recognised part of the scientific literature. Organisations like <a href="http://www.arxiv.org">arxiv</a> have done much the same thing (with far greater success!) with preprints. It makes sense, then, that kcite can now link straight through to arXiv as well <span class="kcite" kcite-id="ITEM-1976-2">(<a href="http://arxiv.org/abs/1109.4518">http://arxiv.org/abs/1109.4518</a>)</span>. Likewise, we wanted to support the push to data citation, which we have achieved courtesy of <a href="http://www.datacite.org">datacite</a>. It is now possible to reference data sets as well <span class="kcite" kcite-id="ITEM-1976-3">(<a href="http://dx.doi.org/10.5524/100005">http://dx.doi.org/10.5524/100005</a>)</span>.</p>
<p>And, finally, I have tidied up the presentation. Every item now has a visible URI in the reference list. I have made it visible because these identifiers should be public and present, not just providing an underlying link structure, although they do this as well now. In-text citations link through to the bibliography, but also provide an outlink direct to the resource. Before processing, the underlying text citation displays the URI, which should aid machine interpretability. This link will also be correct, since it is used to gather the metadata for the citation, rather than expecting authors to tack a URI onto a citation that already appears correct to them.</p>
<p>As always, new forms of publication raise questions and this release is no exception. We have, for instance, found that <a href="http://www.knowledgeblog.org">kblogs</a> are useful for publishing grey literature such as <a href="http://bio-ontologies.knowledgeblog.org/">bio-ontologies</a>. But at the moment, rather embarrassingly, there is no way to reference these articles (or indeed this article itself!) with KCite. And, second, there is an issue of provenance. For instance, the <a href="http://www.russet.org.uk/blog/2011/12/kcite-the-next-generation/">notice</a> describing the 1.4 release of kcite is now formatted with 1.5. It has changed since its original publication because I have upgraded Kcite, and the presentation of this article will change in turn with the next release. I am hoping to address both of these issues in the near future.</p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 1976 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/02/kcite-spreads-its-wings/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Restoring Kblog</title>
		<link>http://www.russet.org.uk/blog/2012/02/restoring-kblog/</link>
		<comments>http://www.russet.org.uk/blog/2012/02/restoring-kblog/#comments</comments>
		<pubDate>Mon, 13 Feb 2012 12:15:05 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1971</guid>
		<description><![CDATA[We&#8217;ve been working for some time now on our lightweight semantic publishing environment, kblog. Near the beginning of the last academic year, unfortunately, we were compromised through a zero-day vulnerability. With the press of academic life and teaching, it has taken me a long, long time to get the show back on the road. However, finally, [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1971">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Restoring+Kblog&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-02-13&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F02%2Frestoring-kblog%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>We&#8217;ve been working for some time now on our lightweight semantic <a href="http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/">publishing</a> environment, <a href="http://knowledgeblog.org">kblog</a>. Near the beginning of the last academic year, unfortunately, we were <a href="http://www.russet.org.uk/blog/2011/09/kblog-has-been-compromised/">compromised</a> through a zero-day vulnerability. With the press of academic life and teaching, it has taken me a long, long time to get the show back on the road. However, finally, I think we have achieved this. The main cause for the delay was, simply, that I didn&#8217;t have time&#8201;&#8212;&#8201;at other stages of the year, it would not have been so much of a problem, but in October there is very little give in my working week. However, an additional problem has been that restoring kblog was a lot more effort than it should have been.</p>
<p>If you are only interested in kblog itself, well, it should be up now. The rest of this post is going to be a technical write-up.</p>
<p>First, we decided to take the opportunity to move kblog over to a larger machine. It was running on a small virtual machine previously. In general, this coped well with the traffic, but we wanted something with a bit more memory&#8201;&#8212;&#8201;it was a little stretched while we held workshops, and lots of people were authoring at once. We have also secured the machine further than before. I won&#8217;t go into too many details here, for obvious reasons, but I think it is less likely to get hacked in future, although the risk is still there.</p>
<p>Kblog started out as an experiment: both a development machine for the knowledgeblog environment and a service for people to read. This is not really a tenable situation; in future, we&#8217;ll only upload new kblog plugins after a longer period of testing. I&#8217;ve also removed some of the third party plugins&#8201;&#8212;&#8201;I want to move knowledgeblog to being run under WordPress DEBUG without warnings. For a while, this is likely to result in a few missing pieces of functionality: gravatars and versioning being the most obvious examples. The server will now ONLY be for the web. Mailing lists and suchlike will move onto Google Code; why host something we do not have to?</p>
<p>To restore cleanly, we decided to use a fresh install. No PHP from WordPress has been kept from the old site; everything has been checked out anew. We have also been through the database, and as far as we can tell, there is no malicious content there, although we cannot discount the possibility.</p>
<p>The restoration itself was much harder than expected, for historical reasons. Kblog was originally a <a href="http://mu.wordpress.org/">WordPress MU</a> installation &#8212; the multisite version of WordPress. Since <a href="http://www.wordpress.org">WordPress 3.0</a> came out, however, this has not been an independent install; we ported over to being a standard network-installed WordPress 3.0 a while back. However, this historical baggage was a substantial impediment to restoration. It turns out that the standard <a href="http://codex.wordpress.org/Installing_WordPress">install</a> and <a href="http://codex.wordpress.org/Restoring_Your_Database_From_Backup">restore</a> process doesn&#8217;t work under these circumstances. Essentially, we got the infamous &#8220;Error establishing a database connection&#8221;. With some help from the <a href="http://wordpress.org/support/topic/error-establishing-a-database-connection-338">forums</a>, I tried testing the MySQL connection (it worked), the privileges (which were correct), and a clean installation in the same database (by swapping the table prefixes and installing anew). After a minor diversion with my rewrite rules, I could still find no problems or mistakes with my install.</p>
<p>At this point, I resorted to a full debug of the WordPress load. The error started in <tt>wp-settings.php</tt>, at the call to <tt>wp_not_installed()</tt>. Chasing down a bit further got me to <tt>is_blog_installed</tt>. This checks for the <tt>options</tt> table, which appeared not to exist, and then for other WP tables which DID exist, at which point it calls <tt>dead_db()</tt>. Despite having just connected to the database, WordPress prints the entirely unhelpful &#8220;Error establishing a database connection&#8221; message at this point.</p>
<p>After much poking around, I <a href="http://wordpress.org/support/topic/misleading-error-establishing-a-database-connection?replies=5">found</a> the cause. Rather than using a normalised schema in its database, WordPress uses duplicate tables. So, rather than having <tt>WP_POSTS</tt>, with a <tt>blog id</tt> column, each blog has an independent set of tables, as well as a few which are shared between all blogs, a sort of &#8220;pseudoschema&#8221;. Between WPMU and WP3 this pseudoschema changed&#8201;&#8212;&#8201;the tables for the <strong>first</strong> blog were prefixed <tt>wp_1_</tt> in MU, but are now just <tt>wp_</tt>. Subsequent blogs are <tt>wp_2_</tt>, <tt>wp_3_</tt> and so on. Not ideal, but I guess it simplifies the SQL, as most of WordPress deals with only a single blog at a time.</p>
<p>Now, normally, WordPress copes with this. Something in the <tt>wp-config.php</tt> tells WordPress to look for <tt>wp_1_</tt> rather than <tt>wp_</tt>, when this <tt>wp-config.php</tt> has resulted from an upgraded WPMU installation. Unfortunately, I had done a fresh install. Now, I realised that this was a risk, but I decided that &#8220;fixing&#8221; the database was the best option. Having an ex-WPMU database had already caused problems, and in future this is only going to get worse. So, I renamed the tables with a bit of SQL.</p>
<p>This worked, problem solved? Well, nearly. However, all the users from my first blog (served now from the <tt>wp_</tt> tables, previously from <tt>wp_1_</tt> tables) had disappeared, as had all my roles. Again, more forum searching found the <a href="http://wordpress.org/support/topic/upgrade-30-issues-w-tables-missing-wp_-vs-wp_1">problem</a>. Not only does WordPress use a pseudoschema, it stores table names in the database. So, <tt>wp_options</tt> had a row with an &#8220;option_name&#8221; of &#8220;wp_1_user_roles&#8221;. Again, more SQL.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt><b><font color="#0000FF">update</font></b> wp_options <b><font color="#0000FF">set</font></b> option_name <font color="#990000">=</font> <font color="#FF0000">"wp_user_roles"</font> <b><font color="#0000FF">where</font></b> option_name <font color="#990000">=</font> <font color="#FF0000">"wp_1_user_roles"</font><font color="#990000">;</font></tt></pre>
</td>
</tr>
</table>
<p>and back came my roles, but still no users. In the end, I grepped through the database dump (I am sure that this is not how databases are supposed to work) for &#8220;wp_1&#8221;. The culprit was the <tt>wp_usermeta</tt> table. This SQL revealed the problem.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt><b><font color="#0000FF">select</font></b> <font color="#990000">*</font> <b><font color="#0000FF">from</font></b> wp_usermeta <b><font color="#0000FF">where</font></b> meta_key <b><font color="#0000FF">like</font></b> <font color="#FF0000">"wp_1</font><font color="#CC33CC">\_</font><font color="#FF0000">%"</font><font color="#990000">;</font></tt></pre>
</td>
</tr>
</table>
<p>With a bit of Emacs hacking and Perl, I generated the 10 statements of this form&#8230;</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt><b><font color="#0000FF">update</font></b> wp_usermeta <b><font color="#0000FF">set</font></b> meta_key <font color="#990000">=</font> <font color="#FF0000">"wp_capabilities"</font> <b><font color="#0000FF">where</font></b> meta_key <font color="#990000">=</font> <font color="#FF0000">"wp_1_capabilities"</font><font color="#990000">;</font></tt></pre>
</td>
</tr>
</table>
<p>which finally solved the issue, and voila, users are back.</p>
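<p>For anyone without the Emacs and Perl to hand, generating statements of that form can be sketched in a few lines of Python. This is not the script I used: the function name <tt>rename_statements</tt> is hypothetical, and of the two keys shown only <tt>wp_1_capabilities</tt> comes from my database; the second is an illustrative guess. The real list of keys should come from the <tt>select</tt> above.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td>
<pre><tt>def rename_statements(keys, old_prefix="wp_1_", new_prefix="wp_"):
    # Build one UPDATE per meta_key that still carries the old blog
    # prefix; keys without the prefix are left alone.
    template = 'update wp_usermeta set meta_key = "{new}" where meta_key = "{old}";'
    statements = []
    for key in keys:
        if key.startswith(old_prefix):
            renamed = new_prefix + key[len(old_prefix):]
            statements.append(template.format(new=renamed, old=key))
    return statements


# "wp_1_capabilities" is from the post; "wp_1_user_level" is illustrative.
for line in rename_statements(["wp_1_capabilities", "wp_1_user_level"]):
    print(line)</tt></pre>
</td>
</tr>
</table>
<p>The output is plain SQL text, to be read through before it goes anywhere near the database.</p>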
<p>So, all a bit messy really. In the end, I can see why WordPress decided to go for a pseudoschema approach, but then putting the database table names into two tables as well? To me, this goes too far, and is not a great design solution. Combined with the entirely unhelpful error message, it has cost me 3 or 4 days of hacking, time which could have been better spent. Add to that the lack of a straightforward way of forcibly resetting the passwords for a network install (more Emacs, Perl and SQL!), and I don&#8217;t think that WordPress has been overly helpful here.</p>
<p>Regardless, the situation now appears to have been resolved. While I may be slightly irritated with WordPress, it does show the strength of free software; as is often the case, the community was very helpful in providing me with the pointers I needed, and where they could not help, I did have the option of reading, altering and debugging the code. It may be the nuclear option, and one best avoided, but it is there if you need it.</p>
<p>In the meantime, I have been playing around with <a href="http://www.russet.org.uk/blog/2011/12/kcite-the-next-generation/">kcite</a>. I was going to make it cleverer for the next release, but in the end, I have decided to focus on making it broader. I now have preliminary support for both <a href="http://arXiv.org">arXiv</a> and <a href="http://datacite.org">datacite</a> IDs and metadata. Neither was a great deal of effort, but both are strategically important: data and preprints are increasingly considered first class citizens. Next up, support for referring to Kblog URLs directly.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1971 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/02/restoring-kblog/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More on Pici</title>
		<link>http://www.russet.org.uk/blog/2012/01/more-on-pici-2/</link>
		<comments>http://www.russet.org.uk/blog/2012/01/more-on-pici-2/#comments</comments>
		<pubDate>Fri, 20 Jan 2012 22:47:53 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Ontology]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1960</guid>
		<description><![CDATA[I started to write this post a long time ago in October; unfortunately before I finished I got hit with the start of teaching. I considered just ditching the post, as it is now so out-of-date and I am not usually a zombie poster. However, in this case, I shall post as a) it helps [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1960">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=More+on+Pici&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2012-01-20&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2012%2F01%2Fmore-on-pici-2%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>I started to write this post a long time ago in October; unfortunately before I finished I got hit with the start of teaching. I considered just ditching the post, as it is now so out-of-date and I am not usually a zombie poster. However, in this case, I shall post as a) it helps my mind to move back toward research after so long away and b) it will be my first of 2012, so I can check my makefiles work!</p>
<p>A couple of follow ups from my <a href="http://www.russet.org.uk/blog/2011/10/the-pici-principle-what-you-should-not-say/">previous post</a>.</p>
<p>Nicolas Le Novere commented via Twitter on even the highest-level assertion, that radioactivity is a dependent continuant.</p>
<blockquote><p>@phillord fluorescence and radioactivity are occurrent not continuant. Freeze time to check.</p>
<p>@phillord hence the unit of radioactivity: per second (Becquerel)</p>
<p align="right"> &#8212; Nicolas Le Novere </p>
</blockquote>
<p>In my original post, I suggested we needed <tt>Radiation</tt>, <tt>Radioactive</tt> or <tt>Radioactivity</tt>; in hindsight, perhaps I should have used <tt>Radioactive</tt> rather than <tt>Radioactivity</tt>, which may have circumvented this issue. However, I think it is worth considering this a little further.</p>
<p>I would nearly agree with Nicolas that radioactivity is a process; actually, I would say that radioactive decay is a process, while radioactivity is a property of this process. However, in my last post, I was looking at a model which was &#8220;BFO-like&#8221;, as OBI is based on BFO. For BFO, the fact that radioactivity is a rate, measured per second, does not mean that it is an occurrent, any more than velocity, which is also measured per second, is an occurrent. Actually, in BFO land, <tt>radioactivity</tt> would be a quality of the atoms which are decaying and not a measurement of the process. This is because, as Pierre Grenon says, properties of processes <a href="http://groups.google.com/group/bfo-discuss/msg/a605f86a934b80da">do not exist</a>.</p>
<p>In fact, if we look at this more closely still, BFO would also claim that radioactive decay is not, as it might appear, a <tt>Process</tt>, because processes are continuous. This is not true for radioactive decay, even for a bulk of radioactive material. An atom decays, then there is a pause, then another decays. This makes radioactive decay a <tt>processual entity</tt>, which can contain discontinuities.</p>
<p>I am not arguing that BFO&#8217;s treatment of processes is correct&#8201;&#8212;&#8201;in fact, I think it is nonsensical. However, it is this line of argument that I was using in my previous post.</p>
<p>David Sutherland rather takes me to task about whether realism does what I suggest.</p>
<blockquote><p>I agree completely, but what realist principle says you need to give something the most detailed classification you can come up with?</p>
<p align="right"> &#8212; David Sutherland </p>
</blockquote>
<p>It&#8217;s a good question, but I would turn it around. I don&#8217;t think that realism requires you do this, although this quote from Barry Smith does rather distinguish between simplifications (i.e. not the most detailed classification you can come up with) and reality.</p>
<blockquote><p>I am beginning to suspect that for you everything is a simplification (model) — for me, functions are part of reality; they are not simplifications; I am not interested in simplifications.</p>
<p align="right"> <em>http://groups.google.com/group/bfo-discuss/msg/865e601864fbc2dc</em><br /> &#8212; Barry Smith </p>
</blockquote>
<p>The problem, though, is that realism elevates &#8220;reality&#8221; above all else. I think that this is wrong. Of course, in any scientific discipline, we should be aiming to model the experimental data that we have. But this is not all we need to do. As any statistician will tell you, models are compromises. It is very easy to build a model that perfectly represents the data that you have; you just build a model with as many variables as data points. The model will fit the data perfectly, but ultimately the model is useless, since it lacks explanatory power. We need use cases, we need simplifications and sometimes we will need multiple representations of the same thing; there are examples galore in my <a href="http://www.russet.org.uk/blog/2010/07/realism-and-science/">paper</a> <span class="kcite" kcite-id="ITEM-1960-0">(<a href="http://dx.doi.org/10.1371/journal.pone.0012258">http://dx.doi.org/10.1371/journal.pone.0012258</a>)</span>. In fact, Chris Mungall gives a good example when he talks about dispositions and their status as being real:</p>
<blockquote><p>In fact, I have a particular problem with dispositions being &#8220;real&#8221; &#8211; BFO asks me to believe there are an infinite number of real but unrealized and perhaps wildly improbable dispositions floating around me every second</p>
<p align="right"> &#8212; Chris Mungall </p>
</blockquote>
<p>And later he gives the solution.</p>
<blockquote><p>taking a hard-headed pragmatic approach &#8211; e.g. avoid weirdo classes that don&#8217;t correspond to a term a normal scientist would use; introduce distinctions that give you the desired results to queries and inferences)</p>
<p align="right"> &#8212; Chris Mungall </p>
</blockquote>
<p>In other words, reality is important. But we also need use cases, we need community norms, and we need applications. If ontologies do not fit with these, then they can be as &#8220;real&#8221; as you like, but they are still wrong.</p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 1960 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2012/01/more-on-pici-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>KCite &#8212; the next generation</title>
		<link>http://www.russet.org.uk/blog/2011/12/kcite-the-next-generation/</link>
		<comments>http://www.russet.org.uk/blog/2011/12/kcite-the-next-generation/#comments</comments>
		<pubDate>Tue, 13 Dec 2011 11:12:48 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1954</guid>
		<description><![CDATA[Well, I am pleased to say that we have now released the new version of kcite. It&#8217;s been a while in coming&#8201;&#8212;&#8201;I had the difficult bit of the code working about 5 months ago, but then got caught up in teaching. Kcite is our bibliography manager which enables citations such as this one , using [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1954">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=KCite+%26%238212%3B+the+next+generation&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2011-12-13&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2011%2F12%2Fkcite-the-next-generation%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>Well, I am pleased to say that we have now released the new version of kcite. It&#8217;s been a while in coming&#8201;&#8212;&#8201;I had the difficult bit of the code working about 5 months ago, but then got caught up in teaching. Kcite is our bibliography manager which enables citations such as this one <span class="kcite" kcite-id="ITEM-1954-0">(<a href="http://dx.doi.org/10.1371/journal.pone.0012258">http://dx.doi.org/10.1371/journal.pone.0012258</a>)</span>, using DOI or PubMed IDs.</p>
<p>Kcite now uses the marvellous <a href="https://bitbucket.org/fbennett/citeproc-js/wiki/Home">citeproc.js</a> to render the bibliography on the client. The main advantage of this for this release is that the bibliography formatting is slightly more regular than before. We&#8217;ve also switched to name-author style as the default. There is also a disadvantage, which is that the browser has to do lots of JavaScript execution client-side. I&#8217;ve made efforts to ensure that this is not too onerous; on my desktop, I have been rendering 200-300 item bibliographies, which is much more than most people will use in practice.</p>
<p>In future versions, however, I feel the use of citeproc-js will really come into its own. We should be able to enable the user to select their own citation style (currently this is the choice of the authors, which makes little sense). We can also add any semantics to the HTML that we choose&#8201;&#8212;&#8201;CiTO will come properly, for instance. I can also clean up the &#8220;unresolved&#8221; and &#8220;timed out&#8221; references. However, first thing on the list is to make the call back for the bibliographic data asynchronous. Client-side this <strong>should</strong> be easy, as we are already using jQuery. Server-side requires rewrite rules, which I haven&#8217;t done before, but I think should not be too hard.</p>
<p>On a separate track, now that I have kcite on what I think is a stable technological footing, I can start to extend it in other ways, the most obvious being additional forms of identifiers, critically including WordPress posts with kcite enabled. I&#8217;m also pleased that Cross-Ref have recently added the ability to retrieve metadata in citeproc format (JSON), which means I can skip an integration step.</p>
<p>However, before all of that, we need to restore <a href="http://www.russet.org.uk/blog/2011/09/kblog-has-been-compromised/">kblog</a>. We&#8217;ve taken the opportunity to move it to a better technological footing, and have started to prepare the new machine that it will be hosted on. This has taken a long time, due to a busy start to the (academic) year. Hopefully, getting hacked is not something we will repeat soon.</p>
<p>The current release of kcite is 1.4.1. This fixes two bugs, one reported by Carl Boettinger (so that now the JavaScript only loads when necessary) and another I found while writing this post, which made editors appear as authors.</p>

<p>Bibliography</p>
<div class="kcite-bibliography"></div>
<script type="text/javascript">var citeproc_controls=false;
var blog_home_url="http://www.russet.org.uk/blog/";
</script>

</div> <!-- kcite-section 1954 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/12/kcite-the-next-generation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>June Tabor and the Oysterband</title>
		<link>http://www.russet.org.uk/blog/2011/11/june-tabor-and-the-oysterband/</link>
		<comments>http://www.russet.org.uk/blog/2011/11/june-tabor-and-the-oysterband/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 17:33:41 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Art]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1952</guid>
		<description><![CDATA[It has been a long, long time since my last gig review. As this blog is mostly professional now, this is perhaps not such a bad thing. I did half write a review of Roy Harper and Joanna Newsom in Sept last year, but it never got posted. Don&#8217;t think I have been to a gig [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1952">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=June+Tabor+and+the+Oysterband&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2011-11-28&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2011%2F11%2Fjune-tabor-and-the-oysterband%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>It has been a long, long time since my last gig review. As this blog is mostly professional now, this is perhaps not such a bad thing. I did half write a review of Roy Harper and Joanna Newsom in Sept last year, but it never got posted. Don&#8217;t think I have been to a gig since then. Still, onwards.</p>
<p>I&#8217;ve been a fan of June Tabor for a long time, particularly her album with Martin Simpson, even if it does have terrible cover art. Despite this, and the fact that she lives pretty close to my home town, I&#8217;ve never seen her live. Her music is dark and eclectic, her voice rich. Combined with the Oysterband&#8217;s tendency to do strange folk-style adaptations, it was destined to be an interesting gig. The music is something like gothic folk, if that is not a contradiction in terms. While singing, June Tabor comes across as a foreboding presence on stage. Between songs, though, she&#8217;s entertaining, witty and light, which was a bit of a relief.</p>
<p>The gig was fantastic. Her voice is as excellent live as on record, with added presence. She adds to the music by, erm, explaining what it is all about (this can be something of a problem otherwise). The evening was well managed, moving from gentle and quieter music to end-of-evening barnstormers. It was good to be listening to live music again.</p>
<p>But one thing I didn&#8217;t understand. Why does Ray Cooper stand on a box while playing bass? He&#8217;s already the tallest.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1952 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/11/june-tabor-and-the-oysterband/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Pici Principle: What you should not say</title>
		<link>http://www.russet.org.uk/blog/2011/10/the-pici-principle-what-you-should-not-say/</link>
		<comments>http://www.russet.org.uk/blog/2011/10/the-pici-principle-what-you-should-not-say/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 14:12:32 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Ontology]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1945</guid>
		<description><![CDATA[I once had cause to refer, somewhat mischievously, to &#8220;a kind of pasta from Tuscany, which is almost identical to spaghetti, but slightly different&#8221;; this was on a mailing list that was used by many Italians. It provoked the expected response; an offended Tuscan responded &#8220;I don&#8217;t know what you are talking about; but if [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1945">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=The+Pici+Principle%3A+What+you+should+not+say&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2011-10-18&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2011%2F10%2Fthe-pici-principle-what-you-should-not-say%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p><a name="pici"></a></p>
<p>I once had cause to refer, somewhat mischievously, to &#8220;a kind of pasta from Tuscany, which is almost identical to spaghetti, but slightly different&#8221;; this was on a mailing list that was used by many Italians. It provoked the expected response; an offended Tuscan responded &#8220;I don&#8217;t know what you are talking about; but if you mean pici&#8221;, which I did, &#8220;it&#8217;s nothing like spaghetti&#8221;.</p>
<p>Recently, on the OBI mailing list, there has been much discussion about labels, markers or tracers. Whatever you wish to call it, the basic idea is the same: a molecule which is easily detectable is used to trace something else. This can involve adding a small amount of a radioactive isotope (P<sup>32</sup>). This makes it possible to follow the molecule (which is otherwise hard) by tracing the radiation (which is generally easy).</p>
<p>So, how do we model this? As with many parts of ontology building, it turns out not to be straightforward; during this discussion, an <a href="http://sourceforge.net/mailarchive/message.php?msg_id=28115081">email</a> from <a href="http://www.oerc.ox.ac.uk/people/philippe-rocca-serra">Philippe Rocca-Serra</a> left me asking the question: are we being too specific? I will work through an example to show what I mean. Feel free to skip to the <a href="#punchline">punchline</a> if you choose.</p>
<p>Consider, for example, the following models; these are not directly taken from OBI, as I want to reduce the complexity for this article; rather they are in the general spirit of the models which raised these questions.</p>
<p>A label, or something that has been labelled, is clearly part of an experimental design. Being a label is not intrinsic to the entity; rather, it appears to be a role that the entity is playing in the experiment. So:</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt><b><font color="#0000FF">Class</font></b>: Label
       <font color="#990000">SubClassOf:</font>
          Role</tt></pre>
</td>
</tr>
</table>
<p>There are, of course, labels of many sorts. The main types that I can think of are radioactive, fluorescent and what I call adherent. So, we might add the following, with a few subclasses of adherent as explanation.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt><b><font color="#0000FF">Class</font></b>: RadioactiveLabel
       <font color="#990000">SubClassOf:</font>
          Label

<b><font color="#0000FF">Class</font></b>: FluorescentLabel
       <font color="#990000">SubClassOf:</font>
          Label

<b><font color="#0000FF">Class</font></b>: AdherentLabel
       <font color="#990000">SubClassOf:</font>
          Label

<b><font color="#0000FF">Class</font></b>: BiotinylatedLabel
       <font color="#990000">SubClassOf:</font>
           AdherentLabel

<b><font color="#0000FF">Class</font></b>: AntigenicLabel
       <font color="#990000">SubClassOf:</font>
           AdherentLabel</tt></pre>
</td>
</tr>
</table>
<p>So far so good. However, for a label to be useful, it needs to be manufactured (often in a bespoke fashion, depending on the experiment being performed) and it needs to be detectable. So, we might add classes like so:</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt><b><font color="#0000FF">Class</font></b>: LabellingProcess
       <font color="#990000">SubClassOf:</font>
           Process,
           has_output some Label

<b><font color="#0000FF">Class</font></b>: LabellingDetectionProcess
       <font color="#990000">SubClassOf:</font>
           Process,
           has_input some
                  (Sample and (contains some Label))</tt></pre>
</td>
</tr>
</table>
<p>Now we have three classes for every label type. We can deal with this by generating a cross-product, either at development time, or at the time of use if we are using OWL. However, we need something to tie together these classes. We need a concept to know that we need a <tt>RadioLabellingProcess</tt> to produce a <tt>RadioLabel</tt> which we detect in a <tt>RadioLabellingDetectionProcess</tt>. In short, we need a concept of <tt>Radiation</tt>, <tt>Radioactive</tt> or <tt>Radioactivity</tt>.</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.4 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt><b><font color="#0000FF">Class</font></b>: RadioactiveEntity
    <font color="#990000">SubClassOf:</font>
        IndependentContinuant,
        bears some Radioactivity

<b><font color="#0000FF">Class</font></b>: RadioactiveLabel
    <font color="#990000">SubClassOf:</font>
        Role,
        RadioactiveEntity

<b><font color="#0000FF">Class</font></b>: RadiationDetector
    <font color="#990000">SubClassOf:</font>
       detects some Radioactivity

<b><font color="#0000FF">Class</font></b>: RadioactiveLabelProductionProcess
    <font color="#990000">SubClassOf:</font>
       has_input some RadioactiveEntity</tt></pre>
</td>
</tr>
</table>
<p>This is where the situation gets difficult. What kind of thing is <tt>Radioactivity</tt>? Taking the realist approach, we need to consider this carefully, determining what this universal is. So, starting from the top, it is fairly obvious that we have a <tt>Continuant</tt>. Next question: do we have a <tt>Dependent</tt> or <tt>IndependentContinuant</tt>? Again, this is fairly clear: radioactivity cannot exist without something to be radioactive, hence <tt>Radioactivity</tt> is a <tt>DependentContinuant</tt>.</p>
<p>There are several kinds of <tt>DependentContinuant</tt> that <tt>Radioactivity</tt> could be. The concept <tt>Role</tt> does not fit well; this is usually ascribed by socially or, in this case, experimentally determined behaviour. Perhaps <tt>Disposition</tt> would be better. However, this does not really fit either, as a <tt>Disposition</tt> is realised &#8220;under specific circumstances&#8221;. Now this is not true of radioactivity. Either something is radioactive or it is not, and if it is, then it is, to the best of our knowledge, radioactive under all circumstances. It appears, then, that <tt>Radioactivity</tt> is a <tt>Quality</tt>, because &#8220;it is exhibited if it inheres in an entity at all&#8221;.</p>
<p>If we follow the same logic with our other label types, initially, we come to the same conclusions. However, <tt>Fluorescence</tt> is not exhibited under all circumstances. It only happens when the label is illuminated with the right kind of light. So, <tt>Fluorescence</tt> appears to be a <tt>Disposition</tt>. Following a similar logic, this is also true of <tt>Adherent</tt>. So the best we can say about the property of the substance that makes it usable in labelling is that it is a <tt>RealizableEntity</tt>.</p>
<p>Having <tt>Radioactivity</tt> stand out in this way is a little unsatisfying. Let&#8217;s consider the logic again. One classic experimental form is the pulse-decay experiment. I can, for example, feed a rat with, say, radioactive phosphorus briefly. After this, I can trace the course of the phosphorus. Now during the course of this experiment, the rat becomes radioactive and then ceases to be radioactive again. But it is, notably, the same rat. So, perhaps, the statement that things are either radioactive or not is wrong. Perhaps it is not a <tt>Quality</tt> at all. The flaw in the logic is the assumption that because an atom is either radioactive or is not, anything made up from atoms must be so too. But an entity can have its atoms totally replaced and still be the same entity. In this case, what is true of a rat is also true of its DNA. We can replace the atoms in a sample of DNA with other ones and still have the same DNA. So, maybe, <tt>Radioactivity</tt> is a <tt>Quality</tt> at an atomic level of granularity, but is, after all, a <tt>Disposition</tt> at others.</p>
<p>Thinking further, however, maybe it is not a <tt>Quality</tt> at all. A mass of P<sup>32</sup> is always radioactive, but a single atom? Perhaps not, since it only displays this when it decays. So, perhaps, it is a <tt>Disposition</tt> after all. However, this makes no sense, because dispositions are displayed under &#8220;specific circumstances&#8221;. Now, to the best of our knowledge, radioactive decay is stochastic&#8201;&#8212;&#8201;it is so random that radioactivity is often used to generate randomness. We cannot specify the circumstances under which it happens; it just does. Moreover, after it displays the radioactivity, what has happened to the atom? Using the same argument as before, we could say that, like the rat, the atom still exists; it&#8217;s just that (some of) the elementary particles that make it up have changed. But this way, surely, madness lies, as &#8220;being phosphorus&#8221; would become some sort of dependent continuant, which the atom displays during its decay, while it happens to have the right number of protons. So it probably makes more sense to say that the decay process represents the end of the existence of the phosphorus atom and the beginning of a new atom (and a radioactive particle). In which case, even our original decision that <tt>Radioactivity</tt> is a <tt>DependentContinuant</tt> is wrong. It&#8217;s not a <tt>DependentContinuant</tt> at all; it&#8217;s only a process which is over as soon as it begins.</p>
<p><a name="punchline"></a></p>
<p>So, what have we achieved? Well, I would argue, not a great deal, except for a lot of discussion. Moreover, we have ended up discussing very detailed issues about the physical properties of matter, when we started out discussing an ontology of biomedical investigations. This might be entertaining, or it might be very dull, depending on your point of view. But what we have failed to produce is a specific conclusion.</p>
<p>The problem here is <strong>realism</strong>. A realist ontology represents portions of reality, that is, classes of things that really have instances. We have to ask these questions to try and determine whether <tt>Radioactivity</tt> exists and what kind of thing it is. We can set realism against <strong>pragmatism</strong>. Previously, Robert Stevens has described the problems that realism causes by preventing the ontologist from modelling &#8220;<a href="http://robertdavidstevens.wordpress.com/2011/05/26/unicorns-in-my-ontology/">unicorns</a>&#8221;, such as Newtonian mechanics, or canonical anatomies. The unicorn principle says that if it is useful to model a concept in an ontology, then often we should. Here, I introduce what I call the &#8220;<a href="#pici">Pici</a> principle&#8221;&#8201;&#8212;&#8201;if it is not useful to model a concept then we should not. To me, as a British native, pasta is pasta; it all tastes much the same. Generally, I do not need the ability to distinguish pici and spaghetti, unless I want to provoke a response from an over-excitable Tuscan. The sensible course is not to get involved in the discussion in the first place.</p>
<p>The same applies in this instance. There is a clear use case for the concept of <tt>Radioactivity</tt>; without it, we cannot say that a radio-label is radioactive, or that a fluorescence detector is not going to work detecting it. But to achieve this use case, we do not need to understand very deeply what <tt>Radioactivity</tt> is. Describing it as a <tt>DependentContinuant</tt> is enough, and it will fulfil the use cases. It will not enable us to ask questions about which kind of labels detect qualities and which detect dispositions. But in the absence of a use case, this is not an issue.</p>
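<p>A minimal sketch of that deliberately vague axiom, in the same simplified style as the models above (the class names are the illustrative ones used in this article, not terms taken from OBI):</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td>
<pre><tt><b><font color="#0000FF">Class</font></b>: Radioactivity
    <font color="#990000">SubClassOf:</font>
        DependentContinuant</tt></pre>
</td>
</tr>
</table>
<p>Nothing more specific is asserted, yet restrictions such as <tt>bears some Radioactivity</tt> and <tt>detects some Radioactivity</tt> can still refer to the class.</p>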
<p>A chemist may care, and may want to classify radioactivity further. This is fine; as with pasta, we can safely leave these issues to someone else, in the knowledge that they are probably better qualified to give an answer anyway. So long as they decide that <tt>Radioactivity</tt> is a <tt>DependentContinuant</tt>, it does not matter to us what kind of <tt>DependentContinuant</tt>; we have said nothing incorrect. So, our ontology will integrate with theirs, without change to either. By being as vague as our use cases allow us, we have actually increased the ability of our ontology to integrate with others.</p>
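<p>As a purely hypothetical illustration, suppose the chemists settle on <tt>Disposition</tt>. That choice is an assumption made only for this example, as is reading <tt>Disposition</tt> as one kind of <tt>DependentContinuant</tt>, which is how the simplified hierarchy in this article treats it. Their ontology might then contain:</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td>
<pre><tt><b><font color="#0000FF">Class</font></b>: Disposition
    <font color="#990000">SubClassOf:</font>
        DependentContinuant

<b><font color="#0000FF">Class</font></b>: Radioactivity
    <font color="#990000">SubClassOf:</font>
        Disposition</tt></pre>
</td>
</tr>
</table>
<p>Under these assumptions, the weaker statement that <tt>Radioactivity</tt> is a <tt>DependentContinuant</tt> follows from theirs, so the two sets of axioms do not conflict.</p>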
<p>In short, the pici principle encapsulates the idea that deciding what we <strong>should not</strong> model in an ontology is as important as what we <strong>should</strong> model. And this decision comes from use cases, not reality.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1945 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/10/the-pici-principle-what-you-should-not-say/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Thoughts on a Chimney</title>
		<link>http://www.russet.org.uk/blog/2011/10/thoughts-on-a-chimney/</link>
		<comments>http://www.russet.org.uk/blog/2011/10/thoughts-on-a-chimney/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 12:53:23 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1943</guid>
		<description><![CDATA[While I am currently spending a significant amount of my time promoting the idea that blog technology can be, and should be used for serious scientific material, I thought I would make a post of a different and perhaps more traditional vein: that is, a light-weight idea, with no serious research behind it, but Years [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1943">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Thoughts+on+a+Chimney&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2011-10-18&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2011%2F10%2Fthoughts-on-a-chimney%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>While I am currently spending a significant amount of my time promoting the idea that blog technology can be, and should be, used for serious scientific material, I thought I would make a post in a different and perhaps more traditional vein: that is, a light-weight idea, with no serious research behind it. Years ago now, I created an <a href="http://homepages.cs.ncl.ac.uk/phillip.lord/wiki/energy/index.html">Energy Wiki</a> full of daft ideas for making energy. I last revisited this in 2009, with an idea for <a href="http://www.russet.org.uk/blog/2009/05/the-sea-cylinder-storage-system/">storing energy at sea</a>. I&#8217;d actually forgotten that part of the reason for this was to try out Inkscape, which is part of the reason for this post. I wanted to try a bit of multi-media, that is, a blog post with an image in it. High tech.</p>
<p>So, the idea. One form of renewable is the <a href="http://en.wikipedia.org/wiki/Solar_updraft_tower">Solar Updraft Tower</a>, also known as a solar chimney. This works straightforwardly enough: you build a large greenhouse in a desert, with a very large chimney in the middle. The top of the chimney is in cold air, the bottom in hot, and an updraft results; stick a turbine in or at the base of the chimney, and you get energy out.</p>
<p>The problem is that, to work at all efficiently, you need a big temperature differential, and so a tall chimney. This in turn means a wide chimney, both to support a substantial updraft, and for mechanical reasons. Tall means 500m or more. The bottom line of this is that a pretty significant capital expenditure is required, followed by a relatively long pay-back period, which in turn means that the biggest single expense of the project is likely to be interest charges, rather than anything else.</p>
<p>So, my idea is to use an inflatable chimney instead. Initially, I thought about some kind of helium lifting scheme, but then I realised that this makes no sense; why not use hot air, which after all is what the whole system is designed to generate? Consider, for instance, the following organisation:</p>
<p><img src="http://www.russet.org.uk/blog/wp-content/uploads/2011/10/inflatable_solar_chimney.png" style="border-width: 0;" alt="Inflatable Chimney" height="500"></p>
<p>Essentially, it&#8217;s a traditional balloon with a hole in the middle. Obviously the whole system is stackable&#8201;&#8212;&#8201;a second balloon could be placed on top of the first and so on. The whole structure could be assembled or disassembled as desired. Unfortunately, though, this would probably take quite a bit of work.</p>
<p>My second thought came from the idea that, while most designs for solar chimneys have the chimney in the middle of the greenhouse, it doesn&#8217;t really need to be. A horizontal pipe to the middle would be enough. The chimney could be outside of the greenhouse. The advantage that this brings is that the tower could be raised or lowered in situ, without the risk of it falling on, and damaging, the greenhouse. So my second idea was to build the chimney as two cylinders, with the gap between them serving as the inflatable, buoyant structure. By pleating the cylinders in opposite directions like so:</p>
<p><img src="http://www.russet.org.uk/blog/wp-content/uploads/2011/10/concertina_chimney.png" style="border-width: 0;" alt="Concertina Chimney" height="500"></p>
<p>the whole structure should concertina up and down. By inflating from the top and deflating from the bottom, it should be possible to raise or lower the entire system by opening and shutting vents at the bottom or top of each section to the inside of the chimney.</p>
<p>One advantage with this system is that as the chimney gets higher, the temperature differential between the inside and the outside gets greater, which should mean that the taller the tower, the more buoyant the sections get; this should help to keep the entire thing as upright as possible, as will the air travelling through the middle, like some gigantic party blower.</p>
<p>Another addition that comes to mind would be to add inflatable half-toroids around the chimney at regular intervals. With a curve on the top, and a flat bottom-side, the entire thing should operate like an aerofoil, lifting the tower up; so, the windier it gets, the greater the lift, which is just what is needed to keep it as upright as possible. This should mean that the chimney can operate in relatively high wind levels.</p>
<p>This kind of system could even work in concert with a fixed chimney&#8201;&#8212;&#8201;extending the height by 500m say, and increasing its efficiency. It could also act as a supplement&#8201;&#8212;&#8201;operating only on very hot days when the greenhouse has excess capacity. Or, finally, it could operate while the main chimney was being built, meaning that a plant can start generating income earlier, which should reduce the cost of interest payments.</p>
<p>Of course, this all comes with drawbacks: the ongoing running costs are likely to be significant; wind will remain a significant factor regardless; and, finally, inflating the tower will use hot air, which will reduce the efficiency of the whole system. Are these flaws significant? Well, as I said, this post is light-weight with no serious research behind it. I have no idea, nor any really clear idea about how to work out these costs. Answers on a postcard please.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1943 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/10/thoughts-on-a-chimney/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kblog has been compromised</title>
		<link>http://www.russet.org.uk/blog/2011/09/kblog-has-been-compromised/</link>
		<comments>http://www.russet.org.uk/blog/2011/09/kblog-has-been-compromised/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 13:35:07 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1939</guid>
		<description><![CDATA[I have been pushing the idea of Kblogs&#8201;&#8212;&#8201;scientific publishing using commodity software&#8201;&#8212;&#8201;for a year or so now. Our main site, Knowledgeblog.org has got around 100 articles now, and has had about 50k page views (or about 4x the number of raw page hits) and has generated a certain presence on the internet. While this is [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1939">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Kblog+has+been+compromised&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2011-09-20&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2011%2F09%2Fkblog-has-been-compromised%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>I have been pushing the idea of Kblogs&#8201;&#8212;&#8201;scientific publishing using commodity software&#8201;&#8212;&#8201;for a year or so now. Our main site, <a href="http://knowledgeblog.org">Knowledgeblog.org</a> has got around 100 articles now, and has had about 50k page views (or about 4x the number of raw page hits) and has generated a certain presence on the internet. While this is generally good, the price of fame is that we have moved somewhat up the list of potential hack targets. Unfortunately, this has resulted in two compromises on the machine; they were probably connected, although we have no evidence to link the two at the moment.</p>
<p>The first was through the timthumb zero day vulnerability. It involved a code injection into a WordPress installation using a thumbnail generator with a dodgy bit of PHP in it. We cleaned the system up as well as we were able and went from there. Sadly, a couple of days ago, we had a second break-in. This was a more serious and directed attack (the timthumb was scripted, and we were one of several thousand sites to be hit). In this case, the machine has been root compromised, and the web server used to gather username/passwords in a phishing expedition. We do have backups and all of the content. There were a number of things that we could have done to secure the machine further, at least one of which may have prevented the hack, but there are only so many hours in the day.</p>
<p>So, where does this leave us? Is the whole idea of knowledgeblog broken? Personally, I do not think so. While I have been critical of the cost associated with academic publishing, I am aware that it cannot happen for free. Running and maintaining a web server takes money; it is something that we have been doing on a shoe-string for a while, especially since our JISC money ran out. In the couple of years that we have run knowledgeblog, I think that we have learned and shown a lot. As well as page views and content, we have shown that scientific publishing can be easy for the author; that we can generate attractive articles this way; that we can start to embed computationally accessible knowledge into these articles. We have shown that we can do peer-review, if we need. We have shown we can <a href="http://wayback.archive.org/web/*/http://knowledgeblog.org">archive</a> and preserve for the future. We have shown that knowledgeblog is good for grey literature. We have added <a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">DOIs</a>. Multiple authors. Good looking <a href="http://www.russet.org.uk/blog/2010/08/latex-to-wordpress/">maths</a>. We even have some preliminary stats on how much publication costs from Word doc to website.</p>
<p>At the moment, though, we do not have a business model. It is clear that if we are to move this forward, it needs to be run as a service, managed, and looked after, something which is neither my expertise nor my desire. The analogy that I have made earlier with <a href="http://www.russet.org.uk/blog/2011/06/the-naivete-of-scientists/">Wikipedia</a> is, I think, a good one; it would be good to move this into a foundation status.</p>
<p>The path from here to there is a long one, however. For the moment, we will restore knowledgeblog, and it will re-emerge, although at this time of year, it will take a while. But we look to the future as well.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1939 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/09/kblog-has-been-compromised/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Oslo</title>
		<link>http://www.russet.org.uk/blog/2011/08/oslo/</link>
		<comments>http://www.russet.org.uk/blog/2011/08/oslo/#comments</comments>
		<pubDate>Sun, 28 Aug 2011 21:32:08 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Life]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1930</guid>
		<description><![CDATA[My first visit to Oslo was in 2006. That time, it was for work and we were some distance away from town. I remember the flight in gave a dramatic impression, and I remember sitting in the conference centre, looking over the hillside, breathing in the thick scent of pine watching the sun slowly [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1930">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Oslo&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2011-08-28&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2011%2F08%2Foslo%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>My first visit to Oslo was in <a href="http://www.russet.org.uk/blog/2006/06/databasing-the-brain/">2006</a>. That time, it was for work and we were some distance away from town. I remember the flight in gave a dramatic impression, and I remember sitting in the conference centre, looking over the hillside, breathing in the thick scent of pine, watching the sun slowly crawl toward the horizon at about 11pm. I only got into town the once, on the last night, and saw little of it, which I was disappointed about. My second visit to Norway was to <a href="http://www.russet.org.uk/blog/2008/03/mermaids/">Trondheim</a> and I enjoyed that as well.</p>
<p>So I was looking forward to visiting Oslo again, for a few days, doing the tourist thing. But I am afraid that I have been disappointed again; this city has not really grabbed me. The architecture is impressive at points, but there is a random, thrown-together quality about the city overall; nothing to rival the magnificence that is <a href="http://en.wikipedia.org/wiki/Grainger_Town">Grainger Town</a> in Newcastle. And some of the signature buildings are, again, just okay; the <a href="http://en.wikipedia.org/wiki/Oslo_Opera_House">Opera House</a> has a roof you can walk up, but that seems to be it. The night time is subdued to say the least, and the food is okay at best. The only stand-out feature seems to be an extraordinary number of sculptures&#8201;&#8212;&#8201;mostly bronzes, and often not of famous people. Lots of nudes in heroic poses; the number involving seals is also distinctly above the average.</p>
<p>The two best things I have seen are, first, the <a href="http://en.wikipedia.org/wiki/Frogner_Park">Sculpture Park</a>. A very classically laid-out garden, but with some really very good sculpture, full of character and life. And seals. And second, the folk museum, which shows Norwegian life and buildings at different stages of history. I have to admit, though, that I was at a loss to see the difference, because over the last 400-500 years, this seems to basically have involved making robust, timber buildings on stilts. While the museum is good, I think having fewer buildings, but better explained, would improve it. When you get down to it, one wooden farmhouse looks very like another, especially when you can see it only from the outside.</p>
<p>Perhaps the biggest surprise though has been the accessibility for pushchairs. In Oslo, this is never an afterthought; they just have not thought about it at all. The tram doors slam on you if you take too long, which may happen if, say, you are struggling to get a heavy, unwieldy, pram-shaped object through a narrow door. My visit to the Opera House was limited to walking around the lobby, as walking up a sloping roof, with nothing but &#8220;slippery when wet&#8221; signs to break a clear run to the fjord is not my idea of fun. My visit to the National Gallery involved 20 steps to get in, only to discover that pushchairs are banned in the exhibition area; still, hey, you can visit the shop. Looking through the door of the National Museum (only 10 stairs up), I could see a line of buggies next to the security guard. I didn&#8217;t even bother.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1930 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/08/oslo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Naivete of Scientists</title>
		<link>http://www.russet.org.uk/blog/2011/06/the-naivete-of-scientists/</link>
		<comments>http://www.russet.org.uk/blog/2011/06/the-naivete-of-scientists/#comments</comments>
		<pubDate>Thu, 30 Jun 2011 16:11:37 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1924</guid>
		<description><![CDATA[Although in some disciplines, it is relatively uncontentious, the rise of open access publishing has produced a lot of comment in others. In one of my two disciplines, computing science, this form of publication is still the minority, and still raises comment. For instance, Michel Beaudouin-Lafon has commented suggesting that scientists are highly naive about [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1924">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=The+Naivete+of+Scientists&amp;rft.source=An+Exercise+in+Irrelevance&amp;rft.date=2011-06-30&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2011%2F06%2Fthe-naivete-of-scientists%2F&amp;rft.au=Phillip+Lord&amp;rft.format=text&amp;rft.language=English"></span><p><a name="preamble"></a> 
<p>Although in some disciplines, it is relatively uncontentious, the rise of open access publishing has produced a lot of comment in others. In one of my two disciplines, computing science, this form of publication is still the minority, and still raises comment. For instance, Michel Beaudouin-Lafon has <a href="http://delivery.acm.org/10.1145/1650000/1646367/p32-beaudouin-lafon.html">commented</a> suggesting that scientists are highly naive about the costs of publishing. He argues that scientific publishing is intrinsically expensive, and that open access will have negative implications for science as a whole.</p>
<blockquote><p>Over the years, commercial STM publishing has become a cutthroat business with cutthroat practices and we, the scientific and academic community, are the naive lambs, blinded by the ideals of science for the public good-or simply in need of more publications to advance our careers.</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>Personally, I think that &#8220;naive&#8221; is the wrong word; scientists are often not good at operating in a co-ordinated way. Although we work together in small groups, and sometimes in large groups, in general, we are still very much a cottage industry; at any one time the number of scientists working in a distinct discipline is not that large, even on a world-wide basis. Of course, this works pretty well for scientific advance; we are not a production industry, but researchers. No one knows the best way forward, and we need to experiment to find out. But it does mean that we often play second fiddle to those capable of more co-ordinated action; compare, for example, scientists to the medical community with its tightly controlled professional bodies. Or, of course, the STM publishing industry, particularly as it has become focused in fewer and fewer competing publishers.</p>
<blockquote><p>For example, ACM spends several million dollars every year to support the reliable data center serving the Digital Library</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>Clearly, it is true that the costs of data centres and storage are not trivial. But the cost of servicing data has plummeted over recent years. Scientific papers largely consist of words and figures; these do not take up much space. The laptop I am working on has a copy of my email directory; it&#8217;s not complete but it carries most of my <a href="http://www.russet.org.uk/blog/2007/07/preservation-for-the-future/">outgoing email</a> since 1994 and a lot of the incoming; this is a lot of words! But the total size is now less than 5GB, which will fit on a 3 pound pen drive, or my phone. Now if ACM were storing research data, then it would be a totally different issue; the costs here are significant, problematic and rising. But they are not.</p>
<p>The ACM might spend several million dollars a year, but the bottom line here is that this does not account for the cost of publishing. The Wikimedia Foundation, which supports Wikipedia, spends around 10 million dollars a year, in total, on one of the top ten websites in the world. This is about the daily cost of the whole scientific publishing industry.</p>
<blockquote><p>The quality of a journal is typically measured by its impact factor</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>And a very bad measurement of journal quality it is too. As someone who works in two disciplines at once, I constantly get hit by this: my best computing publications have laughable impact factors when compared to my bio publications; when judged against computer scientists, however, my bio publications have such high impact factors, that they have to be ignored as outliers.</p>
<blockquote><p>At $5,000 per publication, my lab is broke.</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>It is not clear where the $5,000 figure comes from, as most open access is less than this. But, anyway, this argument makes no sense. Our labs are already paying a vast amount of money for publications; usually this is squirrelled away in overheads, taken from our budgets before we see the money. And, although it doesn&#8217;t happen so much in computing, many journals levy significant page charges.</p>
<blockquote><p>They are the big pharmaceutical labs and the tech firms who publish very little but rely on the publication of scientific results for their businesses. With author-pay, research will pay so that industry can get their results for free. Is this moral?</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>Open access on its own is not enough. We also need public disclosure about the process. Perhaps the examples of the pharmaceutical <a href="http://www.the-scientist.com/templates/trackable/display/blog.jsp?type=blog&amp;o_url=blog/display/55671&amp;id=55671">funding</a> journals directly are unusual. It is not so easy to tell at the moment. In this context, it could be argued that the last thing we need is the pharmaceutical industry paying for the results of science. Of course, conversely, the pharmaceutical industry could argue that they already do pay for the (publicly funded) research by way of taxation.</p>
<p>While they are interesting, all of these arguments really miss the point: the pharmaceutical industry already get their results for free, as their subscription fees do NOT pay for the research, only for its publication. The publishing industry also get the results that they depend on for free or, where there are page charges, by charging the authors. And for every paper that researchers publish for free, they pay more to read someone else&#8217;s.</p>
<p>So, we are already in the situation that we are told is not moral.</p>
<blockquote><p>It is important to understand that the scientific community is largely at fault</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>There is some truth in the idea that the scientific community has let itself walk into the situation, but ultimately I feel that this is like blaming the financial crisis on those receiving subprime mortgages. It is true that it is scientists who submit their best work to expensive closed publishers; but, especially in early and mid &#8220;career&#8221;, we do this to safe-guard our futures.</p>
<blockquote><p>The problem with the subscription model is not the model but the fees.</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>Quite the opposite. Ultimately, I don&#8217;t pay the fees, so how much do I really care? But the subscription model prevents re-purposing, it limits access, it prevents competition. I work at a university as a scientist because I value the ability to be able to swap and discuss my work. I want the general public to be able to access my research. Dissemination of knowledge should be part of my job; I think it is reasonable that I, or my employers, should pay for it.</p>
<p>Which is not to say that the level of fees is fine; it is not. They are far too expensive under any model.</p>
<blockquote><p>The added value provided by publishers is twofold: reputation (the value of the imprimatur), and archiving (the guarantee that the work will be available forever).</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>And this is it? Is this all that we are getting, given the costs? Especially as the reputation comes from the work, not the journal, and the archiving should be a rapidly decreasing cost.</p>
<p>Actually, in practice, I think the current publishing industry brings more value; selection of reviewers, sometimes copy-editing and, critically, advertising of the content. But, again, times have changed, and publishing practice in these areas has not.</p>
<blockquote><p>The only other area in publishing where authors pay to get published is called the vanity press. Do we really want to enter that model?</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>This is a low blow, and it is not true. Many people pay for their own publishing costs. The government pays to publish election results; health services pay to publish public health information; companies pay to publish product safety recalls. All circumstances where the value to the author of public awareness of their content far exceeds the income they would receive from charging. And the biggest example of this is the advertising industry.</p>
<p>Nor is the implication that this will necessarily result in low quality true. Consider the blogosphere; of course, there is much junk, but the standard of science journalism is very high; frankly, when even respected sources like the BBC start talking about <a href="http://news.bbc.co.uk/1/hi/health/7354458.stm">pixie dust</a>, it&#8217;s probably at least as high-standard as the mainstream media.</p>
<p>All this aside, what do I, as a scientist, actually care about? Some of these leap to mind:</p>
<ul> 
<li> Stable location and content </li>
<li> Archiving </li>
<li> Peer review </li>
<li> Discovery and selection </li>
</ul>
<p>Open access was built on the basis of replicating the existing publication process. PLoS for example did this precisely so that it did not challenge both the business model and the publication procedure at the same time. How much of the costs stem from this? I think that we, as authors and readers, should know. How much of the millions the ACM spends on its data centre is involved in managing access controls, for example? How much on advertising? How much on booths at meetings?</p>
<p>Open access has opened the door, but now we need to challenge and change the process. Hosting data is not free nor is archiving. And, yet, I can find my own website from <a href="http://web.archive.org/web/20020203022740/http://www.russet.org.uk/">2002</a> and enjoy its gaudy colour scheme all over again. If this blog post is so exciting to the world that the load brings the server down, you will be able to read it on <a href="http://www.russet.org.uk.nyud.net/blog/">coral cache</a>. The peer review <strong>is</strong> expensive and time-consuming; I know because I&#8217;ve organised enough of it for <a href="http://www.bio-ontologies.org.uk">BioOntologies</a>. But then I did not get paid for this and how many of the real costs of peer-review do publishers bear? And discovery and selection? Well, we have Google, and I follow my peers on Twitter.</p>
<blockquote><p>Author fees are not a solution. [&#8230;] Finally, nonprofit publishers should take advantage of their unique position to experiment with sustainable evolutions of their publishing models.</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>And on this, I could not agree more. Our experiment with <a href="http://www.knowledgeblog.org">Knowledgeblog</a> suggests that we can get 90% (or 80% or 70% depending on who you ask) of the way with commodity software. It&#8217;s only a small start, but then I was on the mailing list that saw the first email about the creation of Wikipedia, and that wasn&#8217;t long ago.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1924 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/06/the-naivete-of-scientists/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ontogenesis Knowledgeblog: Lightweight Semantic Publishing</title>
		<link>http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/</link>
		<comments>http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/#comments</comments>
		<pubDate>Tue, 07 Jun 2011 14:28:13 +0000</pubDate>
		<dc:creator>Phillip Lord</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1920</guid>
		<description><![CDATA[This is a paper we wrote for STLR2011 also published directly on Knowledgeblog Abstract The web has moved from a minority interest tool to one of the most heavily used platforms for publication. Despite originally being designed by and for academics, it has left academic publishing largely untouched; most papers are available on-line, but in [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1920">
<!-- coins metadata inserted by kblog-metadata -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=kblog-metadata.php&amp;rft.title=Ontogenesis+Knowledgeblog%3A+Lightweight+Semantic+Publishing&amp;rft.source=1st+Workshop+on+Semantic+Web+Technologies+for+Libraries+and+Readers&amp;rft.date=2011-06-07&amp;rft.identifier=http%3A%2F%2Fwww.russet.org.uk%2Fblog%2F2011%2F06%2Fontogenesis-knowledgeblog-lightweight-semantic-publishing%2F&amp;rft.au=Phillip+Lord&amp;rft.au=Simon+Cockell&amp;rft.au=Daniel+C.+Swan&amp;rft.au=Robert+Stevens&amp;rft.format=text&amp;rft.language=English"></span><p>This is a paper we wrote for <a href="http://stlr2011.weebly.com/">STLR2011</a>
which is also published directly on <a href="http://knowledgeblog.org/128">Knowledgeblog</a>.</p>
<h1>Abstract</h1>
<div class="abstract"> The web has moved from a minority interest tool to one of the most heavily used platforms for publication. Despite originally being designed by and for academics, it has left academic publishing largely untouched; most papers are available on-line, but in PDF and are most easily read once printed. Here, we describe our experiments with using commodity web technology to replace the existing publishing process; the resource describing ontologies that we have developed with this platform; and, finally, the implications that this may have for publishing in a semantic web framework. </div>
<h1 id="a0000000002">Authors</h1>
<p> Phillip Lord Newcastle University Newcastle-upon-Tyne, UK </p>
<p>Simon Cockell Newcastle University Newcastle-upon-Tyne, UK </p>
<p>Daniel C. Swan Newcastle University Newcastle-upon-Tyne, UK </p>
<p>Robert Stevens University of Manchester Manchester, UK </p>
<h1 id="a0000000003">Introduction</h1>
<p>The Web was invented around 1990 as a light-weight mechanism for publication of documents, enabling scientists to share their knowledge, in the form of hypertext documents. Although scientists and later most academics, like the rest of society, have made heavy use of the web, it has not had a significant impact on the academic publication process. While most journals now have websites, the publication process is still based around paper documents or electronic representations of paper documents in the form of a PDF. Most conferences still handle submissions in the same way<a href="#a0000000004" class="footnote"><sup class="footnotemark">1</sup></a>. Books on the web, for example, are often limited to a table of contents. </p>
<p>For the authors (certainly from our personal experience), the process is dissatisfying; book writing is time-consuming, tiring and takes a number of years to come to fruition. If the book has one or a few authors, it tends to reflect only a narrow slice of opinion. Multi-author collected works tend to be even harder work for the editor than writing a book solo. Books do not change frequently; they are therefore out-of-date as soon as they are available. Authors feel a greater pressure for correctness, as they will have to live with the consequences of mistakes for the many years it takes to produce a second edition; most scientists welcome feedback, but being asked to justify something you wish you had not said becomes tiresome, especially if you are waiting to update it. </p>
<p>For the consumer of the material (either a human reader, or a computer), the experience is likewise limited. Books on paper are not searchable, not easy to carry around, are rarely cheap and often very expensive to buy. For the computer, the material is hard to understand, or to parse. Even distinguishing basic structure (where do chapters start, who is the author, where is the legend for a given figure) is challenging. </p>
<p>All of this points to a need to exploit the Web for scientists to publish in a different way than simply replicating the old publishing process. Here, we describe our experiment with a new (to academia!) form of publishing: we have used widely-available and heavily used commodity software (WordPress <span class="cite">[<a href="#wordpress">7</a>]</span>), running on low-end hardware, to develop a multi-author resource describing the use of ontologies in the life sciences (our main field of expertise). From this experience, we have built on and enhanced the basic platform to improve the author experience of publishing in this manner. We are now extending the platform further to enable the addition of light-weight semantics by authors to their own papers, without requiring authors to directly use semantic web technologies, and within their own tool environment. In short, we believe that this platform provides a ‘cheap and cheerful’ framework for semantic publishing. </p>
<h1 id="a0000000005">The requirements</h1>
<p>The initial motivation for this work came from our experience within the bio-ontology community. Biomedicine is one of the largest domains for use of ontology technology, producing large and complex ontologies such as the Gene Ontology <span class="cite">[<a href="#go2000">28</a>]</span> or SNOMED <span class="cite">[<a href="#snomed">27</a>]</span>. </p>
<p>As an ontologist, one of the most common questions that one has is: ‘where is there a book or a tutorial that I can read which describes how to build an ontology?’. Currently, there is some tutorial information on the web, there are some books; but there is not a clear answer to the question. Many of the books are collections of research-level papers, or are technologically biased. Currently many ontologists have learned their craft through years of reading mailing lists, gathering information from the web and by word of mouth. We wished to develop a resource with short and succinct articles, published in a timely manner and freely available. </p>
<p>We wished also, however, to retain the core of academic publishing. This was for reasons pragmatic, principled and political. Consider, for example, Wikipedia, that could otherwise serve as a model. Our own experience suggests that referencing Wikipedia can be dangerous: it can and does change over time, meaning critical or supportive comments in other articles can be ‘orphaned’. Wikipedia maintains a ‘neutral point-of-view’ which, many are of the opinion, makes it less suitable for areas where knowledge is uncertain and disagreement frequent. Finally, Wikipedia is relatively anonymous in terms of authorship: whether this affects the quality of articles has been a topic of debate <span class="cite">[<a href="#wikipediaage">17</a>]</span>, but was not our primary concern; pragmatically, the promotion and career structure<a href="#a0000000006" class="footnote"><sup class="footnotemark">2</sup></a> for most academics requires a form of professional narcissism; they cannot afford to contribute to a resource for which they cannot claim credit. Of course, our experiences may not be reflective of the body academic overall; there has, for example, been substantial discussion of the issues of expertise on Wikipedia itself <span class="cite">[<a href="#wikipedia_expert">8</a>]</span>. Although the reasons may not be clear, it is clear that academics largely do not contribute to Wikipedia, and that Wikimedia sees this as an issue <span class="cite">[<a href="#Wikipedia_academics">16</a>]</span>. </p>
<p>We also had an explicit set of non-functional requirements. We needed the resource to be easy to administer and low-cost, as this mirrored our resource availability; authors should be offered an easy-to-use publishing environment with minimal ‘setup’ costs, or they would be unlikely to contribute; readers should see a simple, but reasonably attractive and navigable website, or they would be unlikely to read. </p>
<h1 id="a0000000007">The Ontogenesis experience</h1>
<p>Our previous experience with the use of blog software within academia was limited to ‘traditional’ blogging: short pieces about either the process of science (reports about conferences, or papers for example); journalistic articles about other people’s research; or personal blogging, that is, articles by people who just happen to be academics. Although we wished to develop different, more formal content, this experience suggests that many academics find blogging software convenient, straight-forward enough and useful. </p>
<p>To test this, we decided to hold a small workshop of 17 domain experts over a two day period, and task them with generating content, conducting peer-review of this content and publishing it as articles on a blog. </p>
<h2 id="a0000000008">Terminology and the Process</h2>
<p>Like many communities, the blogosphere has developed its own, sometimes confusing, terminology. To describe the process we adopted, we first describe some of this terminology. A <i class="itshape">blog</i> is a collection of web pages, usually with a common theme. These web pages can be divided into: <i class="itshape">posts</i> that are published (or <i class="itshape">posted</i>) on an explicit date and then unchanged; and <i class="itshape">pages</i> that are not dated and can change. Posts and pages have <i class="itshape">permalinks</i>: although they may be accessible via several URLs, they have one permalink that is stable and never changes. Posts and pages can be <i class="itshape">categorised</i> – grouped under a predefined hierarchy – or <i class="itshape">tagged</i> – grouped using <em>ad hoc</em> words or phrases defined at the point of use. A blog is usually hosted with a <i class="itshape">blog engine</i>, such as <i class="itshape">WordPress</i>, which stores content in a database and combines it with style instructions in <i class="itshape">themes</i> to generate the pages and posts. Most blog engines support extensions to their core functionality with <i class="itshape">plugins</i>. Most blogs also support <i class="itshape">comments</i>: short pieces of content added to a post or page by people other than the original authors. Most blog engines also support <i class="itshape">trackbacks</i>, which are bidirectional links: normally, a snippet from a linking post will appear as a comment in the linked-to post. Trackbacks work both within a single blog and between different distributed blogs. Many blogs support <i class="itshape">remote posting</i>: as well as using a web form for adding new content, users can also post from third party applications, through a programmatic interface using a protocol such as XML-RPC or even by email. 
Posts and pages are ultimately written in headless HTML (that part of HTML which appears inside the <tt class="ttfamily">body</tt> element), although the different editing environments can hide this fact from the user. </p>
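<p>As a rough illustration of remote posting over XML-RPC, the following Python sketch builds (but does not send) a <tt class="ttfamily">metaWeblog.newPost</tt> request of the kind a third party tool might issue. The user name, password, title, body and category are placeholders, and the function name is our own invention for this example; it is a sketch under those assumptions, not a description of any particular posting tool. </p>

```python
import xmlrpc.client


def build_new_post_request(username, password, title, body_html, categories):
    """Serialise a metaWeblog.newPost call to XML without sending it.

    body_html is 'headless' HTML: only what would sit inside <body>.
    """
    post = {
        "title": title,
        "description": body_html,  # metaWeblog calls the post body 'description'
        "categories": categories,
    }
    # Arguments: blog id, user name, password, post struct, publish flag.
    # The blog id 1 and publish=False (save as draft) are illustrative choices.
    params = (1, username, password, post, False)
    return xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")


# Placeholder credentials and content, for illustration only.
request_xml = build_new_post_request(
    "author", "secret", "What is an ontology?",
    "<p>Draft text.</p>", ["under review"],
)
```

<p>Under the same assumptions, sending the call to a live blog would go through <tt class="ttfamily">xmlrpc.client.ServerProxy</tt> pointed at the blog's XML-RPC endpoint rather than through a hand-built request body. </p>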
<p>Our initial process was designed to replicate the normal peer-review process, with a single adjustment, that peer-review was open and not blind: papers would be world-visible once submitted; the identities of reviewers would be known to authors; all reviews would be public. We adopted this approach for pragmatic reasons. WordPress has little support for authenticated viewing and none for anonymisation. The full process was as follows: </p>
<ul class="itemize">
<li>
<p>Authors write their content and publish using whichever tooling they find appropriate. </p>
</li>
<li>
<p>The author posts their content, categorising it as <i class="itshape">under review</i>. </p>
</li>
<li>
<p>An editor assigns two reviewers. </p>
</li>
<li>
<p>Reviewers publish reviews as posts or comments. Reviews link to articles, resulting in a trackback from article to review. </p>
</li>
<li>
<p>The author modifies the post to address reviews. </p>
</li>
<li>
<p>Once done to the editor’s satisfaction, the post is recategorised as <i class="itshape">reviewed</i>. </p>
</li>
</ul>
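<p>The steps above can be read as a small state model. The Python sketch below is one hypothetical way to express it: the class, its method names and the rule that both assigned reviewers must have reviewed before recategorisation are assumptions made for illustration (the process itself only requires the editor's satisfaction), and none of this corresponds to WordPress functionality. </p>

```python
from dataclasses import dataclass, field

UNDER_REVIEW = "under review"
REVIEWED = "reviewed"


@dataclass
class Article:
    """A post moving through the review process (illustrative model only)."""
    title: str
    category: str = UNDER_REVIEW
    reviewers: list = field(default_factory=list)
    reviews: list = field(default_factory=list)  # (reviewer, permalink) pairs

    def assign_reviewers(self, first, second):
        """The editor assigns two reviewers."""
        self.reviewers = [first, second]

    def add_review(self, reviewer, permalink):
        """Record a review; its permalink stands in for the trackback."""
        if reviewer not in self.reviewers:
            raise ValueError("reviewer was not assigned to this article")
        self.reviews.append((reviewer, permalink))

    def accept(self):
        """The editor recategorises the post once all reviews are in.

        Requiring a review from every assigned reviewer is our assumption.
        """
        reviewed_by = {reviewer for reviewer, _ in self.reviews}
        if not self.reviewers or set(self.reviewers) - reviewed_by:
            raise ValueError("reviews are still outstanding")
        self.category = REVIEWED
```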
<p>Our expectation was that following this process, articles would not be changed or updated; this is in stark contrast to common usage for wiki-based websites. New articles could, however, be written updating, extending or refuting old ones. </p>
<h2 id="a0000000009">Reflections on the Ontogenesis K-Blog</h2>
<p>Our initial meeting functioned to ‘bootstrap’ the Ontogenesis K-Blog. This was useful to acquire a critical mass of content, but also, on this first outing, to explore the K-Blog process and technology. The setup for the day was a vanilla WordPress installation. The day started with a short presentation on the K-Blog manifesto <span class="cite">[<a href="#onto-mani">22</a>]</span> and an overview of the process, including authoring and reviewing. The guidelines to authors were to write short articles on an ontology subject (a list of suggestions was offered and authors also made their own choices) and to produce the article in whatever manner they felt appropriate. There was a certain level of uncertainty among authors as to the K-Blog process (partly because one of the objectives of the meeting was to ‘force out’ the process) and this, naturally, pointed to the need to document the K-Blog process so that authors could have the typical ‘instructions to authors’. </p>
<p>This first meeting produced a set of 20 completed and partially completed articles. Some even had reviews. Even on the day itself there was some external interest seen from Twitter. The first external blog post (outside of those produced by attendees) happened during the meeting <span class="cite">[<a href="#first">19</a>]</span> with a second shortly after <span class="cite">[<a href="#second">18</a>]</span>. </p>
<p>We also held a second content provision meeting and together these generated a collection of articles that felt like an academic book in terms of content, but generated with considerably less effort. This experience was also sufficient to gather requirements on how to improve the K-Blog idea. A useful K-Blog on the K-Blog process itself was produced by Sean Bechhofer <span class="cite">[<a href="#arewethere">13</a>]</span>. There is also a K-Blog looking back on the first year of the Ontogenesis K-Blog <span class="cite">[<a href="#firstyear">23</a>]</span>. </p>
<p>Several requirements emerged with respect to <b class="bfseries">authorship</b>. The principle of the short, more or less self-contained article was attractive (though the audience were somewhat self-selecting). Authoring directly in the editor provided by WordPress was felt to be poor by those that tried it. Authoring in a favourite editing tool and then publishing via WordPress worked reasonably well for most authors. There were, however, a variety of issues with the mechanism of this style of publishing, notably referring to articles that will be, but have not yet been, written. To some extent this was an artefact of the day (many articles being written simultaneously), but authors needed to refer to glossaries and articles in progress. </p>
<p>One stylistic issue was the habit of putting full affiliations at the top of an article. The Ontogenesis theme presents the first few lines when displaying many articles, but in many cases this was simply showing the title and author affiliation, where it would be more useful to have the first sentence or so of the article itself. </p>
<p>For the whole K-Blog, a table of contents was felt to be important. This would give an overview of contents and a simple place for navigation about the K-Blog. This raised the issue of <b class="bfseries">attribution</b>; the table of contents needed to expose the authors, including multiple, ordered authors. This is not a surprising need, as the authors’ scientific reputation is involved. In this vein, making K-Blog articles citable by issuing Digital Object Identifiers (DOIs) was requested. </p>
<p>For scientific credibility, the ability to handle <b class="bfseries">citations</b> easily was an obvious requirement. Natively, WordPress has little or no support for styling citations and references. The ability to cite via DOI and, in this field, PubMed identifiers to automatically make links and produce a reference list was felt to be important. Having the Ontogenesis K-Blog articles in PubMed would also be attractive to authors. </p>
<p>The last <b class="bfseries">authorship</b> issue was the <b class="bfseries">mutability</b> of articles. One aim of K-Blog is to enable articles to change in the light of experience and scientific development, as well as a procedural requirement for updates following review. There was felt to be a conflicting need for articles not to change, so that comments and links from other documents work in the longer term. </p>
<p>The last significant issue was the <b class="bfseries">reviewing</b> of articles. The aim was to have this managed by authors choosing reviewers (with editorial oversight). On the Ontogenesis K-Blog day this could work with authors calling across the room for a review. This is not, however, a sustainable approach, and WordPress lacks tracking facilities to manage the reviewing process, whether this is done by an author or an editor. The realisation that such management support is needed is not the greatest insight ever gained, but the requirement is there even in a light-weight publishing mechanism. </p>
<h1 id="a0000000010">Improvements to the technology</h1>
<p>Our initial experiment with the Ontogenesis K-Blog suggested a significant number of issues with the use of WordPress for scientific publication. In this section, we describe the extensions that we have made, or adopted, to the publication process, documentation or WordPress itself. Following our initial experience with Ontogenesis, we have started to trial these improvements, including through another workshop which resulted in a new K-Blog <span class="cite">[<a href="#tavernakblog">12</a>]</span>, describing the scientific workflow engine Taverna <span class="cite">[<a href="#taverna">24</a>]</span>; work is also in progress on the use of a K-Blog for bioinformatics <span class="cite">[<a href="#bioinf">1</a>]</span>, and another for public healthcare <span class="cite">[<a href="#health">3</a>]</span>. </p>
<p>Currently, we have 11 plugins extending the basic WordPress environment. For completeness, all of these are shown in Table <a href="#tab:plugins">1</a>. Our theme is also extended in some places to support the plugins. In general, the plugins are orthogonal and will work independently of each other. One advantage of using WordPress is that many of these plugins are freely available, written and maintained by other authors; other academic publication environments, such as Open Journal Systems <span class="cite">[<a href="#ojs">5</a>]</span>, exist and are relatively widely used, but WordPress is used to host perhaps 10% of the web, making the plugin ecosystem extremely fertile. </p>
<div id="tab:plugins" class="table">
<p><small class="small"><center>
<table cellspacing="0" class="tabular">
<tr>
<td style="border-top-style:solid; text-align:left; border-top-color:black; border-top-width:1px; border-right:1px solid black">
<p> Plugin </p>
</td>
<td style="border-top-style:solid; text-align:left; border-top-color:black; border-top-width:1px; border-right:1px solid black">
<p> Use </p>
</td>
<td style="border-top-style:solid; border-top-color:black; border-top-width:1px; text-align:left">
<p> URL</p>
</td>
</tr>
<tr>
<td style="border-top-style:solid; text-align:left; border-top-color:black; border-top-width:1px; border-right:1px solid black">
<p>Co-Authors Plus </p>
</td>
<td style="border-top-style:solid; text-align:left; border-top-color:black; border-top-width:1px; border-right:1px solid black">
<p> Allows K-Blog posts to have more than one author </p>
</td>
<td style="border-top-style:solid; border-top-color:black; border-top-width:1px; text-align:left">
<p> <a href="http://wordpress.org/extend/plugins/co-authors-plus/">http://wordpress.org/extend/plugins/co-authors-plus/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>COinS Metadata Exposer †</p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Provides COinS metadata on K-Blog posts (used by Zotero, Mendeley etc) </p>
</td>
<td style="text-align:left">
<p> <a href="http://code.google.com/p/knowledgeblog/">http://code.google.com/p/knowledgeblog/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Edit Flow </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Gives editorial process management infrastructure </p>
</td>
<td style="text-align:left">
<p> <a href="http://editflow.org/">http://editflow.org/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>ePub Export </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Exports K-Blog posts as ePub documents </p>
</td>
<td style="text-align:left">
<p> <a href="http://wordpress.org/extend/plugins/epub-export/">http://wordpress.org/extend/plugins/epub-export/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>KCite \(\ast \) </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Automatic processing of DOIs and PMIDs into in-text citations and bibliographies </p>
</td>
<td style="text-align:left">
<p> <a href="http://knowledgeblog.org/kcite-plugin">http://knowledgeblog.org/kcite-plugin</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Knowledgeblog Post Metadata Plugin \(\ast \) </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Exposes generic metadata in post headers </p>
</td>
<td style="text-align:left">
<p> <a href="http://code.google.com/p/knowledgeblog/">http://code.google.com/p/knowledgeblog/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Knowledgeblog Table of Contents \(\ast \) </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Produces a table of contents based on a category of articles. Posts are listed with all authors </p>
</td>
<td style="text-align:left">
<p> <a href="http://knowledgeblog.org/knowledgeblog-table-of-contents-plugin">http://knowledgeblog.org/knowledgeblog-table-of-contents-plugin</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Mathjax L<sup style="font-variant:small-caps; margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X \(\ast \) </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Enables use of T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X or MathML in posts, rendered in scalable web fonts </p>
</td>
<td style="text-align:left">
<p> <a href="http://knowledgeblog.org/mathjax-latex-wordpress-plugin">http://knowledgeblog.org/mathjax-latex-wordpress-plugin</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Post Revision Display </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Publicly exposes all revisions of an article after publication </p>
</td>
<td style="text-align:left">
<p> <a href="http://wordpress.org/extend/plugins/post-revision-display/">http://wordpress.org/extend/plugins/post-revision-display/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>SyntaxHighlighter Evolved </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Syntax Highlights source code embedded in posts </p>
</td>
<td style="text-align:left">
<p> <a href="http://wordpress.org/extend/plugins/syntaxhighlighter/">http://wordpress.org/extend/plugins/syntaxhighlighter/</a></p>
</td>
</tr>
<tr>
<td style="border-bottom-color:black; border-bottom-width:1px; text-align:left; border-bottom-style:solid; border-right:1px solid black">
<p>WP Post to PDF </p>
</td>
<td style="border-bottom-color:black; border-bottom-width:1px; text-align:left; border-bottom-style:solid; border-right:1px solid black">
<p> Allows visitors to download posts in PDF format </p>
</td>
<td style="border-bottom-width:1px; border-bottom-color:black; border-bottom-style:solid; text-align:left">
<p> <a href="http://wordpress.org/extend/plugins/wp-post-to-pdf/">http://wordpress.org/extend/plugins/wp-post-to-pdf/</a></p>
</td>
</tr>
</table>
<div class="caption"><b>Table 1</b>: <span>WordPress plugins employed by K-Blog. Plugins marked with \(\ast \) are written by the authors. Plugins marked with \(\dag \) are modified by the authors. </span></div>
<p>  </center></small></p>
</div>
<p><b class="bfseries">Reviewing:</b> The initial process was self-managed and required two reviews per article; this was found to be cumbersome. We have addressed this in two ways. First, we have defined a number of different peer-review levels (public review, author review, editorial review <span class="cite">[<a href="#levels">15</a>]</span>), including a light-weight process now being used for Ontogenesis: authors now select their own reviewers, and decide for themselves when articles are complete. Second, we have added software support. Initially, we attempted to use RequestTracker, an open source ticket system, but found the user interface too complex for this purpose. We are now using the Edit Flow plugin to WordPress, which was designed for managing a review process, albeit a hierarchical rather than a peer-review process. </p>
<p><b class="bfseries">Authoring Environment:</b> The standard WordPress editor was found impractical by most authors, even for short articles. WordPress does provide ‘paste from word’ functionality, but this removes all formatting, which defeats the point. While the lack of a good editing environment could have been a significant problem, our subsequent experimentation has shown that it is possible to post directly from a wide variety of tools, including ‘office’ tools such as Word, Google Docs, LiveWriter and OpenOffice. This is in addition to a variety of blog-specific tools and text formats (such as asciidoc), which are suitable for some users. We have added documentation to a kblog (<a href="http://process.knowledgeblog.org">http://process.knowledgeblog.org</a>) to address these. In practice, only L<sup style="font-variant:small-caps; margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X proved problematic, having no specific support. To address this, we have produced a tool called <b class="bfseries">latextowordpress</b>; this is an adaptation of the plasT<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X tool, a Python-based T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X processor, to produce simplified HTML appropriate for WordPress publishing. Our experience with using the tools is that while none are perfect, sometimes requiring ‘tweaking’ of HTML in WordPress, most reduce publishing time to seconds. </p>
<p><b class="bfseries">Citations:</b> We have addressed the lack of support for citations within WordPress with a plugin called <b class="bfseries">kcite</b>. This allows authors to add citations into documents as <tt class="ttfamily">shortcodes</tt> with either a DOI or PubMed ID (other identifiers can be, and are being, added to kcite). Shortcodes are a commonly used form of markup of the form: &#91;tag att=&quot;att&quot;]text[/tag]; they are often found where a simplified HTML-like markup is desired. A bibliography is then generated automatically on the web server. Requiring authors to add markup to otherwise WYSIWYG tools is damaging to the user experience. We believe that this is soluble, however, by extending bibliographic tools, by developing a ‘kcite’ style-file or template; we have a prototype of this (using CSL <span class="cite">[<a href="#csl">10</a>]</span>) for Zotero and Mendeley, and another for asciidoc with bibtex. It is also possible to just use native tool support in Word or L<sup style="font-variant:small-caps; margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X, and convert bibliographies to HTML; the disadvantage with this approach is discussed later. </p>
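<p>To make the shortcode idea concrete, the following Python sketch scans text for citation shortcodes, replaces each with a numbered marker and collects the identifiers in first-cited order, which is the raw material for a bibliography. It is not the kcite implementation (kcite is a WordPress plugin): the <tt class="ttfamily">source</tt> attribute name, the default of treating bare identifiers as DOIs, the function name and the PubMed ID 12345 are all assumptions made for illustration. </p>

```python
import re

# Matches [cite]ID[/cite] and [cite source="pubmed"]ID[/cite].
# The attribute name "source" is an assumption for this sketch.
CITE = re.compile(
    r'\[cite(?:\s+source="(?P<source>[^"]*)")?\](?P<id>[^\[\]]+)\[/cite\]'
)


def number_citations(text, default_source="doi"):
    """Replace citation shortcodes with [n] markers.

    Returns the rewritten text and the list of (source, identifier)
    pairs in the order in which they were first cited.
    """
    order = []

    def replace(match):
        key = (match.group("source") or default_source,
               match.group("id").strip())
        if key not in order:
            order.append(key)
        return "[%d]" % (order.index(key) + 1)

    return CITE.sub(replace, text), order


# 10.232/43243 is an example DOI; 12345 is a placeholder PubMed ID.
body, references = number_citations(
    'Described in [cite]10.232/43243[/cite] and '
    '[cite source="pubmed"]12345[/cite]; see again [cite]10.232/43243[/cite].'
)
```

<p>A repeated identifier reuses its first number, so the reference list contains each work once. Resolving each identifier to full bibliographic metadata is deliberately left out of the sketch. </p>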
<p><b class="bfseries">Archiving and Searching:</b> Archiving is primarily a social, rather than technological, problem. A blog engine is fully capable of storing content in the long term, but authors and readers have to believe that it will do so. As a novel form of academic publishing, K-Blog is not automatically archived as a scientific journal would be. However, we have taken advantage of its web publication; the main K-Blog site is now explicitly archived by the UK Web Archive, as well as implicitly by other web archives. We have enhanced the website with an ‘easy crawl’ plugin, that is, a single web page pointing to all articles classified as reviewed. We now support the (technical) requirements for LOCKSS and PubMed. Simultaneously, this also enhances the searchability of K-Blog, fulfilling the requirements for Google Scholar. </p>
<p><b class="bfseries">Non-repudiability:</b> The K-Blog process does not allow authors to make semantically meaningful changes after an article has been reviewed. Unfortunately, it is hard to define ‘semantically meaningful’ computationally, so we have made no attempt to address this by locking articles; rather, all versions of articles are now accessible to the reader (WordPress provides this facility to the authors by default). This enables community enforcement of a no-change policy. </p>
<p><b class="bfseries">Multiple Authors:</b> We believe that authoring is best done outside WordPress. This also means that we do not support multiple-authorship; we have made no attempt to add collaborative features to WordPress. However, we did need articles to carry a byline attributing the articles to multiple authors; although not critical to the functioning of a K-Blog, it is socially critical to appease the professional narcissism (see Section <a></a>) of scientists. Fortunately, this is a common requirement, and a suitable WordPress plugin existed. </p>
<p><b class="bfseries">Identifiers:</b> WordPress already supports permalinks; although we believe that URLs are entirely fit for purpose technologically while DOIs do little other than introduce complexity <span class="cite">[<a href="#problemdois">11</a>]</span>, K-Blog required DOIs for professional narcissism. We considered becoming a DOI authority, but this proved impractical. Instead, we have used DataCite <span class="cite">[<a href="#datacite">2</a>]</span>. This has required a small extension to WordPress to extract appropriate metadata and to store the DOIs once minted. </p>
<p><b class="bfseries">Metadata:</b> K-Blog now exposes various parts of its metadata in a number of ways; unfortunately, there appear to be a large number of (non-)standards in use, each with its own application. K-Blog currently provides: COinS, enabling integration with Zotero and Mendeley; meta tags for Google Scholar; and Dublin Core tags for no more specific reason than completeness. We are in the process of providing bibtex export (for bibtex!), and a JSON representation to support citeproc-js <span class="cite">[<a href="#citeproc-js">14</a>]</span> in the second generation of kcite. </p>
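<p>For readers unfamiliar with COinS, the sketch below shows roughly what such metadata looks like: a set of key/value pairs, URL-encoded and placed in the <tt class="ttfamily">title</tt> attribute of an empty <tt class="ttfamily">span</tt> with class <tt class="ttfamily">Z3988</tt>. It is written in Python for illustration and is not the plugin's code; the function name, the choice of fields and the article details passed in are assumptions or placeholders. </p>

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, source, date, identifier, author):
    """Build a COinS span describing one article as Dublin Core fields."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:dc"),
        ("rft.title", title),
        ("rft.source", source),
        ("rft.date", date),
        ("rft.identifier", identifier),
        ("rft.au", author),
    ]
    # URL-encode the pairs, then escape '&' so the result is a valid
    # HTML attribute value.
    title_attr = escape(urlencode(fields), quote=True)
    return '<span class="Z3988" title="%s"></span>' % title_attr


# Placeholder article details, for illustration only.
span = coins_span(
    "What is an ontology?", "Ontogenesis", "2010-01-22",
    "http://example.org/what-is-an-ontology", "A. N. Author",
)
```

<p>Because the span is empty, it is invisible to human readers while remaining discoverable by tools that look for the <tt class="ttfamily">Z3988</tt> class. </p>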
<p><b class="bfseries">Mathematics and Presentation:</b> We have also provided several pieces of technology that did not stem from concrete requirements arising from the initial Ontogenesis meeting. We have improved parts of the presentation system by adding, for example, syntax highlighting to code blocks. Additionally, we have created the <b class="bfseries">mathjax-latex</b> plugin enabling the use of T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X (or MathML) markup in posts that are then rendered in the browser using scalable fonts. WordPress has native math-mode T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X support, but using image fonts which do not scale and have an ugly pixelated display. </p>
<h1 id="a0000000011">Discussion</h1>
<p>We have been motivated by a lack of enthusiasm for traditional book publishing to devise another mechanism by which we can achieve the same ends. We wished to avoid the downsides of an ‘all or nothing’ approach to creating a ‘static’ paper document that is read by relatively few people due to price. The K-Blog approach allows authors to publish in a piecemeal fashion; writing only that which they are motivated to write using a mechanism that avoids a third party making arbitrary decisions on formatting with peculiar time-scales. </p>
<p>To avoid all this, the K-Blog is a light-weight publishing process based on commodity blogging software. We have taken an approach of writing short articles around a theme of ‘ontology in biology’; the Ontogenesis K-Blog. At the time of writing we have 26 articles and page viewing numbers that are pleasing (see Figure <a href="#fig:views">1</a>). These statistics are generated by WordPress directly, and represent (an approximation of) ‘real’ page reads, with robot and self-viewing removed. This is confirmed by the ten most read articles (Table <a href="#sec:acknowledgements">2</a>) that reflect our expectations – ‘What is an ontology’ being first. In this sense, we consider the K-Blog process to be a success, especially when considered against the circulation of an equivalent book. </p>
<div id="fig:views" class="figure"><center><img src="http://knowledgeblog.org/files/2011/06/stats-line.png" /> 
<div class="caption"><b>Figure 1</b>: <span>Monthly page view statistics for the Ontogenesis K-Blog.</span></div>
<p>  </center></div>
<div id="sec:acknowledgements" class="table"><center>
<table cellspacing="0" class="tabular">
<tr>
<td style="border-top-style:solid; text-align:left; border-top-color:black; border-top-width:1px; border-right:1px solid black">
<p> What is an ontology? </p>
</td>
<td style="border-top-style:solid; border-top-color:black; border-top-width:1px; text-align:left">
<p> 1,737</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>OWL Syntaxes </p>
</td>
<td style="text-align:left">
<p> 1,246</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Ontology Learning </p>
</td>
<td style="text-align:left">
<p> 882</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Table of Contents </p>
</td>
<td style="text-align:left">
<p> 740</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>What is an upper level ontology? </p>
</td>
<td style="text-align:left">
<p> 684</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Reference and Application Ontologies </p>
</td>
<td style="text-align:left">
<p> 630</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Protege &amp; Protege-OWL </p>
</td>
<td style="text-align:left">
<p> 522</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Semantic Integration in the Life Sciences </p>
</td>
<td style="text-align:left">
<p> 517</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Automatic maintenance of multiple inheritance ontologies </p>
</td>
<td style="text-align:left">
<p> 469</p>
</td>
</tr>
<tr>
<td style="border-bottom-color:black; border-bottom-width:1px; text-align:left; border-bottom-style:solid; border-right:1px solid black">
<p>Ontologies for Sharing, Ontologies for Use </p>
</td>
<td style="border-bottom-width:1px; border-bottom-color:black; border-bottom-style:solid; text-align:left">
<p> 330</p>
</td>
</tr>
</table>
<div class="caption"><b>Table 2</b>: <span>Most viewed articles for the Ontogenesis K-Blog (totals).</span></div>
<p>  </center></div>
<p>The social processes with K-Blog are largely similar to traditional publishing, with one exception – reviewing is public. While we may have been interested in experimenting with this for principled reasons, in practice we adopted it because we did not know how to support blind anonymous review with WordPress. Open review is not a new idea: Requests For Comments are common in standards processes; both Nupedia <span class="cite">[<a href="#nupedia">4</a>]</span> (the fore-runner of Wikipedia) and H2G2 <span class="cite">[<a href="#h2g2">6</a>]</span> (which predates Nupedia) use public peer-review. It is still, however, unusual in academia. In our experience from Ontogenesis, it raised no worries among our contributors, except that reviewers often wanted to be more involved in the proofing, a role normally played by authors low down the author list; open review blurs these lines somewhat. </p>
<p>One open area for discussion is the extent to which authors can, should and wish to change articles after publication. While the ability to update is inherent in the web, the desire for non-repudiability was considered to be important; the contradiction here appears fundamental, and we do not feel we have reached a good compromise yet. In one sense, our use of the post-revision display plugin solves this problem; even if the article changes, it is still possible to refer to a specific version. However, like all automated versioning tools, many versions get recorded, often with very fine-grained changes, which makes selection of the ‘right’ version hard to impossible. We could replace this with an explicit versioning tool, similar to a source code versioning system; but these systems are hard to use for those unused to them, as well as being difficult to implement well. An environment like K-Blog, however, does allow rapid publication of and bi-directional linking with articles; combined with typed linking with CiTO, the ability to publish errata, addenda and second editions may be a better solution. </p>
<p>Our experiences with K-Blog, we think, are useful in understanding how semantic web technology can and will impact on the publication and library process. Both from our initial work with Ontogenesis, and subsequent work with <a href="http://taverna.knowledgeblog.org">http://taverna.knowledgeblog.org</a>, it has become obvious that good tool support is critical. ‘Good’ in this sense can be straight-forwardly interpreted as ‘familiar’, which in general can be interpreted as MS Word. Our choice of a blogging engine here was (unexpectedly) well-advised, as this form of publication is already supported by many tools. It is also clear that there are many other tools that could be added; while Ontogenesis has the content, for example, that might be found in an academic book, it does not currently have the presentation of the book. Articles are already available as ePUB, and more recent work has used our Table of Contents plugin to provide a single site-wide ePUB of all articles <span class="cite">[<a href="#epub_from_wordpress">25</a>]</span>. Pre-existing tools such as Anthologize <span class="cite">[<a href="#anthologize">9</a>]</span> may also be useful for adding organised collections of articles gathered from the whole. </p>
<p>This has a direct implication on the addition of further semantics to content. On the positive side, the use of WordPress makes semantic additions plausible in a way that many conventional publishing processes do not. For example, the publication of our (PWL, RS) recent paper <span class="cite">[<a href="#reality_in_biology_2010">20</a>]</span> required conversion from the LaTeX source to PDF (by latex), to another PDF, to a MS Word file (by hand), to XML before arriving at the final HTML form. This process took many weeks and required multiple interactions between the authors and publisher. It still failed to preserve the semantic use (to humans) of Courier font to highlight in-text ontology terms, requiring post-publication correction. The equivalent blog post <span class="cite">[<a href="#reality_in_biology_2010_blog">21</a>]</span> gave us nearly instantaneous feedback on the final form, allowing us to check that the semantics was present and correct. </p>
<p>The requirements for semantics have, however, to be light. We have concentrated throughout K-Blog on the ease of delivery of content; even with this focus, it is hard. In most cases, asking for more work, for more semantics than authors are used to giving in papers is problematic. For example, I (PWL) attempted to add microformat-based markup to Ontogenesis, again, identifying ontology terms. So far, all article authors have ignored this markup (including, embarrassingly, myself). </p>
<p>One solution to this issue is to ensure that authors themselves benefit directly from extra semantics. For example, the Mathjax-Latex plugin allows WordPress to present mathematics in T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X or MathML markup in the final document, which is more semantically meaningful than the default WordPress behaviour of rendering an image. From the author’s perspective, it also enables the use of T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X markup in Word, and the end product scales and looks less ugly on the web page. </p>
<p>With Kcite, we allow the user to embed DOIs or PubMed IDs; this can be achieved at no cost to the user, if they already use a bibliography tool, as it can transparently produce citations for them using Kcite shortcodes. Development versions of Kcite already allow easy switching of bibliographic style, which we hope will become the choice of the author (rather than the website or publisher as is currently the case), and/or the reader. With this additional information, we can also embed more semantics into the end document at no additional cost to the author, using for example the least specific CiTO <tt class="ttfamily">cites</tt> term. However, further use of CiTO will require the author to decide which term to use, with relatively little gain to themselves, and may require extension to bibliographic tools if we are to maintain transparency of Kcite shortcodes; even if the tools are present, it is unclear whether authors will use them. We note that semantics useful to domain authors is likely to be domain-specific; mathematicians are more likely to care about maths presentation, but less likely to care about PubMed IDs. We need to be able to extend the publishing model and environment for different journals to cope. </p>
<p>From a technological perspective, we have found the use of shortcodes to be a good mechanism for authors to add semantics. They are simple and relatively easy to understand. In some cases they can be hidden from the user entirely; forcing users to add markup to otherwise WYSIWYG environments such as MS Word is best avoided. Although the direct use of a more standard XML markup would seem more sensible, in practice it requires tool support, as XML markup will be escaped by helpful remote posting tools. Extension of remote posting tools is hard (for tools like MS Word) or impossible (for cloud tools such as Google Docs or LiveWriter). A blogging engine such as WordPress makes it trivial to replace shortcodes both with a <i class="itshape">presentation</i> format and machine interpretable <i class="itshape">microformat</i>; for example, the development version of Kcite transforms DOI shortcodes (&#91;cite]10.232/43243[/cite]) into in-text citations (Smith et al, (2002)) embedded in a span tag (<tt class="ttfamily">&lt;span kcite-id="10.232/43243"&gt;Smith et al, (2002)&lt;/span&gt;</tt>) that are subsequently transformed into final presentation form within the browser using JavaScript. The presentation form can also support additional semantic markup such as CiTO <span class="cite">[<a href="#cito">26</a>]</span>. </p>
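<p>The server-side half of that transformation can be sketched in a few lines. The Python below is an illustration only, not Kcite's code (Kcite is a WordPress plugin): the function name and the <tt class="ttfamily">labels</tt> lookup, which stands in for whatever metadata resolution supplies the in-text form, are assumptions. </p>

```python
import re
from html import escape

# Matches a bare DOI shortcode such as [cite]10.232/43243[/cite].
CITE = re.compile(r"\[cite\]([^\[\]]+)\[/cite\]")


def shortcodes_to_spans(text, labels):
    """Replace DOI shortcodes with spans carrying a kcite-id attribute.

    labels maps a DOI to its in-text citation; an unknown DOI falls
    back to the identifier itself.
    """
    def replace(match):
        doi = match.group(1).strip()
        label = labels.get(doi, doi)
        return '<span kcite-id="%s">%s</span>' % (
            escape(doi, quote=True), escape(label))

    return CITE.sub(replace, text)


html = shortcodes_to_spans(
    "See [cite]10.232/43243[/cite].",
    {"10.232/43243": "Smith et al, (2002)"},
)
```

<p>Keeping the identifier in an attribute means the visible text can later be restyled in the browser without losing the machine-readable link to the cited work. </p>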
<p>Although we believe that additional semantics are a good thing, we will not enforce a requirement for additional semantics on authors. If authors choose not to use Kcite, then this is their choice. Instead, we need to show that these semantics are useful. Our experience with many (non)standards such as COinS, DOIs, OAI-ORE and LOCKSS is that they are not simple, and that they speak primarily to publishers or librarians. For a semantic web approach to work, it must focus on authors and readers, as they produce and consume the content. Extracting even light-weight semantics from authors who are ontology experts is hard. For other domains, the situation may be worse. </p>
<p>Current publishing practices make use of semantic web technology impractical; semantics added by authors are unlikely to be represented correctly if the end product is a PDF typeset by hand. Moreover, we can see little point in adding semantics to individual articles if this is done in a bespoke way. With K-Blog, we have focused on providing both content and a full process, with review, using existing tools and workflows, adding semantics secondarily or incidentally where we can. As a result, the level of semantics that we have achieved is light-weight. However, we believe that K-Blog and WordPress, combined with associated tooling, provide all the basic requirements for a publishing process, and that they provide an attractive framework on which to build a semantic web. </p>
<h1 id="a0000000012">Acknowledgements</h1>
<p>We would like to acknowledge the contribution of the authors of articles for both the Ontogenesis and Taverna K-Blogs, whose feedback was essential for this process. K-Blog is currently funded by JISC. </p>
<div>
<h1>Bibliography</h1>
<dl class="bibliography">
<dt>
[<a name="bioinf">1</a>]
</dt>
<dd>
<p>Bioinformatics. <a href="http://bioinformatics.knowledgeblog.org">http://bioinformatics.knowledgeblog.org</a>. </p>
</dd>
<dt>
[<a name="datacite">2</a>]
</dt>
<dd>
<p>Datacite. <a href="http://datacite.org/">http://datacite.org/</a>. </p>
</dd>
<dt>
[<a name="health">3</a>]
</dt>
<dd>
<p>Health and Public Health. <a href="http://health.knowledgeblog.org">http://health.knowledgeblog.org</a>. </p>
</dd>
<dt>
[<a name="nupedia">4</a>]
</dt>
<dd>
<p>Nupedia. <a href="http://en.wikipedia.org/wiki/Nupedia">http://en.wikipedia.org/wiki/Nupedia</a>. </p>
</dd>
<dt>
[<a name="ojs">5</a>]
</dt>
<dd>
<p>Open Journal System. <a href="http://pkp.sfu.ca/?q=ojs">http://pkp.sfu.ca/?q=ojs</a>. </p>
</dd>
<dt>
[<a name="h2g2">6</a>]
</dt>
<dd>
<p>The Guide to Life, the Universe and Everything. <a href="http://www.bbc.co.uk/h2g2/">http://www.bbc.co.uk/h2g2/</a>. </p>
</dd>
<dt>
[<a name="wordpress">7</a>]
</dt>
<dd>
<p>WordPress. <a href="http://www.wordpress.org">http://www.wordpress.org</a>. </p>
</dd>
<dt>
[<a name="wikipedia_expert">8</a>]
</dt>
<dd>
<p>Wikipedia:expert retention, 2008. <a href="http://en.wikipedia.org/wiki/Wikipedia:Expert_retention">http://en.wikipedia.org/wiki/Wikipedia:Expert_retention</a>. </p>
</dd>
<dt>
[<a name="anthologize">9</a>]
</dt>
<dd>
<p>Anthologize, 2010. <a href="http://anthologize.org/">http://anthologize.org/</a>. </p>
</dd>
<dt>
[<a name="csl">10</a>]
</dt>
<dd>
<p>Citation style language, 2010. <a href="http://www.citations-styles.org">http://www.citations-styles.org</a>. </p>
</dd>
<dt>
[<a name="problemdois">11</a>]
</dt>
<dd>
<p>The problem with DOIs, 2011. <a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/</a>. </p>
</dd>
<dt>
[<a name="tavernakblog">12</a>]
</dt>
<dd>
<p>The Taverna Knowledgeblog, 2011. <a href="http://taverna.knowledgeblog.org">http://taverna.knowledgeblog.org</a>. </p>
</dd>
<dt>
[<a name="arewethere">13</a>]
</dt>
<dd>
<p>Sean Bechhofer. Reflections on blogging a book. Ontogenesis, 2011. <a href="http://ontogenesis.knowledgeblog.org/647">http://ontogenesis.knowledgeblog.org/647</a>. </p>
</dd>
<dt>
[<a name="citeproc-js">14</a>]
</dt>
<dd>
<p>Frank Bennett. Citeproc-js. <a href="https://bitbucket.org/fbennett/citeproc-js/wiki/Home">https://bitbucket.org/fbennett/citeproc-js/wiki/Home</a>. </p>
</dd>
<dt>
[<a name="levels">15</a>]
</dt>
<dd>
<p>Simon Cockell, Dan Swan, and Phillip Lord. Knowledgeblog types and peer-review levels. Process, 2010. <a href="http://process.knowledgeblog.org/archives/19">http://process.knowledgeblog.org/archives/19</a>. </p>
</dd>
<dt>
[<a name="Wikipedia_academics">16</a>]
</dt>
<dd>
<p>Zoe Corbyn. Wikipedia wants more contributions from academics, 2011. <a href="http://www.guardian.co.uk/education/2011/mar/29/wikipedia-survey-academic-contributions">http://www.guardian.co.uk/education/2011/mar/29/wikipedia-survey-academic-contributions</a>. </p>
</dd>
<dt>
[<a name="wikipediaage">17</a>]
</dt>
<dd>
<p>Casper Grathwohl. Wikipedia comes of age. The Chronicle of Higher Education, 2011. <a href="http://chronicle.com/article/article-content/125899/">http://chronicle.com/article/article-content/125899/</a>. </p>
</dd>
<dt>
[<a name="second">18</a>]
</dt>
<dd>
<p>D. Kell. Metabolomics, food security and blogging a book, 2010. <a href="http://blogs.bbsrc.ac.uk/index.php/2010/01/metabolomics-food-security-blogging-book/">http://blogs.bbsrc.ac.uk/index.php/2010/01/metabolomics-food-security-blogging-book/</a>. </p>
</dd>
<dt>
[<a name="first">19</a>]
</dt>
<dd>
<p>Jim Logan. What is an ontology? | ontogenesis, 2010. <a href="http://ontogoo.blogspot.com/2010/01/what-is-ontology-ontogenesis.html">http://ontogoo.blogspot.com/2010/01/what-is-ontology-ontogenesis.html</a>. </p>
</dd>
<dt>
[<a name="reality_in_biology_2010">20</a>]
</dt>
<dd>
<p>Phillip Lord and Robert Stevens. Adding a little reality to building ontologies for biology. <em>PLoS One</em>, 2010. </p>
</dd>
<dt>
[<a name="reality_in_biology_2010_blog">21</a>]
</dt>
<dd>
<p>Phillip Lord and Robert Stevens. Adding a little reality to building ontologies for biology, 2010. <a href="http://www.russet.org.uk/blog/2010/07/realism-and-science/">http://www.russet.org.uk/blog/2010/07/realism-and-science/</a>. </p>
</dd>
<dt>
[<a name="onto-mani">22</a>]
</dt>
<dd>
<p>Phillip Lord and Robert Stevens. The Ontogenesis Manifesto, 2010. <a href="http://ontogenesis.knowledgeblog.org/manifesto">http://ontogenesis.knowledgeblog.org/manifesto</a>. </p>
</dd>
<dt>
[<a name="firstyear">23</a>]
</dt>
<dd>
<p>Phillip Lord and Robert Stevens. Ontogenesis: One year on. Ontogenesis, 2011. <a href="http://ontogenesis.knowledgeblog.org/1063">http://ontogenesis.knowledgeblog.org/1063</a>. </p>
</dd>
<dt>
[<a name="taverna">24</a>]
</dt>
<dd>
<p>Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Antoon Goderis, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Robert Stevens, Anil Wipat, and Chris Wroe. Taverna: lessons in creating a workflow environment for the life sciences: Research articles. <em>Concurr. Comput. : Pract. Exper.</em>, 18:1067–1100, August 2006. </p>
</dd>
<dt>
[<a name="epub_from_wordpress">25</a>]
</dt>
<dd>
<p>Peter Sefton. Making epub from wordpress (and other) web collections, 2011. <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/">http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/</a>. </p>
</dd>
<dt>
[<a name="cito">26</a>]
</dt>
<dd>
<p>David Shotton. CiTO, the Citation Typing Ontology. <em>Journal of Biomedical Semantics</em>, 1(Suppl 1):S6, 2010. </p>
</dd>
<dt>
[<a name="snomed">27</a>]
</dt>
<dd>
<p>M.Q. Stearns, C. Price, K.A. Spackman, and A.Y. Wang. SNOMED clinical terms: overview of the development process and project status. In <em>AMIA Fall Symposium (AMIA-2001)</em>, pages 662–666. Henley &amp; Belfus, 2001. </p>
</dd>
<dt>
[<a name="go2000">28</a>]
</dt>
<dd>
<p>The Gene Ontology Consortium. Gene Ontology: Tool for the Unification of Biology. <em>Nature Genetics</em>, 25:25–29, 2000. </p>
</dd>
</dl>
</div>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1920 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

