<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>An Exercise in Irrelevance &#187; Science</title>
	<atom:link href="http://www.russet.org.uk/blog/category/all/professional/science/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.russet.org.uk/blog</link>
	<description>Ramblings from Phil Lord&#039;s life</description>
	<lastBuildDate>Thu, 02 Feb 2012 14:11:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Thoughts on a Chimney</title>
		<link>http://www.russet.org.uk/blog/2011/10/thoughts-on-a-chimney/</link>
		<comments>http://www.russet.org.uk/blog/2011/10/thoughts-on-a-chimney/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 12:53:23 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1943</guid>
		<description><![CDATA[While I am currently spending a significant amount of my time promoting the idea that blog technology can be, and should be, used for serious scientific material, I thought I would make a post in a different and perhaps more traditional vein: that is, a light-weight idea, with no serious research behind it. Years [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1943">
<p><a name="preamble"></a> 
<p>While I am currently spending a significant amount of my time promoting the idea that blog technology can be, and should be, used for serious scientific material, I thought I would make a post in a different and perhaps more traditional vein: that is, a light-weight idea, with no serious research behind it. Years ago now, I created an <a href="http://homepages.cs.ncl.ac.uk/phillip.lord/wiki/energy/index.html">Energy Wiki</a> full of daft ideas for making energy. I last revisited this in 2009, with an idea for <a href="http://www.russet.org.uk/blog/2009/05/the-sea-cylinder-storage-system/">storing energy at sea</a>. I&#8217;d actually forgotten that part of the reason for this was to try out Inkscape, which is also part of the reason for this post. I wanted to try a bit of multi-media, that is, a blog post with an image in it. High tech.</p>
<p>So, the idea. One form of renewable is the <a href="http://en.wikipedia.org/wiki/Solar_updraft_tower">Solar Updraft Tower</a>, also known as a solar chimney. This works straightforwardly enough: you build a large greenhouse in a desert, with a very large chimney in the middle. The top of the chimney is in cold air, the bottom in hot, and an updraft results; stick a turbine in or at the base of the chimney, and you get energy out.</p>
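<p>As a rough illustration of the principle, the draught in a chimney is often estimated with the textbook stack-effect formula, Δp = g·h·(ρ<sub>outside</sub> - ρ<sub>inside</sub>), with air densities taken from the ideal gas law. The Python below is a minimal sketch under assumptions introduced here, not taken from the post: uniform temperatures inside and outside the chimney, constant ambient pressure over the height, and no friction or turbine losses. The example temperatures are purely illustrative.</p>

```python
# Stack-effect (chimney draught) estimate for a solar updraft tower.
# Simplifying assumptions (not from the post): uniform air temperature
# inside and outside the chimney, constant ambient pressure over the
# height, and no friction or turbine losses.

G = 9.81              # gravitational acceleration, m/s^2
R = 8.314             # universal gas constant, J/(mol K)
M_AIR = 0.02896       # molar mass of dry air, kg/mol
P_AMBIENT = 101325.0  # sea-level atmospheric pressure, Pa


def air_density(temp_k: float, pressure_pa: float = P_AMBIENT) -> float:
    """Density of dry air in kg/m^3, from the ideal gas law rho = p*M/(R*T)."""
    return pressure_pa * M_AIR / (R * temp_k)


def stack_pressure(height_m: float, t_inside_k: float, t_outside_k: float) -> float:
    """Draught pressure in Pa: delta_p = g * h * (rho_outside - rho_inside)."""
    return G * height_m * (air_density(t_outside_k) - air_density(t_inside_k))


if __name__ == "__main__":
    # Illustrative temperatures only: 50 C greenhouse air, 30 C ambient air.
    t_in, t_out = 323.15, 303.15
    for h in (100.0, 500.0):
        print(f"{h:5.0f} m chimney: {stack_pressure(h, t_in, t_out):6.1f} Pa")
```

<p>For a fixed temperature difference, this simple model makes the draught scale linearly with height, consistent with the point that a tall chimney is needed for the system to work well.</p>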
<p>The problem is that, to work at all efficiently, you need a big temperature differential, and so a tall chimney. This in turn means a wide chimney, both to support a substantial updraft, and for mechanical reasons. Tall means 500m or more. The bottom line is that a pretty significant capital expenditure is required, followed by a relatively long pay-back period, which in turn means that the biggest single expense of the project is likely to be interest charges, rather than anything else.</p>
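<p>The claim about interest charges can be made concrete with the standard annuity formula for a fixed-repayment loan, A = P·r / (1 - (1 + r)<sup>-n</sup>). This Python sketch is hypothetical throughout: the capital, interest rates and 30-year term are illustrative figures chosen here, not costings for any real plant, and it ignores revenue, inflation and refinancing.</p>

```python
# Total interest paid on a fixed-repayment (annuity) loan, used here to
# illustrate how financing can rival the capital cost of a long-payback
# project. All figures in the example are hypothetical.


def annual_repayment(principal: float, rate: float, years: int) -> float:
    """Fixed yearly repayment: P * r / (1 - (1 + r) ** -n)."""
    if rate == 0:
        return principal / years
    return principal * rate / (1 - (1 + rate) ** -years)


def total_interest(principal: float, rate: float, years: int) -> float:
    """Sum of all repayments minus the amount originally borrowed."""
    return annual_repayment(principal, rate, years) * years - principal


if __name__ == "__main__":
    capital = 100.0  # capital cost in arbitrary units (hypothetical)
    for rate in (0.04, 0.06, 0.08):
        interest = total_interest(capital, rate, 30)
        print(f"{rate:.0%} over 30 years: interest = {interest:.0f} per 100 of capital")
```

<p>With these illustrative inputs, total interest over 30 years is below the original capital at 4% but exceeds it at 6% and 8%, which is the sense in which financing, rather than construction, can become the largest single cost of a project with a long pay-back period.</p>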
<p>So, my idea is to use an inflatable chimney instead. Initially, I thought about some kind of helium lifting scheme, but then I realised that this makes no sense; why not use hot air, which, after all, is what the whole system is designed to generate? Consider, for instance, the following organisation:</p>
<p><img src="http://www.russet.org.uk/blog/wp-content/uploads/2011/10/inflatable_solar_chimney.png" style="border-width: 0;" alt="Inflatable Chimney" height="500"></p>
<p>Essentially, it&#8217;s a traditional balloon with a hole in the middle. Obviously the whole system is stackable&#8201;&#8212;&#8201;a second balloon could be placed on top of the first and so on. The whole structure could be assembled or disassembled as desired. Unfortunately, though, this would probably take quite a bit of work.</p>
<p>My second thought came from the idea that, while most designs for solar chimneys have the chimney in the middle of the greenhouse, it doesn&#8217;t really need to be. A horizontal pipe to the middle would be enough, so the chimney could be outside the greenhouse. The advantage that this brings is that the tower could be raised or lowered in-situ, without the risk of it falling on, and damaging, the greenhouse. So my second idea was to build the chimney as two cylinders, with the gap between them serving as the inflatable, buoyant structure. By pleating the cylinders in opposite directions like so:</p>
<p><img src="http://www.russet.org.uk/blog/wp-content/uploads/2011/10/concertina_chimney.png" style="border-width: 0;" alt="Concertina Chimney" height="500"></p>
<p>the whole structure should concertina up and down. By inflating from the top and deflating from the bottom, it should be possible to raise or lower the entire system by opening and shutting vents at the bottom or top of each section to the inside of the chimney.</p>
<p>One advantage of this system is that, as the chimney gets higher, the temperature differential between the inside and the outside gets greater, which should mean that the taller the tower, the more buoyant the sections get; this should help to keep the entire thing as upright as possible, as will the air travelling through the middle, like some gigantic party blower.</p>
<p>Another addition that comes to mind would be to add inflatable half-toroids around the chimney at regular intervals. With a curve on the top, and a flat bottom-side, the entire thing should operate like an aerofoil, lifting the tower up; so, the windier it gets, the greater the lift, which is just what is needed to keep it as upright as possible. This should mean that the chimney can operate in relatively high wind levels.</p>
<p>This kind of system could even work in concert with a fixed chimney&#8201;&#8212;&#8201;extending the height by 500m say, and increasing its efficiency. It could also act as a supplement&#8201;&#8212;&#8201;operating only on very hot days when the greenhouse has excess capacity. Or, finally, it could operate while the main chimney was being built, meaning that a plant can start generating income earlier, which should reduce the cost of interest payments.</p>
<p>Of course, this all comes with drawbacks: the ongoing running costs are likely to be significant; wind will remain a significant factor regardless; and, finally, inflating the tower will use hot air, which will reduce the efficiency of the whole system. Are these flaws significant? Well, as I said, this post is light-weight with no serious research behind it. I have no idea, nor any really clear idea of how to work out these costs. Answers on a postcard please.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1943 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/10/thoughts-on-a-chimney/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kblog has been compromised</title>
		<link>http://www.russet.org.uk/blog/2011/09/kblog-has-been-compromised/</link>
		<comments>http://www.russet.org.uk/blog/2011/09/kblog-has-been-compromised/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 13:35:07 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1939</guid>
		<description><![CDATA[I have been pushing the idea of Kblogs&#8201;&#8212;&#8201;scientific publishing using commodity software&#8201;&#8212;&#8201;for a year or so now. Our main site, Knowledgeblog.org has got around 100 articles now, and has had about 50k page views (or about 4x the number of raw page hits) and has generated a certain presence on the internet. While this is [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1939">
<p><a name="preamble"></a> 
<p>I have been pushing the idea of Kblogs&#8201;&#8212;&#8201;scientific publishing using commodity software&#8201;&#8212;&#8201;for a year or so now. Our main site, <a href="http://knowledgeblog.org">Knowledgeblog.org</a>, has got around 100 articles now, has had about 50k page views (or about 4x the number of raw page hits) and has generated a certain presence on the internet. While this is generally good, the price of fame is that we have moved somewhat up the list of potential hack targets. Unfortunately, this has resulted in two compromises on the machine; they were probably connected, although we have no evidence to link the two at the moment.</p>
<p>The first was through the timthumb zero-day vulnerability. It involved a code injection into a WordPress installation using a thumbnail generator with a dodgy bit of PHP in it. We cleaned the system up as well as we were able and went from there. Sadly, a couple of days ago, we had a second break-in. This was a more serious and directed attack (the timthumb attack was scripted, and we were one of several thousand sites to be hit). In this case, the machine was root-compromised, and the web server used to gather usernames and passwords in a phishing expedition. We do have backups and all of the content. There were a number of things that we could have done to secure the machine further, at least one of which may have prevented the hack, but there are only so many hours in the day.</p>
<p>So, where does this leave us? Is the whole idea of knowledgeblog broken? Personally, I do not think so. While I have been critical of the costs associated with academic publishing, I am aware that it cannot happen for free. Running and maintaining a web server takes money; it is something that we have been doing on a shoe-string for a while, especially since our JISC money ran out. In the couple of years that we have run knowledgeblog, I think that we have learned and shown a lot. As well as page views and content, we have shown that scientific publishing can be easy for the author; that we can generate attractive articles this way; that we can start to embed computationally accessible knowledge into these articles. We have shown that we can do peer review, if we need to. We have shown we can <a href="http://wayback.archive.org/web/*/http://knowledgeblog.org">archive</a> and preserve for the future. We have shown that knowledgeblog is good for grey literature. We have added <a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">DOIs</a>. Multiple authors. Good looking <a href="http://www.russet.org.uk/blog/2010/08/latex-to-wordpress/">maths</a>. We even have some preliminary stats on how much publication costs from Word doc to website.</p>
<p>At the moment, though, we do not have a business model. It is clear that if we are to move this forward, it needs to be run as a service, managed, and looked after, something which is neither my expertise nor my desire. The analogy that I made earlier with <a href="http://www.russet.org.uk/blog/2011/06/the-naivete-of-scientists/">Wikipedia</a> is, I think, a good one; it would be good to move this to foundation status.</p>
<p>The path from here to there is a long one, however. For the moment, we will restore knowledgeblog, and it will re-emerge, although at this time of year, it will take a while. But we look to the future as well.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1939 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/09/kblog-has-been-compromised/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Naivete of Scientists</title>
		<link>http://www.russet.org.uk/blog/2011/06/the-naivete-of-scientists/</link>
		<comments>http://www.russet.org.uk/blog/2011/06/the-naivete-of-scientists/#comments</comments>
		<pubDate>Thu, 30 Jun 2011 16:11:37 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1924</guid>
		<description><![CDATA[Although in some disciplines, it is relatively uncontentious, the rise of open access publishing has produced a lot of comment in others. In one of my two disciplines, computing science, this form of publication is still the minority, and still raises comment. For instance, Michel Beaudouin-Lafon has commented suggesting that scientists are highly naive about [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1924">
<p><a name="preamble"></a> 
<p>Although in some disciplines it is relatively uncontentious, the rise of open access publishing has produced a lot of comment in others. In one of my two disciplines, computing science, this form of publication is still the minority, and still raises comment. For instance, Michel Beaudouin-Lafon has <a href="http://delivery.acm.org/10.1145/1650000/1646367/p32-beaudouin-lafon.html">commented</a> suggesting that scientists are highly naive about the costs of publishing. He argues that scientific publishing is intrinsically expensive, and that open access will have negative implications for science as a whole.</p>
<blockquote><p>Over the years, commercial STM publishing has become a cutthroat business with cutthroat practices and we, the scientific and academic community, are the naive lambs, blinded by the ideals of science for the public good-or simply in need of more publications to advance our careers.</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>Personally, I think that &#8220;naive&#8221; is the wrong word; rather, scientists are often not good at operating in a co-ordinated way. Although we work together in small groups, and sometimes in large groups, in general, we are still very much a cottage industry; at any one time the number of scientists working in a distinct discipline is not that large, even on a world-wide basis. Of course, this works pretty well for scientific advance; we are not a production industry, but researchers. No one knows the best way forward, and we need to experiment to find out. But it does mean that we often play second fiddle to those capable of more co-ordinated action; compare, for example, scientists to the medical community with its tightly controlled professional bodies. Or, of course, the STM publishing industry, particularly as it has become concentrated in fewer and fewer competing publishers.</p>
<blockquote><p>For example, ACM spends several million dollars every year to support the reliable data center serving the Digital Library</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>Clearly, it is true that the cost of data centres and storage is not trivial. But the cost of storing and serving data has plummeted over recent years. Scientific papers largely consist of words and figures; these do not take up much space. The laptop I am working on has a copy of my email directory; it&#8217;s not complete, but it carries most of my <a href="http://www.russet.org.uk/blog/2007/07/preservation-for-the-future/">outgoing email</a> since 1994 and a lot of the incoming; this is a lot of words! But the total size is now less than 5GB, which will fit on a £3 pen drive, or my phone. Now, if ACM were storing research data, then it would be a totally different issue; the costs here are significant, problematic and rising. But they are not.</p>
<p>The ACM might spend several million dollars a year, but the bottom line here is that this does not account for the cost of publishing. The Wikimedia foundation which supports Wikipedia spends around 10 million dollars a year, in total, on one of the top ten websites in the world. This is about the daily cost of the whole scientific publishing industry.</p>
<blockquote><p>The quality of a journal is typically measured by its impact factor</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>And a very bad measurement of journal quality it is too. As someone who works in two disciplines at once, I constantly get hit by this: my best computing publications have laughable impact factors when compared to my bio publications; when judged against computer scientists, however, my bio publications have such high impact factors, that they have to be ignored as outliers.</p>
<blockquote><p>At $5,000 per publication, my lab is broke.</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>It is not clear where the $5,000 figure comes from, as most open access charges are less than this. But, anyway, this argument makes no sense. Our labs are already paying a vast amount of money for publications; usually this is squirrelled away in overheads, taken from our budgets before we see the money. And, although it doesn&#8217;t happen so much in computing, many journals levy significant page charges.</p>
<blockquote><p>They are the big pharmaceutical labs and the tech firms who publish very little but rely on the publication of scientific results for their businesses. With author-pay, research will pay so that industry can get their results for free. Is this moral?</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>Open access on its own is not enough; we also need public disclosure about the process. Perhaps the examples of pharmaceutical companies directly <a href="http://www.the-scientist.com/templates/trackable/display/blog.jsp?type=blog&amp;o_url=blog/display/55671&amp;id=55671">funding</a> journals are unusual. It is not so easy to tell at the moment. In this context, it could be argued that the last thing we need is the pharmaceutical industry paying for the results of science. Of course, conversely, the pharmaceutical industry could argue that they already do pay for the (publicly funded) research by way of taxation.</p>
<p>While they are interesting, all of these arguments really miss the point: the pharmaceutical industry already gets its results for free, as its subscription fees do NOT pay for the research, just its publication. The publishing industry also gets the results that it depends on for free, or is even paid for them through page charges levied on the authors. And for every paper that researchers publish for free, they pay more to read someone else&#8217;s.</p>
<p>So, we are already in the situation that we are told is not moral.</p>
<blockquote><p>It is important to understand that the scientific community is largely at fault</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>There is some truth in the idea that the scientific community has let itself walk into this situation, but ultimately I feel that this is like blaming the financial crisis on those receiving subprime mortgages. It is true that it is scientists who submit their best work to expensive closed publishers; but, especially in early and mid &#8220;career&#8221;, we do this to safeguard our futures.</p>
<blockquote><p>The problem with the subscription model is not the model but the fees.</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>Quite the opposite. Ultimately, I don&#8217;t pay the fees, so how much do I really care? But the subscription model prevents re-purposing, it limits access, it prevents competition. I work at a university as a scientist because I value the ability to swap and discuss my work. I want the general public to be able to access my research. Dissemination of knowledge should be part of my job; I think it is reasonable that I, or my employers, should pay for it.</p>
<p>Which is not to say that the level of fees is fine; it is not. Fees are far too high under any model.</p>
<blockquote><p>The added value provided by publishers is twofold: reputation (the value of the imprimatur), and archiving (the guarantee that the work will be available forever).</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>And this is it? Is this all that we are getting, given the costs? Especially as the reputation comes from the work, not the journal, and the archiving should be a rapidly decreasing cost.</p>
<p>Actually, in practice, I think the current publishing industry brings more value; selection of reviewers, sometimes copy-editing and, critically, advertising of the content. But, again, times have changed, and publishing practice in these areas has not.</p>
<blockquote><p>The only other area in publishing where authors pay to get published is called the vanity press. Do we really want to enter that model?</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>This is a low blow, and it is not true either. Many people pay for their own publishing costs. The government pays to publish election results; health services pay to publish public health information; companies pay to publish product safety recalls. These are all circumstances where the value to the author of public awareness of their content far exceeds the income they would receive from charging. And the biggest example of this is the advertising industry.</p>
<p>Nor is the implication that this will necessarily result in low quality true. Consider the blogosphere; of course, there is much junk, but the standard of its science journalism is very high; frankly, when even respected sources like the BBC start talking about <a href="http://news.bbc.co.uk/1/hi/health/7354458.stm">pixie dust</a>, it&#8217;s probably at least as high as that of the mainstream media.</p>
<p>All this aside, what do I, as a scientist, actually care about? A few things leap to mind:</p>
<ul> 
<li> stable location and content </li>
<li> archiving </li>
<li> peer review </li>
<li> discovery and selection </li>
</ul>
<p>Open access was built on the basis of replicating the existing publication process. PLoS, for example, did this precisely so that it did not challenge both the business model and the publication procedure at the same time. How much of the costs stem from this? I think that we, as authors and readers, should know. How much of the millions the ACM spends on its data centre is involved in managing access controls, for example? How much on advertising? How much on booths at meetings?</p>
<p>Open access has opened the door, but now we need to challenge and change the process. Hosting data is not free, nor is archiving. And yet, I can find my own website from <a href="http://web.archive.org/web/20020203022740/http://www.russet.org.uk/">2002</a> and enjoy its gaudy colour scheme all over again. If this blog post is so exciting to the world that the load brings the server down, you will be able to read it on <a href="http://www.russet.org.uk.nyud.net/blog/">coral cache</a>. Peer review <strong>is</strong> expensive and time-consuming; I know because I&#8217;ve organised enough of it for <a href="http://www.bio-ontologies.org.uk">BioOntologies</a>. But then, I did not get paid for this; and how many of the real costs of peer review do publishers bear? And discovery and selection? Well, we have Google, and I follow my peers on Twitter.</p>
<blockquote><p>Author fees are not a solution. [&#8230;] Finally, nonprofit publishers should take advantage of their unique position to experiment with sustainable evolutions of their publishing models.</p>
<p align="right"> &#8212; Michel Beaudouin-Lafon </p>
</blockquote>
<p>And on this, I could not agree more. Our experiment with <a href="http://www.knowledgeblog.org">Knowledgeblog</a> suggests that we can get 90% (or 80% or 70% depending on who you ask) of the way with commodity software. It&#8217;s only a small start, but then I was on the mailing list that saw the first email about the creation of Wikipedia, and that wasn&#8217;t long ago.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1924 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/06/the-naivete-of-scientists/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ontogenesis Knowledgeblog: Lightweight Semantic Publishing</title>
		<link>http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/</link>
		<comments>http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/#comments</comments>
		<pubDate>Tue, 07 Jun 2011 14:28:13 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1920</guid>
		<description><![CDATA[This is a paper we wrote for STLR2011 also published directly on Knowledgeblog Abstract The web has moved from a minority interest tool to one of the most heavily used platforms for publication. Despite originally being designed by and for academics, it has left academic publishing largely untouched; most papers are available on-line, but in [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1920">
<p>This is a paper we wrote for <a href="http://stlr2011.weebly.com/">STLR2011</a>;
it is also published directly on <a href="http://knowledgeblog.org/128">Knowledgeblog</a>.</p>
<h1>Abstract</h1>
<div class="abstract"> The web has moved from a minority interest tool to one of the most heavily used platforms for publication. Despite originally being designed by and for academics, it has left academic publishing largely untouched; most papers are available on-line, but in PDF and are most easily read once printed. Here, we describe our experiments with using commodity web technology to replace the existing publishing process; the resource describing ontologies that we have developed with this platform; and, finally, the implications that this may have for publishing in a semantic web framework. </div>
<h1 id="a0000000002">Authors</h1>
<p>Phillip Lord, Newcastle University, Newcastle-upon-Tyne, UK</p>
<p>Simon Cockell, Newcastle University, Newcastle-upon-Tyne, UK</p>
<p>Daniel C. Swan, Newcastle University, Newcastle-upon-Tyne, UK</p>
<p>Robert Stevens, University of Manchester, Manchester, UK</p>
<h1 id="a0000000003">Introduction</h1>
<p>The Web was invented around 1990 as a light-weight mechanism for publication of documents, enabling scientists to share their knowledge in the form of hypertext documents. Although scientists, and later most academics, like the rest of society, have made heavy use of the web, it has not had a significant impact on the academic publication process. While most journals now have websites, the publication process is still based around paper documents or electronic representations of paper documents in the form of a PDF. Most conferences still handle submissions in the same way<a href="#a0000000004" class="footnote"><sup class="footnotemark">1</sup></a>. Books on the web, for example, are often limited to a table of contents. </p>
<p>For the authors (certainly from our personal experience), the process is dissatisfying; book writing is time-consuming, tiring and takes a number of years to come to fruition. If the book has one or a few authors, it tends to reflect only a narrow slice of opinion. Multi-author collected works tend to be even harder work for the editor than writing a book solo. Books do not change frequently; they are therefore out-of-date as soon as they are available. Authors feel a greater pressure for correctness, as they will have to live with the consequences of mistakes for the many years it takes to produce a second edition; most scientists welcome feedback, but being asked to justify something you wish you had not said becomes tiresome, especially if you are waiting to update it. </p>
<p>For the consumer of the material (either a human reader, or a computer), the experience is likewise limited. Books on paper are not searchable, are not easy to carry around, and are seldom cheap; more commonly, they are very expensive. For the computer, the material is hard to understand, or to parse. Even distinguishing basic structure (where do chapters start, who is the author, where is the legend for a given figure) is challenging. </p>
<p>All of this points to a need to exploit the Web for scientists to publish in a different way than simply replicating the old publishing process. Here, we describe our experiment with a new (to academia!) form of publishing: we have used widely-available and heavily used commodity software (WordPress <span class="cite">[<a href="#wordpress">7</a>]</span>), running on low-end hardware, to develop a multi-author resource describing the use of ontologies in the life sciences (our main field of expertise). From this experience, we have built on and enhanced the basic platform to improve the author experience of publishing in this manner. We are now extending the platform further to enable the addition of light-weight semantics by authors to their own papers, without requiring authors to directly use semantic web technologies, and within their own tool environment. In short, we believe that this platform provides a ‘cheap and cheerful’ framework for semantic publishing. </p>
<h1 id="a0000000005">The requirements</h1>
<p>The initial motivation for this work came from our experience within the bio-ontology community<sup class="footnotemark">3</sup>. Biomedicine is one of the largest domains for use of ontology technology, producing large and complex ontologies such as the Gene Ontology <span class="cite">[<a href="#go2000">28</a>]</span> or SNOMED <span class="cite">[<a href="#snomed">27</a>]</span>. </p>
<p>As an ontologist, one of the most common questions that one has is: ‘where is there a book or a tutorial that I can read which describes how to build an ontology?’. Currently, there is some tutorial information on the web, and there are some books; but there is not a clear answer to the question. Many of the books are collections of research-level papers, or are technologically biased. Currently many ontologists have learned their craft through years of reading mailing lists, gathering information from the web and by word of mouth. We wished to develop a resource with short and succinct articles, published in a timely manner and freely available. </p>
<p>We also wished, however, to retain the core of academic publishing. This was for reasons that were pragmatic, principled and political. Consider, for example, Wikipedia, which could otherwise serve as a model. Our own experience suggests that referencing Wikipedia can be dangerous: it can and does change over time, meaning critical or supportive comments in other articles can be ‘orphaned’. Wikipedia maintains a ‘neutral point-of-view’ which, in the opinion of many, makes it less suitable for areas where knowledge is uncertain and disagreement frequent. Finally, Wikipedia is relatively anonymous in terms of authorship: whether this affects the quality of articles has been a topic of debate <span class="cite">[<a href="#wikipediaage">17</a>]</span>, but was not our primary concern; pragmatically, the promotion and career structure<a href="#a0000000006" class="footnote"><sup class="footnotemark">2</sup></a> for most academics requires a form of professional narcissism; they cannot afford to contribute to a resource for which they cannot claim credit. Of course, our experiences may not be reflective of the body academic overall; there has, for example, been substantial discussion of the issues of expertise on Wikipedia itself <span class="cite">[<a href="#wikipedia_expert">8</a>]</span>. Although the reasons may not be clear, it is clear that academics largely do not contribute to Wikipedia, and that Wikimedia sees this as an issue <span class="cite">[<a href="#Wikipedia_academics">16</a>]</span>. </p>
<p>We also had an explicit set of non-functional requirements. The resource needed to be easy to administer and low-cost, reflecting the limited resources available to us; authors should be offered an easy-to-use publishing environment with minimal ‘setup’ costs, or they would be unlikely to contribute; and readers should see a simple, but reasonably attractive and navigable, website, or they would be unlikely to read it. </p>
<h1 id="a0000000007">The Ontogenesis experience</h1>
<p>Our previous experience of blog software within academia was limited to ‘traditional’ blogging: short pieces about the process of science (reports about conferences or papers, for example); journalistic articles about other people’s research; or personal blogging, that is, articles by people who just happen to be academics. Although we wished to develop different, more formal content, this experience suggests that many academics find blogging software convenient, straightforward and useful. </p>
<p>To test whether this would carry over to more formal content, we held a small, two-day workshop of 17 domain experts and tasked them with generating content, peer-reviewing this content and publishing it as articles on a blog. </p>
<h2 id="a0000000008">Terminology and the Process</h2>
<p>Like many communities, the blogosphere has developed its own, sometimes confusing, terminology, so before describing the process we adopted we introduce some of it. A <i class="itshape">blog</i> is a collection of web pages, usually with a common theme. These web pages can be divided into: <i class="itshape">posts</i>, which are published (or <i class="itshape">posted</i>) on an explicit date and then unchanged; and <i class="itshape">pages</i>, which are not dated and can change. Posts and pages have <i class="itshape">permalinks</i>: although they may be accessible via several URLs, each has one permalink that is stable and never changes. Posts and pages can be <i class="itshape">categorised</i> (grouped under a predefined hierarchy) or <i class="itshape">tagged</i> (grouped using <em>ad hoc</em> words or phrases defined at the point of use). A blog is usually hosted with a <i class="itshape">blog engine</i>, such as <i class="itshape">WordPress</i>, which stores content in a database and combines it with style instructions in <i class="itshape">themes</i> to generate the pages and posts. Most blog engines support extensions to their core functionality through <i class="itshape">plugins</i>. Most blogs also support <i class="itshape">comments</i>: short pieces of content added to a post or page by people other than the original authors. Most blog engines also support <i class="itshape">trackbacks</i>, which are bidirectional links: normally, a snippet from a linking post will appear as a comment in the linked-to post. Trackbacks work both within a single blog and between different, distributed blogs. Many blogs support <i class="itshape">remote posting</i>: as well as using a web form to add new content, users can post from third-party applications through a programmatic interface, using a protocol such as XML-RPC, or even by email. 
Posts and pages are ultimately written in headless HTML (the part of HTML which appears inside the <tt class="ttfamily">body</tt> element), although the different editing environments can hide this fact from the user. </p>
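<p>As a concrete illustration of remote posting, the sketch below serialises a <tt class="ttfamily">metaWeblog.newPost</tt> call (the MetaWeblog API is one of the XML-RPC interfaces WordPress accepts) using only Python’s standard <tt class="ttfamily">xmlrpc.client</tt> module. It is a minimal sketch, not part of the K-Blog tooling: the helper name <tt class="ttfamily">build_new_post_request</tt>, the blog ID and the credentials are hypothetical, and the request is built and decoded locally rather than sent.</p>

```python
import xmlrpc.client

def build_new_post_request(blog_id, username, password, title, html_body, categories):
    """Serialise (but do not send) a MetaWeblog newPost call."""
    content = {
        "title": title,
        "description": html_body,   # the headless HTML body of the post
        "categories": categories,   # category names, e.g. ["under review"]
    }
    publish = True                  # publish immediately rather than as a draft
    return xmlrpc.client.dumps(
        (blog_id, username, password, content, publish),
        methodname="metaWeblog.newPost",
    )

request_xml = build_new_post_request(
    "1", "author", "secret",        # hypothetical blog ID and credentials
    "An example article", "<p>Article text.</p>", ["under review"],
)

# Decode the request again to check what a server would receive
params, method = xmlrpc.client.loads(request_xml)
print(method)                   # metaWeblog.newPost
print(params[3]["categories"])  # ['under review']
```

<p>To publish for real, the same arguments could be passed to <tt class="ttfamily">xmlrpc.client.ServerProxy(url).metaWeblog.newPost(...)</tt>, where <tt class="ttfamily">url</tt> is the blog’s XML-RPC endpoint.</p>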
<p>Our initial process was designed to replicate the normal peer-review process with a single adjustment: peer-review was open rather than blind. Papers would be world-visible once submitted, the identities of reviewers would be known to authors, and all reviews would be public. We adopted this approach for pragmatic reasons: WordPress has little support for authenticated viewing and none for anonymisation. The full process was as follows: </p>
<ul class="itemize">
<li>
<p>Authors write their content using whichever tools they find appropriate. </p>
</li>
<li>
<p>The author posts their content, categorising it as <i class="itshape">under review</i>. </p>
</li>
<li>
<p>An editor assigns two reviewers. </p>
</li>
<li>
<p>Reviewers publish reviews as posts or comments. Reviews link to articles, resulting in a trackback from article to review. </p>
</li>
<li>
<p>The author modifies the post to address reviews. </p>
</li>
<li>
<p>Once the editor is satisfied, the post is recategorised as <i class="itshape">reviewed</i>. </p>
</li>
</ul>
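<p>The steps above amount to a small state machine over post categories. The following Python sketch models it under stated assumptions: the class and method names are hypothetical, step 5 (the author revising the post) is not modelled, and the editor’s satisfaction is approximated as ‘every assigned reviewer has published a review’.</p>

```python
from dataclasses import dataclass, field

UNDER_REVIEW = "under review"
REVIEWED = "reviewed"

@dataclass
class Article:
    """Hypothetical model of a K-Blog post moving through review."""
    title: str
    author: str
    category: str = ""
    reviewers: list = field(default_factory=list)
    reviews: list = field(default_factory=list)  # (reviewer, review permalink) pairs

    def post(self):
        # Step 2: the author posts the content, categorised as "under review"
        self.category = UNDER_REVIEW

    def assign_reviewers(self, *names):
        # Step 3: an editor assigns two reviewers
        if self.category != UNDER_REVIEW:
            raise ValueError("article is not under review")
        if len(names) != 2:
            raise ValueError("the initial process uses exactly two reviewers")
        self.reviewers = list(names)

    def add_review(self, reviewer, permalink):
        # Step 4: a review is published as a post or comment linking to the article
        if reviewer not in self.reviewers:
            raise ValueError(f"{reviewer} is not a reviewer of this article")
        self.reviews.append((reviewer, permalink))

    def accept(self):
        # Step 6: recategorise as "reviewed"; the editor's satisfaction is
        # approximated (an assumption) as every assigned reviewer having reviewed
        done = {reviewer for reviewer, _ in self.reviews}
        if not self.reviewers or done != set(self.reviewers):
            raise ValueError("reviews are incomplete")
        self.category = REVIEWED

article = Article("An example article", "A. Author")
article.post()
article.assign_reviewers("R. One", "R. Two")
article.add_review("R. One", "http://example.org/review-1/")
article.add_review("R. Two", "http://example.org/review-2/")
article.accept()
print(article.category)  # reviewed
```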
<p>Our expectation was that following this process, articles would not be changed or updated; this is in stark contrast to common usage for wiki-based websites. New articles could, however, be written updating, extending or refuting old ones. </p>
<h2 id="a0000000009">Reflections on the Ontogenesis K-Blog</h2>
<p>Our initial meeting served to ‘bootstrap’ the Ontogenesis K-Blog. This was useful to acquire a critical mass of content, but also, on this first outing, to explore the K-Blog process and technology. The setup for the day was a vanilla WordPress installation. The day started with a short presentation on the K-Blog manifesto <span class="cite">[<a href="#onto-mani">22</a>]</span> and an overview of the process, including authoring and reviewing. Authors were asked to write short articles on an ontology subject (a list of suggestions was offered, and authors also made their own choices) and to produce each article in whatever manner they felt appropriate. There was some uncertainty among authors about the K-Blog process (partly because one of the objectives of the meeting was to ‘force out’ the process), which, naturally, pointed to the need to document the K-Blog process so that authors could have the typical ‘instructions to authors’. </p>
<p>This first meeting produced a set of 20 completed and partially completed articles, some of which already had reviews. Even on the day itself, there was some external interest on Twitter. The first external blog post (outside of those produced by attendees) appeared during the meeting <span class="cite">[<a href="#first">19</a>]</span>, with a second shortly after <span class="cite">[<a href="#second">18</a>]</span>. </p>
<p>We also held a second content provision meeting; together, the two meetings generated a collection of articles that felt like an academic book in terms of content, but with considerably less effort. This experience was also sufficient to gather requirements on how to improve the K-Blog idea. A useful K-Blog post on the K-Blog process itself was produced by Sean Bechhofer <span class="cite">[<a href="#arewethere">13</a>]</span>. There is also a K-Blog post looking back on the first year of the Ontogenesis K-Blog <span class="cite">[<a href="#firstyear">23</a>]</span>. </p>
<p>Several requirements emerged with respect to <b class="bfseries">authorship</b>. The principle of the short, more or less self-contained article was attractive (though the audience was somewhat self-selecting). Those who tried authoring directly in the editor provided by WordPress found it poor. Authoring in a favourite editing tool and then publishing via WordPress worked reasonably well for most authors. There were, however, a variety of issues with this style of publishing, such as referring to articles that would be, but had not yet been, written. To some extent this was an artefact of the day (many articles being written simultaneously), but authors needed to refer to glossaries and to articles in progress. </p>
<p>One stylistic issue was the habit of putting full affiliations at the top of an article. When listing many articles, the Ontogenesis theme shows the first few lines of each, but in many cases these lines contained only the title and author affiliations, where the first sentence or so of the article itself would have been more useful. </p>
<p>For the whole K-Blog, a table of contents was felt to be important. This would give an overview of contents and a simple place from which to navigate the K-Blog. This raised the issue of <b class="bfseries">attribution</b>: the table of contents needed to expose the authors, including multiple, ordered authors. This need is unsurprising, as the authors’ scientific reputation is involved. In the same vein, authors requested that K-Blog articles be made citable by issuing them Digital Object Identifiers (DOIs). </p>
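<p>One way to meet the attribution requirement is for the table of contents to render every author, in order, for each article in a chosen category. The Python sketch below illustrates that idea; it is not the Knowledgeblog Table of Contents plugin itself (which runs inside WordPress), and the post records, titles and author names are hypothetical.</p>

```python
def format_byline(authors):
    """Join an ordered author list as 'A', 'A and B' or 'A, B and C'."""
    if len(authors) <= 1:
        return "".join(authors)
    return ", ".join(authors[:-1]) + " and " + authors[-1]

def table_of_contents(posts, category):
    """One entry per post in `category`, sorted by title, listing all authors in order."""
    selected = [p for p in posts if category in p["categories"]]
    selected.sort(key=lambda p: p["title"].lower())
    return [f'{p["title"]} ({format_byline(p["authors"])})' for p in selected]

posts = [  # hypothetical posts
    {"title": "Second article", "authors": ["C. Author"], "categories": ["reviewed"]},
    {"title": "First article", "authors": ["A. Author", "B. Author", "D. Author"],
     "categories": ["reviewed"]},
    {"title": "Draft article", "authors": ["E. Author"], "categories": ["under review"]},
]
for entry in table_of_contents(posts, "reviewed"):
    print(entry)
# First article (A. Author, B. Author and D. Author)
# Second article (C. Author)
```

<p>Keeping the author list ordered, rather than as a set, is what lets the byline preserve first, second and last authorship.</p>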
<p>For scientific credibility, the ability to handle <b class="bfseries">citations</b> easily was an obvious requirement. Natively, WordPress has little or no support for styling citations and references. The ability to cite by DOI and, in this field, by PubMed identifier, and to have links and a reference list produced automatically, was felt to be important. Having the Ontogenesis K-Blog articles in PubMed would also be attractive to authors. </p>
<p>The last <b class="bfseries">authorship</b> issue was the <b class="bfseries">mutability</b> of articles. One aim of K-Blog is to enable articles to change in the light of experience and scientific development; updates following review are also a procedural requirement. Conflicting with this, articles were felt to need to stay unchanged, so that comments on them and links from other documents continue to work in the longer term. </p>
<p>The last significant issue was the <b class="bfseries">reviewing</b> of articles. The aim was to have reviewing managed by authors choosing their own reviewers, with editorial oversight. At the Ontogenesis meeting this worked, with authors calling across the room for a review, but this is not a sustainable approach, and WordPress lacks tracking facilities to manage the reviewing process, whether it is run by an author or an editor. The realisation that such management support is needed is not the greatest insight ever gained, but the requirement exists even in a lightweight publishing mechanism. </p>
<h1 id="a0000000010">Improvements to the technology</h1>
<p>Our initial experiment with the Ontogenesis K-Blog suggested a significant number of issues with the use of WordPress for scientific publication. In this section, we describe the extensions that we have made to, or adopted for, the publication process, its documentation and WordPress itself. Following our initial experience with Ontogenesis, we have started to trial these improvements, including through another workshop, which resulted in a new K-Blog <span class="cite">[<a href="#tavernakblog">12</a>]</span> describing the scientific workflow engine Taverna <span class="cite">[<a href="#taverna">24</a>]</span>; work is also in progress on a K-Blog for bioinformatics <span class="cite">[<a href="#bioinf">1</a>]</span> and another for public healthcare <span class="cite">[<a href="#health">3</a>]</span>. </p>
<p>Currently, we use 11 plugins to extend the basic WordPress environment; for completeness, all of these are shown in Table <a href="#tab:plugins">1</a>. Our theme is also extended in some places to support the plugins. In general, the plugins are orthogonal and will work independently of each other. One advantage of using WordPress is that many of these plugins are freely available, written and maintained by other authors. Other academic publication environments, such as the Open Journal System <span class="cite">[<a href="#ojs">5</a>]</span>, exist and are relatively widely used, but WordPress is used to host perhaps 10% of the web, which makes its plugin ecosystem extremely fertile. </p>
<div id="tab:plugins" class="table">
<p><small class="small"><center>
<table cellspacing="0" class="tabular">
<tr>
<td style="border-top-style:solid; text-align:left; border-top-color:black; border-top-width:1px; border-right:1px solid black">
<p> Plugin </p>
</td>
<td style="border-top-style:solid; text-align:left; border-top-color:black; border-top-width:1px; border-right:1px solid black">
<p> Use </p>
</td>
<td style="border-top-style:solid; border-top-color:black; border-top-width:1px; text-align:left">
<p> URL</p>
</td>
</tr>
<tr>
<td style="border-top-style:solid; text-align:left; border-top-color:black; border-top-width:1px; border-right:1px solid black">
<p>Co-Authors Plus </p>
</td>
<td style="border-top-style:solid; text-align:left; border-top-color:black; border-top-width:1px; border-right:1px solid black">
<p> Allows K-Blog posts to have more than one author </p>
</td>
<td style="border-top-style:solid; border-top-color:black; border-top-width:1px; text-align:left">
<p> <a href="http://wordpress.org/extend/plugins/co-authors-plus/">http://wordpress.org/extend/plugins/co-authors-plus/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>COinS Metadata Exposer †</p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Provides COinS metadata on K-Blog posts (used by Zotero, Mendeley etc) </p>
</td>
<td style="text-align:left">
<p> <a href="http://code.google.com/p/knowledgeblog/">http://code.google.com/p/knowledgeblog/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Edit Flow </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Gives editorial process management infrastructure </p>
</td>
<td style="text-align:left">
<p> <a href="http://editflow.org/">http://editflow.org/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>ePub Export </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Exports K-Blog posts as ePub documents </p>
</td>
<td style="text-align:left">
<p> <a href="http://wordpress.org/extend/plugins/epub-export/">http://wordpress.org/extend/plugins/epub-export/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>KCite \(\ast \) </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Automatic processing of DOIs and PMIDs into in-text citations and bibliographies </p>
</td>
<td style="text-align:left">
<p> <a href="http://knowledgeblog.org/kcite-plugin">http://knowledgeblog.org/kcite-plugin</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Knowledgeblog Post Metadata Plugin \(\ast \) </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Exposes generic metadata in post headers </p>
</td>
<td style="text-align:left">
<p> <a href="http://code.google.com/p/knowledgeblog/">http://code.google.com/p/knowledgeblog/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Knowledgeblog Table of Contents \(\ast \) </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Produces a table of contents based on a category of articles. Posts are listed with all authors </p>
</td>
<td style="text-align:left">
<p> <a href="http://knowledgeblog.org/knowledgeblog-table-of-contents-plugin">http://knowledgeblog.org/knowledgeblog-table-of-contents-plugin</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Mathjax L<sup style="font-variant:small-caps; margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X \(\ast \) </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Enables use of T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X or MathML in posts, rendered in scalable web fonts </p>
</td>
<td style="text-align:left">
<p> <a href="http://knowledgeblog.org/mathjax-latex-wordpress-plugin">http://knowledgeblog.org/mathjax-latex-wordpress-plugin</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Post Revision Display </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Publicly exposes all revisions of an article after publication </p>
</td>
<td style="text-align:left">
<p> <a href="http://wordpress.org/extend/plugins/post-revision-display/">http://wordpress.org/extend/plugins/post-revision-display/</a></p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>SyntaxHighlighter Evolved </p>
</td>
<td style="text-align:left; border-right:1px solid black">
<p> Highlights the syntax of source code embedded in posts </p>
</td>
<td style="text-align:left">
<p> <a href="http://wordpress.org/extend/plugins/syntaxhighlighter/">http://wordpress.org/extend/plugins/syntaxhighlighter/</a></p>
</td>
</tr>
<tr>
<td style="border-bottom-color:black; border-bottom-width:1px; text-align:left; border-bottom-style:solid; border-right:1px solid black">
<p>WP Post to PDF </p>
</td>
<td style="border-bottom-color:black; border-bottom-width:1px; text-align:left; border-bottom-style:solid; border-right:1px solid black">
<p> Allows visitors to download posts in PDF format </p>
</td>
<td style="border-bottom-width:1px; border-bottom-color:black; border-bottom-style:solid; text-align:left">
<p> <a href="http://wordpress.org/extend/plugins/wp-post-to-pdf/">http://wordpress.org/extend/plugins/wp-post-to-pdf/</a></p>
</td>
</tr>
</table>
<div class="caption"><b>Table 1</b>: <span>WordPress plugins employed by K-Blog. Plugins marked with \(\ast \) are written by the authors. Plugins marked with \(\dag \) are modified by the authors. </span></div>
<p>  </center></small></p>
</div>
<p><b class="bfseries">Reviewing:</b> The initial process was self-managed and required two reviews per article; this was found to be cumbersome. We have addressed this in two ways. First, we have defined a number of different peer-review levels (public review, author review, editorial review <span class="cite">[<a href="#levels">15</a>]</span>), including a light-weight process now being used for Ontogenesis, in which authors select their own reviewers and decide for themselves when articles are complete. Second, we have added software support. Initially, we attempted to use RequestTracker, an open-source ticket system, but found its user interface too complex for this purpose. We now use the Edit Flow plugin for WordPress, which was designed for managing a review process, albeit a hierarchical rather than a peer-review one. </p>
<p><b class="bfseries">Authoring Environment:</b> Most authors found the standard WordPress editor impractical, even for short articles. WordPress does provide ‘paste from Word’ functionality, but this removes all formatting, which defeats the point. While the lack of a good editing environment could have been a significant problem, our subsequent experimentation has shown that it is possible to post directly from a wide variety of tools, including ‘office’ tools such as Word, Google Docs, LiveWriter and OpenOffice. This is in addition to a variety of blog-specific tools and text formats (such as asciidoc), which suit some users. We have added documentation on these to a K-Blog (<a href="http://process.knowledgeblog.org">http://process.knowledgeblog.org</a>). In practice, only L<sup style="font-variant:small-caps; margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X proved problematic, as it had no specific support. To address this, we have produced a tool called <b class="bfseries">latextowordpress</b>, an adaptation of plasT<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X, a Python-based T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X processor, that produces simplified HTML appropriate for WordPress publishing. Our experience is that while none of these tools is perfect, and some sometimes require ‘tweaking’ of the HTML in WordPress, most reduce publishing time to seconds. </p>
<p><b class="bfseries">Citations:</b> We have addressed the lack of support for citations within WordPress with a plugin called <b class="bfseries">kcite</b>. This allows authors to add citations to documents as <tt class="ttfamily">shortcodes</tt> containing either a DOI or a PubMed ID (support for other identifiers is being added to kcite). Shortcodes are a commonly used form of markup, of the form &#91;tag att="att"]text[/tag]; they are often found where a simplified, HTML-like markup is desired. The bibliography is then generated automatically on the web server. Requiring authors to add markup within otherwise WYSIWYG tools damages the user experience. We believe, however, that this is solvable by extending bibliographic tools with a ‘kcite’ style file or template; we have a prototype of this (using CSL <span class="cite">[<a href="#csl">10</a>]</span>) for Zotero and Mendeley, and another for asciidoc with bibtex. It is also possible simply to use native tool support in Word or L<sup style="font-variant:small-caps; margin-left:-0.3em">a</sup>T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X and convert bibliographies to HTML; the disadvantage of this approach is discussed later. </p>
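<p>The general technique behind shortcode-based citation can be sketched in a few lines of Python: find each citation shortcode, replace it with a numbered in-text marker (reusing the number for a repeated identifier) and collect a bibliography in order of first citation. This is not kcite’s implementation, which retrieves full bibliographic metadata to format references; the shortcode syntax assumed here (&#91;cite]DOI[/cite], with an optional <tt class="ttfamily">source="pubmed"</tt> attribute), the function name and the example PubMed ID are hypothetical, and the bibliography entries are simply resolver links.</p>

```python
import re

# Assumed syntax: [cite]DOI[/cite] or [cite source="pubmed"]PMID[/cite]
CITE = re.compile(r'\[cite(?:\s+source="(?P<source>doi|pubmed)")?\](?P<ident>[^\[]+)\[/cite\]')

RESOLVERS = {
    "doi": "http://dx.doi.org/",
    "pubmed": "http://www.ncbi.nlm.nih.gov/pubmed/",
}

def process_citations(text):
    """Replace cite shortcodes with [n] markers; return (text, bibliography links)."""
    numbers = {}        # (source, identifier) -> citation number
    bibliography = []   # resolver links, in order of first citation

    def replace(match):
        source = match.group("source") or "doi"   # a bare identifier is treated as a DOI
        key = (source, match.group("ident").strip())
        if key not in numbers:
            numbers[key] = len(numbers) + 1
            bibliography.append(RESOLVERS[source] + key[1])
        return f"[{numbers[key]}]"

    return CITE.sub(replace, text), bibliography

sample = ('See [cite]10.1000/182[/cite] and [cite source="pubmed"]12345678[/cite]; '
          'also [cite]10.1000/182[/cite].')
text, bibliography = process_citations(sample)
print(text)          # See [1] and [2]; also [1].
print(bibliography)
```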
<p><b class="bfseries">Archiving and Searching:</b> Archiving is primarily a social, rather than technological, problem. A blog engine is fully capable of storing content in the long term, but authors and readers have to believe that it will do so. As a novel form of academic publishing, K-Blog is not automatically archived in the way a scientific journal is. However, we have taken advantage of its web publication: the main K-Blog site is now explicitly archived by the UK Web Archive, as well as implicitly by other web archives. We have enhanced the website with an ‘easy crawl’ plugin, that is, a single web page pointing to all articles classified as reviewed. We now support the (technical) requirements for LOCKSS and PubMed. This also enhances the searchability of K-Blog, fulfilling the requirements for Google Scholar. </p>
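<p>An ‘easy crawl’ page can be very simple. The following Python sketch, which is illustrative rather than the plugin’s actual code, builds a single HTML page linking to every post categorised as reviewed; the post records and example.org permalinks are hypothetical.</p>

```python
from html import escape

def easy_crawl_page(posts):
    """A single HTML page linking to every post categorised as 'reviewed'."""
    items = [
        f'<li><a href="{escape(p["permalink"])}">{escape(p["title"])}</a></li>'
        for p in posts
        if "reviewed" in p["categories"]
    ]
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"

posts = [  # hypothetical posts
    {"title": "An example article", "permalink": "http://example.org/example-article/",
     "categories": ["reviewed"]},
    {"title": "Work in progress", "permalink": "http://example.org/work-in-progress/",
     "categories": ["under review"]},
]
page = easy_crawl_page(posts)
print(page)
```

<p>Because the page links only to permalinks, a crawler following it reaches each reviewed article by its one stable URL.</p>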
<p><b class="bfseries">Non-repudiability:</b> The K-Blog process does not allow authors to make semantically meaningful changes after an article has been reviewed. Unfortunately, it is hard to define ‘semantically meaningful’ computationally, so we have made no attempt to enforce this by locking articles; rather, all versions of articles are now accessible to the reader (WordPress provides this facility to authors by default). This enables community enforcement of a no-change policy. </p>
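<p>Making every version public lets readers judge for themselves whether a post-review change is semantically meaningful. A minimal sketch of such a check, using Python’s standard <tt class="ttfamily">difflib</tt> to compare two stored revisions of an article’s HTML, is shown below; the revision texts, labels and function name are hypothetical, and this is not how WordPress itself displays revisions.</p>

```python
import difflib

def revision_diff(old, new, old_label="reviewed", new_label="current"):
    """Unified diff between two stored revisions of an article's HTML."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    ))

reviewed = "<p>An ontology is a model.</p>\n<p>See teh glossary.</p>\n"
current = "<p>An ontology is a model.</p>\n<p>See the glossary.</p>\n"
diff = revision_diff(reviewed, current)
print(diff)
```

<p>A typo fix like this appears as a single changed line that readers can dismiss at a glance; a change to a definition would be just as visible.</p>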
<p><b class="bfseries">Multiple Authors:</b> We believe that authoring is best done outside WordPress. This also means that we do not support collaborative authoring; we have made no attempt to add collaborative features to WordPress. However, we did need articles to carry a byline attributing them to multiple authors; although not critical to the functioning of a K-Blog, this is socially critical to appease the professional narcissism of scientists discussed earlier. Fortunately, this is a common requirement, and a suitable WordPress plugin already existed. </p>
<p><b class="bfseries">Identifiers:</b> WordPress already supports permalinks. Although we believe that URLs are technologically entirely fit for purpose, and that DOIs do little other than introduce complexity <span class="cite">[<a href="#problemdois">11</a>]</span>, K-Blog required DOIs to satisfy professional narcissism. We considered becoming a DOI authority, but this proved impractical; instead, we have used DataCite <span class="cite">[<a href="#datacite">2</a>]</span>. This has required a small extension to WordPress to extract appropriate metadata and to store the DOIs once minted. </p>
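<p>Minting a DOI through DataCite requires a metadata record describing the article. The sketch below builds a minimal record from post metadata with Python’s standard <tt class="ttfamily">ElementTree</tt>. It assumes the DataCite kernel-4 schema namespace and its core properties (identifier, creators, titles, publisher, publicationYear, resourceType), which may differ from the schema version K-Blog used; the helper name and all values are hypothetical, and the DOI uses DataCite’s 10.5072 test prefix.</p>

```python
import xml.etree.ElementTree as ET

NS = "http://datacite.org/schema/kernel-4"   # assumption: schema version
ET.register_namespace("", NS)                # serialise as the default namespace

def datacite_xml(doi, title, creators, publisher, year):
    """Minimal DataCite metadata record for one post (core properties only)."""
    q = lambda tag: f"{{{NS}}}{tag}"         # qualify a tag with the namespace
    resource = ET.Element(q("resource"))
    ET.SubElement(resource, q("identifier"), identifierType="DOI").text = doi
    creators_el = ET.SubElement(resource, q("creators"))
    for name in creators:                    # creator order is preserved
        creator = ET.SubElement(creators_el, q("creator"))
        ET.SubElement(creator, q("creatorName")).text = name
    titles = ET.SubElement(resource, q("titles"))
    ET.SubElement(titles, q("title")).text = title
    ET.SubElement(resource, q("publisher")).text = publisher
    ET.SubElement(resource, q("publicationYear")).text = str(year)
    ET.SubElement(resource, q("resourceType"), resourceTypeGeneral="Text").text = "Blog post"
    return ET.tostring(resource, encoding="unicode")

record = datacite_xml("10.5072/example-kblog-1", "An example article",
                      ["Author, A.", "Author, B."], "Ontogenesis", 2011)
print(record)
```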
<p><b class="bfseries">Metadata:</b> K-Blog now exposes its metadata in a number of ways; unfortunately, there appear to be a large number of (non-)standards in use, each with its own application. K-Blog currently provides: COinS, enabling integration with Zotero and Mendeley; meta tags for Google Scholar; and Dublin Core tags, for no more specific reason than completeness. We are in the process of providing bibtex export (for bibtex!) and a JSON representation to support citeproc-js <span class="cite">[<a href="#citeproc-js">14</a>]</span> in the second generation of kcite. </p>
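<p>The sketch below shows, in Python, what two of these representations might look like for a single post: Google Scholar <tt class="ttfamily">citation_*</tt> meta tags, and a COinS span (an OpenURL ContextObject carried in the <tt class="ttfamily">title</tt> attribute of a span with class <tt class="ttfamily">Z3988</tt>). It is illustrative only; the journal KEV format, the field selection, the function names and all example values are assumptions rather than K-Blog’s exact output.</p>

```python
from datetime import date
from html import escape
from urllib.parse import quote, urlencode

def scholar_meta_tags(title, authors, published):
    """Google Scholar-style <meta> tags; one citation_author tag per author."""
    tags = [("citation_title", title)]
    tags += [("citation_author", name) for name in authors]
    tags.append(("citation_publication_date", published.strftime("%Y/%m/%d")))
    return "\n".join(f'<meta name="{name}" content="{escape(value)}">' for name, value in tags)

def coins_span(title, container, authors, published):
    """COinS: a URL-encoded OpenURL ContextObject in the title of a Z3988 span."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),  # assumption: journal-article format
        ("rft.genre", "article"),
        ("rft.atitle", title),
        ("rft.jtitle", container),
        ("rft.date", published.isoformat()),
    ] + [("rft.au", name) for name in authors]
    # quote_via=quote encodes spaces as %20; escape() turns & into &amp; for HTML
    return f'<span class="Z3988" title="{escape(urlencode(pairs, quote_via=quote))}"></span>'

published = date(2011, 6, 1)              # hypothetical values throughout
authors = ["Author, A.", "Author, B."]
tags = scholar_meta_tags("An example article", authors, published)
span = coins_span("An example article", "Ontogenesis", authors, published)
print(tags)
print(span)
```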
<p><b class="bfseries">Mathematics and Presentation:</b> We have also provided several pieces of technology that did not stem from concrete requirements arising from the initial Ontogenesis meeting. We have improved parts of the presentation system by adding, for example, syntax highlighting to code blocks. Additionally, we have created the <b class="bfseries">mathjax-latex</b> plugin, enabling the use of T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X (or MathML) markup in posts, which is then rendered in the browser using scalable fonts. WordPress has native math-mode T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X support, but it renders to images, which do not scale and have an ugly, pixelated display. </p>
<h1 id="a0000000011">Discussion</h1>
<p>A lack of enthusiasm for traditional book publishing motivated us to devise another mechanism by which we could achieve the same ends. We wished to avoid the downsides of an ‘all or nothing’ approach to creating a ‘static’ paper document that, because of its price, is read by relatively few people. The K-Blog approach allows authors to publish in a piecemeal fashion, writing only what they are motivated to write, using a mechanism that avoids a third party making arbitrary decisions on formatting and working to peculiar time-scales. </p>
<p>To avoid all this, we designed K-Blog as a lightweight publishing process based on commodity blogging software. We have taken the approach of writing short articles around the theme of ‘ontology in biology’: the Ontogenesis K-Blog. At the time of writing, it has 26 articles and pleasing page view numbers (see Figure <a href="#fig:views">1</a>). These statistics are generated directly by WordPress and represent (an approximation of) ‘real’ page reads, with robot and self-viewing removed. This is supported by the ten most-read articles (Table <a href="#sec:acknowledgements">2</a>), which reflect our expectations, with ‘What is an ontology?’ first. In this sense, we consider the K-Blog process to be a success, especially when compared with the circulation of an equivalent book. </p>
<div id="fig:views" class="figure"><center><img src="http://knowledgeblog.org/files/2011/06/stats-line.png" /> 
<div class="caption"><b>Figure 1</b>: <span>Monthly page view statistics for the Ontogenesis K-Blog.</span></div>
<p>  </center></div>
<div id="sec:acknowledgements" class="table"><center>
<table cellspacing="0" class="tabular">
<tr>
<td style="border-top-style:solid; text-align:left; border-top-color:black; border-top-width:1px; border-right:1px solid black">
<p> What is an ontology? </p>
</td>
<td style="border-top-style:solid; border-top-color:black; border-top-width:1px; text-align:left">
<p> 1,737</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>OWL Syntaxes </p>
</td>
<td style="text-align:left">
<p> 1,246</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Ontology Learning </p>
</td>
<td style="text-align:left">
<p> 882</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Table of Contents </p>
</td>
<td style="text-align:left">
<p> 740</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>What is an upper level ontology? </p>
</td>
<td style="text-align:left">
<p> 684</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Reference and Application Ontologies </p>
</td>
<td style="text-align:left">
<p> 630</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Protege &amp; Protege-OWL </p>
</td>
<td style="text-align:left">
<p> 522</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Semantic Integration in the Life Sciences </p>
</td>
<td style="text-align:left">
<p> 517</p>
</td>
</tr>
<tr>
<td style="text-align:left; border-right:1px solid black">
<p>Automatic maintenance of multiple inheritance ontologies </p>
</td>
<td style="text-align:left">
<p> 469</p>
</td>
</tr>
<tr>
<td style="border-bottom-color:black; border-bottom-width:1px; text-align:left; border-bottom-style:solid; border-right:1px solid black">
<p>Ontologies for Sharing, Ontologies for Use </p>
</td>
<td style="border-bottom-width:1px; border-bottom-color:black; border-bottom-style:solid; text-align:left">
<p> 330</p>
</td>
</tr>
</table>
<div class="caption"><b>Table 2</b>: <span>Most viewed articles for the Ontogenesis K-Blog (total page views).</span></div>
<p>  </center></div>
<p>The social processes of K-Blog are largely similar to those of traditional publishing, with one exception: reviewing is public. While we may have been interested in experimenting with this for principled reasons, in practice we adopted it because we did not know how to support blind, anonymous review with WordPress. Open review is not a new idea: Requests for Comments are common in standards processes, and both Nupedia <span class="cite">[<a href="#nupedia">4</a>]</span> (the forerunner of Wikipedia) and H2G2 <span class="cite">[<a href="#h2g2">6</a>]</span> (which predates Nupedia) use public peer-review. It is still, however, unusual in academia. In our experience from Ontogenesis, it raised no worries among our contributors, except that reviewers often wanted to be more involved in proofing, a role normally played by authors low down the author list; open review processes blur these lines somewhat. </p>
<p>One open area for discussion is the extent to which authors can, should and wish to change articles after publication. While the ability to update is inherent in the web, non-repudiability was considered important; the contradiction here appears fundamental, and we do not feel we have reached a good compromise yet. In one sense, our use of the Post Revision Display plugin solves this problem: even if an article changes, it is still possible to refer to a specific version. However, like all automated versioning tools, it records many versions, often with very fine-grained changes, which makes selecting the ‘right’ version hard or impossible. We could replace it with an explicit versioning tool, similar to a source code versioning system, but such systems are hard to use for those unused to them, as well as being difficult to implement well. An environment like K-Blog does, however, allow rapid publication of, and bi-directional linking between, articles; combined with typed linking using CiTO, the ability to publish errata, addenda and second editions may be a better solution. </p>
<p>Our experiences with K-Blog are, we think, useful in understanding how semantic web technology can and will affect the publication and library process. Both from our initial work with Ontogenesis and from subsequent work with <a href="http://taverna.knowledgeblog.org">http://taverna.knowledgeblog.org</a>, it has become obvious that good tool support is critical. ‘Good’ in this sense can be straightforwardly interpreted as ‘familiar’, which in general means MS Word. Our choice of a blogging engine was (unexpectedly) well-advised, as this form of publication is already supported by many tools. It is also clear that many other tools could be added; while Ontogenesis has the content, for example, that might be found in an academic book, it does not currently have the presentation of a book. Articles are already available as ePUB, and more recent work has used our Table of Contents plugin to provide a single site-wide ePUB of all articles <span class="cite">[<a href="#epub_from_wordpress">25</a>]</span>. Pre-existing tools such as Anthologize <span class="cite">[<a href="#anthologize">9</a>]</span> may also be useful for assembling organised collections of articles from the whole. </p>
<p>This has a direct implication for the addition of further semantics to content. On the positive side, the use of WordPress makes semantic additions plausible in a way that many conventional publishing processes do not. For example, publishing our (PWL, RS) recent paper <span class="cite">[<a href="#reality_in_biology_2010">20</a>]</span> required conversion from the LaTeX source to PDF (by latex), to another PDF, to an MS Word file (by hand) and to XML, before it arrived at its final HTML form. This process took many weeks and required multiple interactions between the authors and the publisher, yet it still failed to preserve our semantic use (to humans) of Courier font to highlight in-text ontology terms, requiring post-publication correction. The equivalent blog post <span class="cite">[<a href="#reality_in_biology_2010_blog">21</a>]</span> gave us nearly instantaneous feedback on the final form, allowing us to check that the semantics were present and correct. </p>
<p>The requirements for semantics, however, have to be light. We have concentrated throughout K-Blog on the ease of delivery of content; even with this focus, it is hard. In most cases, asking authors for more work, or for more semantics than they are used to giving in papers, is problematic. For example, I (PWL) attempted to add microformat-based markup to Ontogenesis, again for identifying ontology terms. So far, all article authors have ignored this markup (including, embarrassingly, myself). </p>
<p>One solution to this issue is to ensure that authors themselves benefit directly from extra semantics. For example, the Mathjax-Latex plugin allows WordPress to present mathematics as T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X or MathML markup in the final document, which is more semantically meaningful than the default WordPress behaviour of rendering an image. From the author&#8217;s perspective, it also enables the use of T<sub style="text-transform:uppercase; margin-left:-0.2em">e</sub>X markup in Word, and the end product scales and looks less ugly on the web page. </p>
<p>With Kcite, we allow the user to embed DOIs or Pubmed IDs; this can be achieved at no cost to users who already use a bibliography tool, as it can transparently produce citations for them using Kcite shortcodes. Development versions of Kcite already allow easy switching of bibliographic style, which we hope will be at the option of the author (rather than the website or publisher, as is currently the case) and/or the reader. With this additional information, we can also embed more semantics into the end document at no additional cost to the author, using, for example, the least specific CiTO <tt class="ttfamily">cites</tt> term. However, further use of CiTO will require the author to decide which term to use, with relatively little gain to themselves, and may require extensions to bibliographic tools if we are to maintain the transparency of Kcite shortcodes; even if the tools are present, it is unclear whether authors will use them. We note that semantics useful to domain authors are likely to be domain-specific; mathematicians are more likely to care about maths presentation, but less likely to care about Pubmed IDs. We need to be able to extend the publishing model and environment for different journals to cope. </p>
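<p>As a rough illustration of what the least specific CiTO relation adds, the Python sketch below writes one <tt>cito:cites</tt> triple in Turtle for each DOI cited by an article. It is not part of Kcite, which is a WordPress plugin. The function name, the example article URI and the use of <tt>http://dx.doi.org/</tt> as the DOI resolver prefix are assumptions made for illustration.</p>

```python
# Minimal sketch: emit CiTO "cites" triples in Turtle for an article's DOIs.
CITO_PREFIX = "@prefix cito: <http://purl.org/spar/cito/> ."
DOI_RESOLVER = "http://dx.doi.org/"  # assumed resolver prefix for DOI URIs


def cito_cites_turtle(article_uri, dois):
    """Return Turtle stating that article_uri cito:cites each distinct DOI.

    Assumes the DOIs contain no characters that are illegal inside an IRI.
    """
    lines = [CITO_PREFIX, ""]
    seen = set()
    for doi in dois:
        doi = doi.strip()
        if not doi or doi in seen:
            continue  # skip blank entries and repeat citations of one work
        seen.add(doi)
        lines.append(f"<{article_uri}> cito:cites <{DOI_RESOLVER}{doi}> .")
    return "\n".join(lines)


# Hypothetical article URI; the DOI is the illustrative one used in the text.
print(cito_cites_turtle("http://example.org/article/1", ["10.232/43243"]))
```

<p>Because only <tt>cites</tt> is used, the author makes no extra decisions. Choosing a more specific CiTO property would require exactly the per-citation judgement discussed above.</p>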
<p>From a technological perspective, we have found shortcodes to be a good mechanism for authors to add semantics. They are simple and relatively easy to understand. In some cases they can be hidden from the user entirely; forcing users to add markup to otherwise WYSIWYG environments such as MS Word is best avoided. Although the direct use of a more standard XML markup would seem more sensible, in practice it requires tool support, as XML markup will be escaped by helpful remote posting tools. Extension of remote posting tools is hard (for tools like MS Word) or impossible (for cloud tools such as Google Docs or LiveWriter). A blogging engine such as WordPress makes it trivial to replace shortcodes with both a <i class="itshape">presentation</i> format and a machine-interpretable <i class="itshape">microformat</i>; for example, the development version of Kcite transforms DOI shortcodes (&#91;cite]10.232/43243[/cite]) into in-text citations (Smith et al, (2002)) embedded in a span tag (<tt class="ttfamily">&lt;span kcite-id="10.232/43243"&gt;Smith et al, (2002)&lt;/span&gt;</tt>), which are subsequently transformed into their final presentation form within the browser using Javascript. The presentation form can also support additional semantic markup such as CiTO <span class="cite">[<a href="#cito">26</a>]</span>. </p>
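<p>The server-side half of this transformation can be sketched in Python as below. This is not Kcite&#8217;s code. It is a minimal sketch that assumes a hypothetical <tt>labels</tt> dictionary already mapping each identifier to its in-text citation, and it falls back to the raw identifier when no label is known. The browser-side restyling step is not shown.</p>

```python
import html
import re

# Matches a cite shortcode pair; the identifier is captured non-greedily.
CITE_SHORTCODE = re.compile(r"\[cite\](.+?)\[/cite\]")


def expand_cite_shortcodes(text, labels=None):
    """Replace cite shortcodes with span tags carrying the identifier.

    labels (hypothetical) maps an identifier to a pre-formatted in-text
    citation such as "Smith et al, (2002)"; unknown identifiers fall back
    to the raw ID so a later browser-side script still has something to restyle.
    """
    labels = labels or {}

    def to_span(match):
        ident = match.group(1).strip()
        label = labels.get(ident, ident)
        # Escape both parts so odd characters cannot break the HTML.
        return '<span kcite-id="{}">{}</span>'.format(
            html.escape(ident, quote=True), html.escape(label)
        )

    return CITE_SHORTCODE.sub(to_span, text)
```

<p>The <tt>kcite-id</tt> attribute carries the machine-readable identifier, while the span&#8217;s text is only a provisional presentation that a later script is free to replace.</p>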
<p>Although we believe that additional semantics are a good thing, we will not impose a requirement for them on authors. If authors choose not to use Kcite, then this is their choice; we need to show that such semantics are useful. Our experience with many (non)standards such as COinS, DOIs, OAI-ORE and LOCKSS is that they are not simple, and speak primarily to publishers or librarians. For a semantic web approach to work, it must focus on authors and readers, as they produce and consume the content. Extracting even light-weight semantics from authors who are ontology experts is hard; for other domains, the situation may be worse. </p>
<p>Current publishing practices make the use of semantic web technology impractical; semantics added by authors are unlikely to be represented correctly if the end product is a PDF typeset by hand. Moreover, we see little point in adding semantics to individual articles if this is done in a bespoke way. With K-Blog, we have focused on providing both content and a full process, with review, using existing tools and workflows, adding semantics secondarily or incidentally where we can. As a result, the level of semantics that we have achieved is light-weight. However, we believe that K-Blog and WordPress, combined with associated tooling, provide all the basic requirements for a publishing process, and an attractive framework on which to build a semantic web. </p>
<h1 id="a0000000012">Acknowledgements</h1>
<p>We would like to acknowledge the contribution of the authors of articles for both the Ontogenesis and Taverna K-Blogs, whose feedback was essential for this process. K-Blog is currently funded by JISC. </p>
<div>
<h1>Bibliography</h1>
<dl class="bibliography">
<dt>
[<a name="bioinf">1</a>]
</dt>
<dd>
<p>Bioinformatics. <a href="http://bioinformatics.knowledgeblog.org">http://bioinformatics.knowledgeblog.org</a>. </p>
</dd>
<dt>
[<a name="datacite">2</a>]
</dt>
<dd>
<p>Datacite. <a href="http://datacite.org/">http://datacite.org/</a>. </p>
</dd>
<dt>
[<a name="health">3</a>]
</dt>
<dd>
<p>Health and Public Health. <a href="http://health.knowledgeblog.org">http://health.knowledgeblog.org</a>. </p>
</dd>
<dt>
[<a name="nupedia">4</a>]
</dt>
<dd>
<p>Nupedia. <a href="http://en.wikipedia.org/wiki/Nupedia">http://en.wikipedia.org/wiki/Nupedia</a>. </p>
</dd>
<dt>
[<a name="ojs">5</a>]
</dt>
<dd>
<p>Open Journal System. <a href="http://pkp.sfu.ca/?q=ojs">http://pkp.sfu.ca/?q=ojs</a>. </p>
</dd>
<dt>
[<a name="h2g2">6</a>]
</dt>
<dd>
<p>The Guide to Life, the Universe and Everything. <a href="http://www.bbc.co.uk/h2g2/">http://www.bbc.co.uk/h2g2/</a>. </p>
</dd>
<dt>
[<a name="wordpress">7</a>]
</dt>
<dd>
<p>WordPress. <a href="http://www.wordpress.org">http://www.wordpress.org</a>. </p>
</dd>
<dt>
[<a name="wikipedia_expert">8</a>]
</dt>
<dd>
<p>Wikipedia:expert retention, 2008. <a href="http://en.wikipedia.org/wiki/Wikipedia:Expert_retention">http://en.wikipedia.org/wiki/Wikipedia:Expert_retention</a>. </p>
</dd>
<dt>
[<a name="anthologize">9</a>]
</dt>
<dd>
<p>Anthologize, 2010. <a href="http://anthologize.org/">http://anthologize.org/</a>. </p>
</dd>
<dt>
[<a name="csl">10</a>]
</dt>
<dd>
<p>Citation Style Language, 2010. <a href="http://citationstyles.org">http://citationstyles.org</a>. </p>
</dd>
<dt>
[<a name="problemdois">11</a>]
</dt>
<dd>
<p>The problem with DOIs, 2011. <a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/</a>. </p>
</dd>
<dt>
[<a name="tavernakblog">12</a>]
</dt>
<dd>
<p>The Taverna Knowledgeblog, 2011. <a href="http://taverna.knowledgeblog.org">http://taverna.knowledgeblog.org</a>. </p>
</dd>
<dt>
[<a name="arewethere">13</a>]
</dt>
<dd>
<p>Sean Bechhofer. Reflections on blogging a book. Ontogenesis, 2011. <a href="http://ontogenesis.knowledgeblog.org/647">http://ontogenesis.knowledgeblog.org/647</a>. </p>
</dd>
<dt>
[<a name="citeproc-js">14</a>]
</dt>
<dd>
<p>Frank Bennett. Citeproc-js. <a href="https://bitbucket.org/fbennett/citeproc-js/wiki/Home">https://bitbucket.org/fbennett/citeproc-js/wiki/Home</a>. </p>
</dd>
<dt>
[<a name="levels">15</a>]
</dt>
<dd>
<p>Simon Cockell, Dan Swan, and Phillip Lord. Knowledgeblog types and peer-review levels. Process, 2010. <a href="http://process.knowledgeblog.org/archives/19">http://process.knowledgeblog.org/archives/19</a>. </p>
</dd>
<dt>
[<a name="Wikipedia_academics">16</a>]
</dt>
<dd>
<p>Zoe Corbyn. Wikipedia wants more contributions from academics, 2011. <a href="http://www.guardian.co.uk/education/2011/mar/29/wikipedia-survey-academic-contributions">http://www.guardian.co.uk/education/2011/mar/29/wikipedia-survey-academic-contributions</a>. </p>
</dd>
<dt>
[<a name="wikipediaage">17</a>]
</dt>
<dd>
<p>Casper Grathwohl. Wikipedia comes of age. The Chronicle of Higher Education, 2011. <a href="http://chronicle.com/article/article-content/125899/">http://chronicle.com/article/article-content/125899/</a>. </p>
</dd>
<dt>
[<a name="second">18</a>]
</dt>
<dd>
<p>D. Kell. Metabolomics, food security and blogging a book, 2010. <a href="http://blogs.bbsrc.ac.uk/index.php/2010/01/metabolomics-food-security-blogging-book/">http://blogs.bbsrc.ac.uk/index.php/2010/01/metabolomics-food-security-blogging-book/</a>. </p>
</dd>
<dt>
[<a name="first">19</a>]
</dt>
<dd>
<p>Jim Logan. What is an ontology? | ontogenesis, 2010. <a href="http://ontogoo.blogspot.com/2010/01/what-is-ontology-ontogenesis.html">http://ontogoo.blogspot.com/2010/01/what-is-ontology-ontogenesis.html</a>. </p>
</dd>
<dt>
[<a name="reality_in_biology_2010">20</a>]
</dt>
<dd>
<p>Phillip Lord and Robert Stevens. Adding a little reality to building ontologies for biology. <em>PLoS One</em>, 2010. </p>
</dd>
<dt>
[<a name="reality_in_biology_2010_blog">21</a>]
</dt>
<dd>
<p>Phillip Lord and Robert Stevens. Adding a little reality to building ontologies for biology, 2010. <a href="http://www.russet.org.uk/blog/2010/07/realism-and-science/">http://www.russet.org.uk/blog/2010/07/realism-and-science/</a>. </p>
</dd>
<dt>
[<a name="onto-mani">22</a>]
</dt>
<dd>
<p>Phillip Lord and Robert Stevens. The Ontogenesis Manifesto, 2010. <a href="http://ontogenesis.knowledgeblog.org/manifesto">http://ontogenesis.knowledgeblog.org/manifesto</a>. </p>
</dd>
<dt>
[<a name="firstyear">23</a>]
</dt>
<dd>
<p>Phillip Lord and Robert Stevens. Ontogenesis: One year on. Ontogenesis, 2011. <a href="http://ontogenesis.knowledgeblog.org/1063">http://ontogenesis.knowledgeblog.org/1063</a>. </p>
</dd>
<dt>
[<a name="taverna">24</a>]
</dt>
<dd>
<p>Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Antoon Goderis, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Robert Stevens, Anil Wipat, and Chris Wroe. Taverna: lessons in creating a workflow environment for the life sciences. <em>Concurr. Comput.: Pract. Exper.</em>, 18:1067–1100, August 2006. </p>
</dd>
<dt>
[<a name="epub_from_wordpress">25</a>]
</dt>
<dd>
<p>Peter Sefton. Making epub from wordpress (and other) web collections, 2011. <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/">http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/</a>. </p>
</dd>
<dt>
[<a name="cito">26</a>]
</dt>
<dd>
<p>David Shotton. CiTO, the Citation Typing Ontology. <em>Journal of Biomedical Semantics</em>, 1(Suppl 1):S6, 2010. </p>
</dd>
<dt>
[<a name="snomed">27</a>]
</dt>
<dd>
<p>M.Q. Stearns, C. Price, K.A. Spackman, and A.Y. Wang. SNOMED clinical terms: overview of the development process and project status. In <em>AMIA Fall Symposium (AMIA-2001)</em>, pages 662–666. Henley &amp; Belfus, 2001. </p>
</dd>
<dt>
[<a name="go2000">28</a>]
</dt>
<dd>
<p>The Gene Ontology Consortium. Gene Ontology: Tool for the Unification of Biology. <em>Nature Genetics</em>, 25:25–29, 2000. </p>
</dd>
</dl>
</div>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1920 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/06/ontogenesis-knowledgeblog-lightweight-semantic-publishing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Feedback on Webprints</title>
		<link>http://www.russet.org.uk/blog/2011/05/feedback-on-webprints/</link>
		<comments>http://www.russet.org.uk/blog/2011/05/feedback-on-webprints/#comments</comments>
		<pubDate>Tue, 24 May 2011 15:18:19 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Grants]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1918</guid>
		<description><![CDATA[Josh Brown from JISC has given his permission for me to reproduce the feedback from the peer-review of my last JISC grant which bounced. A shame, as it would have provided us with an opportunity to test out knowledgeblog on papers from the wild, while also producing a great demonstrator of the advantages of using [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1918">
<p><a name="preamble"></a> 
<p>Josh Brown from JISC has given his permission for me to reproduce the feedback from the peer-review of my <a href="http://www.russet.org.uk/blog/2011/04/webprints-%E2%80%93-eprints-within-knowledgeblog/">last</a> JISC grant which bounced. A shame, as it would have provided us with an opportunity to test out knowledgeblog on papers from the wild, while also producing a great demonstrator of the advantages of using the web to distribute papers with web technology rather than just dumping a link to a PDF.</p>
<p>With luck, we can rejuvenate this work in another way.</p>
<blockquote><p>&#8220;One bid (Bid no 8: Newcastle University) was flagged by one of the markers as being out of scope, despite receiving good marks and positive comments from the other two markers.</p>
<p>The original terms of the call specifically state that projects must add value to existing peer reviewed journals. Projects seeking solely to create new publications are specifically excluded. (Please review the sections <em>Expected Outputs</em> and <em>Requirements</em> of the call for more detail on these conditions.)</p>
<p>Bid no 8 states:</p>
<p>&#8220;we will identify authors within Newcastle, take their open-access publications and recast them into a form suitable for WordPress&#8221;</p>
<p>The bid is clearly designed to aggregate content that has been published elsewhere, largely based on content held within Newcastle&#8217;s institutional repository. No existing, peer-reviewed scholarly journal is involved in this project.</p>
<p>While the creation of a web-native publishing tool clearly has merit, as identified by the two markers who praised this bid, the funding call is, as stated, intended to add value to existing publications. In the absence of an existing peer-reviewed publication as a partner in this project, the bid is out of scope&#8221;</p>
<p>The panel agreed with this analysis, which meant that, despite the fact that the project was viewed unanimously as very strong proposal on its own merits, we were obliged to decline to fund this project. The requirement for direct partnership with an existing peer-reviewed scholarly journal for all projects in this strand was imposed after lengthy discussion, and for a range of reasons, including sustainability, tight time-frames and so on, and it was felt that this should be upheld.</p>
<p align="right"> &#8212; Josh Brown </p>
</blockquote>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1918 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/05/feedback-on-webprints/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Permalink Semantics</title>
		<link>http://www.russet.org.uk/blog/2011/05/permalink-semantics/</link>
		<comments>http://www.russet.org.uk/blog/2011/05/permalink-semantics/#comments</comments>
		<pubDate>Tue, 17 May 2011 16:35:53 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1908</guid>
		<description><![CDATA[So, to start with a rant. I have reached a key and pivotal point in my life. I have decided that I never, ever, ever want to see permalinks with any semantics in them, ever again. And before anyone gets clever, yes, I know that this post has semantics in its permalink. Recently I [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1908">
<p><a name="preamble"></a> 
<p>So, to start with a rant.</p>
<p>I have reached a key and pivotal point in my life. I have decided that I never, ever, ever want to see permalinks with any semantics in them, ever again. And before anyone gets clever, yes, I know that this post has semantics in its permalink.</p>
<p>Recently I was looking through <a href="http://knowledgeblog.org">Knowledge Blog</a> and realised that I had made a mistake with the permalink structure. When we created <a href="http://ontogenesis.knowledgeblog.org">Ontogenesis</a> I used semantic links, that is, permalinks with the title of the article in them, because I thought that they would be more popular with authors and easier to remember. However, I didn&#8217;t want name clashes, land grabs or disambiguation of the sort that you get on Wikipedia (website). So I added in a date as well as a uniquish identifier. I quickly realised that I had managed to combine the worst of both worlds: people wished to change the titles of their articles, and the permalinks no longer fitted. And the links were still hard to remember. So I moved Ontogenesis onto the simple number-based permalink structure that it has today. As a concession to usability, I didn&#8217;t use the default <tt>?p=192</tt>, but instead the rewritten <tt>192</tt>, which is easier. As far as I can tell, WordPress remembers old permalinks: they do not just go away when the overall structure is changed, and links are preserved. They really are as permanent as these things go.</p>
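<p>The advantage of number-based permalinks is that the identifier survives a change of title. The Python sketch below recovers the post number from either the default <tt>?p=192</tt> form or the rewritten <tt>/192</tt> form, and returns nothing for title-based slugs. It is a simplified model, not WordPress&#8217;s own routing, which PHP functions such as <tt>url_to_postid()</tt> handle through rewrite rules. The function name is hypothetical.</p>

```python
from urllib.parse import parse_qs, urlparse


def post_id_from_permalink(url):
    """Return the numeric post ID from a '?p=N' or '/N' permalink, else None.

    Title-based slugs deliberately return None: they carry semantics, and so
    stop matching the article as soon as an author changes its title.
    """
    parsed = urlparse(url)
    # Default query-string form, e.g. http://example.org/?p=192
    query_ids = parse_qs(parsed.query).get("p")
    if query_ids and query_ids[0].isdecimal():
        return int(query_ids[0])
    # Rewritten numeric form, e.g. http://example.org/192 (trailing slash allowed)
    last_segment = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    if last_segment.isdecimal():
        return int(last_segment)
    return None


print(post_id_from_permalink("http://ontogenesis.knowledgeblog.org/192"))
```

<p>The older <tt>/archives/19</tt> style also ends in a number, so this model would resolve it too. Whether a live site does the same depends on its rewrite rules.</p>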
<p>But I had not fixed the other knowledgeblog subdomains consistently. <a href="http://process.knowledgeblog.org">Process</a>, which defines and documents the process of knowledgeblog itself, was still set up with the older style identifiers. So I changed it; for example, <tt>http://process.knowledgeblog.org/archives/19</tt> became plain <a href="http://process.knowledgeblog.org/19">http://process.knowledgeblog.org/19</a>. I don&#8217;t understand why, as WordPress seemed to maintain the links last time, but apparently this broke an email <a href="http://eridadnus.com">Dan Swan</a> had sent out advertising our Bioinformatics <a href="http://knowledgeblog.org/131">Write-a-thon</a>.</p>
<p>While I have generally purged semantics from links, WordPress still maintains the &#8220;title as link&#8221; approach for pages, as opposed to posts. I guess this makes sense, as you generally don&#8217;t have that many pages, but in this case it has shot me in the foot. I started to re-create a &#8220;Who are we&#8221; page for the <a href="http://www.knowledgeblog.org">www</a> main domain of knowledgeblog. This ended up with a URL of <tt>http://www.knowledgeblog.org/who-are-we</tt>; but then I got distracted and left the job half-done. Moreover, I wanted to use my normal editing environment. So I trashed the page. Today, I created another page with the same name. But this got a URL of <tt>http://www.knowledgeblog.org/who-are-we-2</tt>. Ugly. WordPress would not let me rename this permalink, so I tried resurrecting the trashed page and changing its content. For reasons that I don&#8217;t understand, this didn&#8217;t work either and I ended up with <a href="http://www.knowledgeblog.org/who-are-we-3">http://www.knowledgeblog.org/who-are-we-3</a>. I tried changing this to <a href="http://www.knowledgeblog.org/who">http://www.knowledgeblog.org/who</a>, which works, but redirects to <a href="http://www.knowledgeblog.org/who-are-we-3">http://www.knowledgeblog.org/who-are-we-3</a>.</p>
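<p>The numbered suffixes fit a simple de-duplication rule. The Python sketch below is a simplified model of that rule, not WordPress&#8217;s actual <tt>wp_unique_post_slug()</tt>. It assumes, as the behaviour above suggests, that a trashed page keeps hold of its slug. It reproduces the move from <tt>who-are-we</tt> to <tt>who-are-we-2</tt>, but it does not attempt to explain what happened when the trashed page was resurrected.</p>

```python
def unique_slug(desired, existing):
    """Return desired if it is unused, else the first free "desired-N" (N >= 2).

    existing is the set of slugs already taken; in this model that set is
    assumed to include slugs of trashed pages as well as live ones.
    """
    if desired not in existing:
        return desired
    n = 2
    while f"{desired}-{n}" in existing:
        n += 1
    return f"{desired}-{n}"


taken = {"who-are-we"}  # the slug still held by the trashed page
print(unique_slug("who-are-we", taken))
```

<p>Under this rule, any slug derived from a title can be taken by an earlier, even discarded, page, which is one more argument for identifiers that carry no semantics.</p>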
<p>So, WordPress is doing (mostly) the right thing, but it still all worked against me. I don&#8217;t understand, however, why WordPress doesn&#8217;t allow you to set default permalinks for pages as well as posts. It should, but as far as I can tell, it does not.</p>
<p>The irony of this is that this is not a new issue. I even wrote a post about <a href="http://www.russet.org.uk/blog/2009/09/obo-format-and-manchester-syntax/">Manchester syntax and OBO</a> which largely revolves around this issue. I know about the importance of semantics-free identifiers, and I should have known better than to make a mess of things this way, both on knowledgeblog and indeed on this blog. It just goes to show that handling change is hard, and that living with a nasty legacy is often the result. I guess that it is a nice example of the advantages and disadvantages of semantics and the compromises that have to be made in any engineering situation.</p>
<p>I haven&#8217;t decided yet, but I think I will change the permalink structure of this blog in a few days&#8217; time. I am hopeful that existing links will be maintained, but that all future ones will exist only in numeric form. Fingers crossed, it will all work.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1908 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/05/permalink-semantics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introducing Michael Bell&#8217;s Blog</title>
		<link>http://www.russet.org.uk/blog/2011/04/introducing-michael-bells-blog/</link>
		<comments>http://www.russet.org.uk/blog/2011/04/introducing-michael-bells-blog/#comments</comments>
		<pubDate>Wed, 20 Apr 2011 15:51:10 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1900</guid>
		<description><![CDATA[This is just a short introduction to Michael Bell, my PhD student. He&#8217;s now in the second year of his PhD, and has been looking at annotation in biological databases. More specifically, we are trying to define quality measures for textual annotation, based around the bulk properties of these databases. It&#8217;s related to, but distinct [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1900">
<p><a name="preamble"></a> 
<p>This is just a short introduction to <a href="http://www.cs.ncl.ac.uk/people/M.J.Bell1">Michael Bell</a>, my PhD student. He&#8217;s now in the second year of his PhD, and has been looking at annotation in biological databases. More specifically, we are trying to define quality measures for textual annotation, based around the bulk properties of these databases. It&#8217;s related to, but distinct from my early work on semantic similarity. The question is whether we can judge the quality of sentences, words or records based on how they have been used previously, and how far they have spread.</p>
<p>Michael has now started to <a href="http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/">blog</a> his work, following on from my own <a href="http://knowledgeblog.org">knowledgeblog</a> work, and our general commitment to open science. As part of his work, he is starting to build web-delivered tools, as they are a useful way of navigating the complex knowledge space of biological data. So, his website is also part of his work.</p>
<p>A good example is his recent blog post discussing the creation of <a href="http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=55">word clouds</a> for all historical versions of Swiss-Prot and TrEMBL; because everyone loves a word cloud, it is well worth a look.</p>
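<p>A word cloud is essentially a table of word frequencies drawn with font size proportional to count. The Python sketch below computes such a table over a list of annotation strings. The input (for instance, free-text comment lines extracted from one Swiss-Prot release), the tokenisation and the stop-word list are all assumptions for illustration, not Michael&#8217;s actual method.</p>

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real analysis would need a fuller one.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "by", "is", "with"}


def word_frequencies(texts, top_n=10):
    """Count words across annotation texts, ignoring case and stop words.

    Returns the top_n (word, count) pairs, most frequent first.
    """
    counts = Counter()
    for text in texts:
        # Lower-case alphanumeric tokens, allowing internal hyphens.
        for word in re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(top_n)


print(word_frequencies(["Catalyzes the hydrolysis of ATP.", "Binds ATP."]))
```

<p>Running something like this once per historical release would give one frequency table per version, which is the raw material for comparing clouds over time.</p>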
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1900 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/04/introducing-michael-bells-blog/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WebPrints – ePrints within knowledgeblog.</title>
		<link>http://www.russet.org.uk/blog/2011/04/webprints-%e2%80%93-eprints-within-knowledgeblog/</link>
		<comments>http://www.russet.org.uk/blog/2011/04/webprints-%e2%80%93-eprints-within-knowledgeblog/#comments</comments>
		<pubDate>Thu, 07 Apr 2011 12:00:54 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Grants]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1896</guid>
		<description><![CDATA[This is the latest grant that we have submitted to JISC, in this case for a new application of the knowledgeblog platform. As usual, it is a direct post from Word, so there may be a few presentational issues in it. The grant is currently under review; I will post the outcome and any feedback [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1896">
<p>This is the latest grant that we have submitted to JISC, in this case for a new application of the knowledgeblog platform. As usual, it is a direct post from Word, so there may be a few presentational issues in it.
</p>
<p>The grant is currently under review; I will post the outcome and any feedback (if possible) once I have a result.</p>
<h1>Outline Project Description
</h1>
<p>In this project, we will generate a large body of web content, demonstrating the applicability of commodity blogging technology as a supplement to the University&#8217;s existing eprints archive. Through the use of technology pioneered by the JISC-funded Knowledgeblog project, we will publish 100+ scientific articles, from a variety of different word-processing environments, in a structured, web-capable form rather than as PDF. This content will then be augmented to demonstrate the advantages of building on a commodity platform, enabling novel mechanisms of publication.
</p>
<h1>1. Introduction
</h1>
<p><sup>1</sup>The modern <strong>publishing</strong> industry has been massively affected by the development of the web. However, the impact has been highly varied across different domains. Publications that address news events or encyclopedic knowledge have been very heavily affected; other areas have changed little. The web initially developed from the desire of scientists to share knowledge; in some areas, such as biology, the uptake of web technologies has been little short of extraordinary. It is ironic, therefore, that the publishing of <strong>formal academic papers</strong> has been affected relatively little by the web. Although contents-page listings may have been largely replaced by RSS or email, and papers may be available as HTML, they are still largely <strong>constrained</strong> by print requirements: packaged as <strong>PDFs</strong>, poorly linked, with static figures. 
</p>
<p><sup>2</sup>An alternative publication mechanism has already been <strong>funded by JISC</strong> as part of the &#8220;Managing Research Data&#8221; programme. As part of the Knowledgeblog project, we have investigated a publication tool based around a commodity blogging engine, namely WordPress, which integrates well with scientists&#8217; existing work practices. There are a number of tools, such as Open Journal Systems, and organizations, such as Scielo, which allow the web publication of academic articles. While these have <strong>large user bases</strong> (OJS: 6000 journals; Scielo: 600), WordPress currently drives around <strong>10% of the world</strong>&#8217;s websites, a user base orders of magnitude larger. WordPress, therefore, performs the basic tasks of publishing articles extremely well, scales to millions of page hits, enjoys tool support from many word-processing environments and benefits from many augmentations for specialist audiences. We have <strong>extended</strong> this tool with a few specialised extensions of our own and, as a result, made it more suitable for academic publishing. We have then used this tool as the basis for two journals, aimed at producing educational resources describing ontology technology (http://ontogenesis.knowledgeblog.org) and the <strong>JISC-funded</strong> Taverna workflow system (http://taverna.knowledgeblog.org). 
</p>
<p><sup>3</sup>These two resources are, in effect, &#8220;<strong>gold</strong> open-access&#8221; &#8212; although not requiring author payment. They present content which has not been presented elsewhere, but was written for the purpose; articles have been (or are progressing through) a formal review process. While this has provided a useful resource, generating over 15k page views, these resources are designed to be coherent in scope; although this is generally a positive virtue, by definition it allows us to investigate the suitability of the tooling for only a small number of articles and a limited domain. 
</p>
<p><sup>4</sup>Newcastle University has a strong <strong>history</strong> in supporting gold open access publication: it was the site for the first open access law journal in the UK (http://webjcli.ncl.ac.uk/). In addition, it has a large and successful eprints repository (http://eprints.ncl.ac.uk), currently hosting 50k articles or bibliographic records. In this project, we will exploit the eprints archive to provide content, building a substantial knowledge resource. This will demonstrate both the suitability of the Knowledgeblog tool-chain as the basis for green open access publication and the value of this novel form of publication, and will provide <strong>vital</strong> testing against content &#8220;from the wild&#8221;, allowing us to extend the <strong>suitability</strong> of this tool-chain to as many areas of academic discourse as possible.
</p>
<h1> 2. Fit to call
</h1>
<p><sup>5</sup>The project call notes that JISC is funding, or has funded, many projects relating to scholarly communication. These include: infrastructural support in the form of institutional repositories; support for open access; and support for novel mechanisms of publication such as overlay journals. Specifically, theme D (campus-based publishing) is aimed at increasing the <strong>capacity</strong> of the sector to publish and disseminate research outputs directly. The call also highlights attempts such as the &#8220;Beyond the PDF&#8221; workshop to move toward more structured forms of knowledge; while, in theory, PDF is capable of supporting relatively rich structuring, in practice most of the tools which generate files in this format produce a relatively opaque, binary artefact from which it is difficult to extract information, or to repurpose or recast it in any way. 
</p>
<p><sup>6</sup>While open-access publishing has made significant strides in the last 10 years, becoming an accepted part of the academic landscape, gold open-access (the publication of original content) still accounts for a minority of academic publications. Green open-access (author publication of content often published elsewhere) now accounts for up to 25% of the literature in some fields.
</p>
<p><sup>7</sup>Institutional repositories such as that run by Newcastle (http://eprints.ncl.ac.uk), or author archiving on their own website (e.g. http://homepages.cs.ncl.ac.uk/phillip.lord/publications.html), are the most common routes for green open-access publication. While increasing access to academic materials is a very positive step, this form of publication is largely limited to providing access to a PDF. From neither the author&#8217;s nor the reader&#8217;s point of view is there significant added value to the publication. For example, our experience is that authors are often equivocal about, or uninterested in, publication in institutional repositories, as it is &#8220;just-one-more-thing&#8221; to do, while maintaining a website requires significant technical expertise. 
</p>
<p><sup>8</sup>For this grant, <strong>academics</strong> at Newcastle supported by the <strong>infrastructure</strong> provided by the local <strong>librarians</strong> will provide an alternative; we will identify authors within Newcastle, take their open-access publications and recast them into a form suitable for WordPress. We will do this with their active permission and engagement, using the tooling we have developed or documented as part of the previously-funded JISC &#8220;knowledgeblog&#8221; project. Where authors wish to, we will <strong>support</strong> them in performing this work for themselves; where they do not want &#8220;just-one-more-thing&#8221;, we will build on the existing eprints process and perform this work for them. In general, this can be performed directly using MS Word, LaTeX or other word-processing software, whichever is the authors&#8217; preferred editing environment. In addition, we will use this process to increase the usability of the tooling, increasing both the ability of authors to publish their work directly in this fashion and the likelihood that they will. As this proposal is built on existing work from the University <strong>eprints</strong> archive, <strong>library-support</strong> is implicit within FEC and not specifically or additionally costed. 
</p>
<p><sup>9</sup>Once publications are available in this framework, authors and readers will be able to take advantage of the additional features which come either from WordPress directly, or from augmentations provided or assessed by the WebPrints team. For example, authors will be able to see rich content-access statistics, including page-views, referrer and incoming link information. Published articles will be bi-directionally linkable using trackbacks. Authors will be able to add tags, zoomable equations or automatically generated reference lists, depending on their level of technical competence. For readers, category- and tag-based RSS feeds will be available, and searching and bi-directional linking (again!) will be possible. As a result of the work from the previous knowledgeblog grant, all posts will be <strong>tagged</strong> with metadata, in various forms, and will be available for formal archiving outside of the University. 
</p>
<p><sup>10</sup>The publication framework is based around WordPress, which is freely available, scalable, stable and hardened by its large user base. The system is continually updated, but has a good reputation for maintaining backward compatibility. The authoring framework is based around commodity tools such as Word or LaTeX. Most of the workflow process within Newcastle is pre-existing as part of the eprints service. This project therefore provides a sustainable and novel enhancement to the existing process. 
</p>
<h1>3. Workplan
</h1>
<h2>3.1 WP1 Management, Systems Administration and Set up. 
</h2>
<p><sup>11</sup>This work package will fulfil the basic management and administrative tasks required for the project. This will include: set-up of the repository, with styling and theming appropriate to the project; definition of a basic workflow for management of documents and metadata; and fulfilment of standard JISC reporting requirements. 
</p>
<p><sup>12</sup>We request additional funding of 1k as part of this work-package for virtual server upgrades (additional disk space), Dropbox space to enable document management, and WordPress anti-comment-spam support. 
</p>
<h2>3.2 WP2 User documentation. 
</h2>
<p><sup>13</sup>Most of the operational, &#8220;how-to&#8221; documentation is already available: either at http://process.knowledgeblog.org (developed by the JISC funded knowledgeblog project); or, as the repository is based on commodity technology, from many publicly available websites. 
</p>
<p><sup>14</sup>However, there will be information specific to the Webprints archive: about copyright, about document management, and about the relationship to the University. For this, we will need to generate some specific documentation.
</p>
<p><sup>15</sup>As the project progresses, we will improve and enhance this documentation based on our experiences, including, for example, statistics on how long author self-deposition takes. 
</p>
<h2>3.3 WP3 Author advertising and Material identification
</h2>
<p><sup>16</sup>We will seek active <strong>engagement</strong> with our user community by linking into the current eprints system. Combined with the Newcastle-specific, internal &#8220;myimpact&#8221; database (which was designed to capture research outputs for the next REF), this will enable us to identify new publications as they come out. In the first instance, we will select material that has been published in <strong>open access</strong> journals (or where embargo periods or other conditions allow). We will contact authors individually, inform them of our project, and advise them about the methods for recasting their paper (see WP4).
</p>
<p><sup>17</sup>We will not preselect on the basis of academic quality, only on technical and legal (copyright) grounds. Although the eprints service displays full text as PDF only, the myimpact database in many cases also stores MS Word (or equivalent) formatted data. We will, therefore, prefer papers where this data is available. We will prefer recent papers over older ones. Finally, we will prefer papers which give us a wide spread of authorship and discipline. 
</p>
<p><sup>18</sup>Although the focus of this proposal is on the provision of a service for publication of green open access material in a fully web-capable format, we will be happy to receive grey literature, on an author-publication basis.
</p>
<h2>3.4 WP4 Paper recasting 
</h2>
<p><sup>19</sup>This work package will take papers selected as part of WP3 and publish them to the webprints archive. In most cases, this work will be performed using tooling developed or documented by the previously funded JISC knowledgeblog project. 
</p>
<p><sup>20</sup>We will publish articles in three ways:
</p>
<p>
		<em>Webprints team published</em>. All work will be performed by members of the Webprints team. For each paper, we will write a short report describing any issues with the publication process and any errors seen (which we will hand-correct). We will gather statistics on the time taken to publish. Papers will be published on an &#8220;as-is&#8221; basis; that is, we will not seek to enhance the content at this point. We will add <strong>metadata</strong> in a structured way, which will be <strong>accessible</strong> from the web-presented version.
</p>
<p>
		<em>Author published, webprints supported</em>. We will work directly with authors, helping them to publish their papers. Where possible, we will <strong>augment</strong> the papers and add new features (LaTeX maths support, citations). These papers will be marked as featured and augmented. Again, we will gather <strong>statistics</strong> on the time taken to publish, broken down by additional functionality. 
</p>
<p><em>Author published</em>. Authors will publish directly into Webprints, using either their pre-existing experience or our own user documentation. We will request, but not require, statistical feedback. Publication will be as the <strong>author wishes</strong>: as-is, or augmented with additional functionality. 
</p>
<p><sup>21</sup>All papers will be annotated with standard metadata in a structured form; our previous work means that this metadata will be available from the web presentation of the paper. 
</p>
<h2>3.5 WP5 Repository and process enhancement
</h2>
<p><sup>22</sup>For this package, we will focus on two key aspects: tooling for publishing papers and their presentation once there. 
</p>
<p><sup>23</sup>For the presentational issues, in the first instance we will focus on enhancements which do not require support from the article material. For example, as we will add metadata to articles, we will be able to generate metadata headers (COinS, standard meta tags etc.) without further analysis of the article material itself. Likewise, our experience with the knowledgeblog project means that we can support, &#8220;out-of-the-box&#8221;: multiple <strong>export</strong> formats (including HTML, PDF and ePUB); site-wide <strong>indexes</strong> (by year, author, subject etc.); comments; trackbacks and page feeds (including from subsections). Through use of third-party software, we will also be able to add: <strong>related papers</strong> through textual analysis; tag clouds; Twitter backs; automated <strong>multi-lingual</strong> presentation and <strong>social networking</strong> support. 
</p>
<p><sup>24</sup>We will also investigate enhancements which require modification of the original content (and therefore increased interaction with authors). From the knowledgeblog project, these will include <strong>scalable equation</strong> presentation and client-side generated <strong>bibliographies</strong>. We will also add &#8220;custom posts&#8221; for supplementary material (spreadsheets, for instance). Finally, through the use of third-party material, we will add enhancements such as <strong>syntax highlighting</strong>, zoomable maps, <strong>slideshows</strong> and so forth. This part of the proposal is designed to be open-ended and exploratory; which forms of enhancement we pursue will depend on the types of papers selected and our interactions with the authors. There are currently over 13,000 plugins available for WordPress, which provide us with a considerable resource to build from. 
</p>
<h2>3.6. Timetable
</h2>
<div>
<table style="border-collapse:collapse" border="0">
<colgroup>
<col style="width:259px"/>
<col style="width:65px"/>
<col style="width:57px"/>
<col style="width:81px"/></colgroup>
<tbody valign="top">
<tr>
<td style="border-top:  solid 0.5pt; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="color:#17365d"><em>Name</em></span></p>
</td>
<td style="border-top:  solid 0.5pt; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<h5>Begin date</h5>
</td>
<td style="border-top:  solid 0.5pt; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<h5>End date</h5>
</td>
<td style="border-top:  solid 0.5pt; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<h5>Resources</h5>
</td>
</tr>
<tr>
<td style="border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>WP1.1 &#8211; Setup Repository</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>02/05/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>14/05/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>SC, AL, DS</p>
</td>
</tr>
<tr>
<td style="border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>WP1.2 &#8211; Document Workflow</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>02/05/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>14/05/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>PL</p>
</td>
</tr>
<tr>
<td style="border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>WP2.1 &#8211; User Documentation</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>09/05/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>24/05/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>DS, PL</p>
</td>
</tr>
<tr>
<td style="border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>WP2.2 &#8211; User Statistics</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>16/05/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>31/08/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>SC, AL</p>
</td>
</tr>
<tr>
<td style="border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>WP3.1 &#8211; Author Engagement</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>16/05/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>31/08/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>SC, AL, DS, PL</p>
</td>
</tr>
<tr>
<td style="border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>WP4.1 &#8211; Paper Recasting</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>01/06/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>30/09/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>SC, AL, DS, PL</p>
</td>
</tr>
<tr>
<td style="border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>WP5.1 &#8211; Repository Enhancement</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>01/07/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>30/09/11</p>
</td>
<td style="border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p>SC, AL, DS, PL</p>
</td>
</tr>
</tbody>
</table>
</div>
<h1>4. Deliverables
</h1>
<p><sup>25</sup>The project will deliver:
</p>
<ul>
<li>A repository of open-access articles in a fully web-capable format, acting as a supplement to the existing eprints archive at Newcastle. We expect to generate around 100 articles in this form, although this is likely to be an underestimate: we are currently estimating throughput from our experience with Knowledgeblog, which involved relatively few articles, and the process should benefit from the experience gained at higher throughput.
</li>
<li>Further documentation, published on http://process.knowledgeblog.org, describing the process that we have used to set up this repository.
</li>
<li>Enhancements to tooling, enabling others to publish more easily in this manner.
</li>
<li>Additional experience and software enhancing the presentation of data held in this form.
</li>
</ul>
<h1>5. Project management arrangements
</h1>
<p><sup>26</sup>The project will be managed by Dr Lord, who will be responsible for:
</p>
<ul>
<li>Developing Project Management Plans; 
</li>
<li>Ensuring that the Project work package objectives are met; 
</li>
<li>Prioritising and reconciling conflicting opportunities;
</li>
<li>Reporting to and collaborating with the JISC programme manager;
</li>
<li>Disseminating research results. 
</li>
</ul>
<p><sup>27</sup>Project progress will be evaluated through scheduled, short, &#8220;stand-up&#8221; meetings on a weekly basis, conducted face-to-face, via Skype or by phone as appropriate. Primary unscheduled communication will be via a public mailing list, ensuring maximum visibility and openness. We will use other readily available tooling to manage the document process pipeline (Google Spreadsheets, Dropbox) and, likewise, for software development (Google Code). All staff are associated with other projects or service provision (research, teaching, training); they will be individually responsible for managing these workloads, and are highly experienced at doing so.  
</p>
<h2>5.1 Risk Management
</h2>
<p><sup>28</sup>Staff risks – the basic organisation of the project has been designed to mitigate staffing risks. All staff are in post and are <strong>highly experienced</strong>, with long track records at Newcastle. Costs have been split three ways; therefore, even in the unlikely event that one member of the team leaves during the project, it will not cause significant disruption. 
</p>
<p><sup>29</sup>Software risks – we are using <strong>commodity technology</strong>, which is very well <strong>proven</strong> and supported. None of the software is critical (even our basic blogging engine, WordPress, is replaceable). Therefore, while changes in third-party software might degrade or slow progress, they will not halt it. 
</p>
<p><sup>30</sup>Engagement Risks – the project requires a level of engagement from Newcastle researchers which may not materialize. We have mitigated this risk by minimizing the effort that engagement requires of researchers. The project members are well known to many in the University (DS and SC comprise the &#8220;Bioinformatics Support Unit&#8221; and have worked for many PIs personally). We have active engagement from the library, in particular from Moira Bent (Science Faculty Liaison Librarian) and Paula Fitzpatrick (Digital Libraries).
</p>
<h2>5.2 IPR position
</h2>
<p><sup>31</sup>The bulk of the content handled by this work will come from authors within the University. The current restrictive copyright requirements of many publishers place uncertain limits on what can or cannot be done with this content. For this reason, we will use articles that have been published under, or have since become available under, a Creative Commons or other open-access licence. 
</p>
<p><sup>32</sup>Project members will release written work (documentation etc.) under a Creative Commons Attribution-ShareAlike 3.0 Unported License (<strong>CC BY-SA</strong>), which allows re-use and modification, including for commercial purposes, provided that the work is attributed and any derivatives are shared under the same licence. This is in line with the JISC Model Licence. Software linked to WordPress will be released under the GPL, as required by the WordPress licence. Software which is separable will be released under the LGPL. Software linked to other third-party libraries may use other licences if required; these will be limited to Free/Open Source licences. 
</p>
<h2>5.3 Sustainability
</h2>
<p><sup>33</sup>This project is largely based around innovative, novel and leading use of <strong>existing</strong> software. As such, the sustainability of the majority of the technology base depends not on project members but on large companies with established and proven business models.  
</p>
<p><sup>34</sup>The WebPrints archive will be run from the same server as knowledgeblog.org, which is being developed and maintained and will continue to be for the foreseeable future; the addition of the WebPrints archive will not be a substantial additional cost. However, should this cease, the content of the WebPrints archive will be released under a Creative Commons or equivalent permissive licence. This will make it possible for the JISC-funded UK Web Archive to store the website for the future.
</p>
<p><sup>35</sup>Although we will not be able to sustain publication by the WebPrints team past the lifetime of this proposal without further funding, <strong>author publication</strong> will remain possible; our experience with existing tooling is that this is feasible for many authors, although it requires some level of technical skill, depending on the word-processing package and the complexity of the paper. 
</p>
<h2>5.4 Staff Recruitment
</h2>
<p><sup>36</sup>All staff are already <strong>in post</strong>. Recruitment during the project will therefore be unnecessary. 
</p>
<h2>5.5 Key Beneficiaries
</h2>
<p><sup>37</sup>Our immediate beneficiaries are Newcastle University staff, who will have their work published using a novel publication technique. Critically, we will demonstrate the value of this form of publication to both <strong>researchers</strong> and <strong>librarians</strong> within the University, who will in future be better placed to use or support this technology to publish their own or others&#8217; work. 
</p>
<p><sup>38</sup>Although presented here as a discrete project, the work fits within the wider blogging community. Our own knowledgeblog project and website, for example, will be able to take advantage of the software improvements that result from this work. Additionally, the general academic blogging community will gain a new resource. Increasingly, this community is a critical path for <strong>public engagement</strong> in the academic process. 
</p>
<h2>5.6 Community Engagement
</h2>
<p><sup>39</sup>Community engagement will take place initially by direct contact; we will email authors to ask for their engagement in the publishing process. This should have the secondary effect of <strong>advertising</strong> the presence of our project. We have active <strong>engagement</strong> from the <strong>library</strong> staff, who are well known within the University. In terms of engagement with the resource outside of Newcastle, we will make active use of various web and social networking facilities. Our experience has shown that this can generate <strong>significant</strong> amounts of engagement in a relatively <strong>short</strong> period of time. Finally, we will advertise the work through standard academic channels of conference and journal publication; although effective, this tends to be slow. This is problematic for a short project, hence we consider this to be a secondary means of communication. 
</p>
<h1>6. Budget 
</h1>
<p>Removed for privacy reasons. 
</p>
<h1>7. Project Team
</h1>
<p><sup>40</sup>Dr. Phillip Lord is a Lecturer in Computing Science at Newcastle University. He has a PhD in yeast genetics from the University of Edinburgh, after which he moved into bioinformatics. He is well known for his work on ontologies in biology, as well as his contributions to eScience, beginning with his role as an RA on the myGrid project. Since his move to Newcastle, he has been an investigator on three more eScience projects (CARMEN, ONDEX and InstantSOAP), as well as maintaining an active engagement in standards development (OBI, MIGS, MIBBI) and publishing on the fundamentals of ontology design. He is an active participant in the scientific blogging community and developed the initial idea for knowledgeblogs. As well as managing the knowledgeblog project, he is the developer of tools such as &#8220;Latextowordpress&#8221; and WordPress plugins such as &#8220;Mathjax-latex&#8221; and &#8220;Kcite&#8221;, all of which improve the usefulness of WordPress for academic communication.
</p>
<p><sup>41</sup>Dr. Daniel Swan has a PhD in developmental biology and continued to work in developmental biology as a post-doctoral researcher before moving into bioinformatics in 2001. Subsequent positions included working for Bart&#8217;s and the London Genome Centre and the Centre for Hydrology and Ecology in informatics-driven roles dealing with large, distributed biological datasets generated by large user communities. Currently the manager of the Newcastle University Bioinformatics Support Unit, he leads a small team helping biological researchers to generate, capture, store and analyse their digital data. His interdisciplinary background means he has a grounding in both computer and biological sciences and is comfortable working on CS-focused projects (CARMEN, InstantSOAP, Bio-Linux) as well as acting in a research capacity analysing high-throughput data. He is currently active within the knowledgeblog project, having been responsible for adding software support for a review process, gravatars, syntax highlighting, and PDF and ePUB exports. 
</p>
<p><sup>42</sup>Dr. Simon Cockell has a PhD in Genetics from Leicester University, and refocussed on Bioinformatics with a Masters degree from Leeds in 2005. From there he moved to Newcastle and the Bioinformatics Support Unit. Since coming to Newcastle, Simon has worked on a range of projects involving large-scale analyses (AptaMEMS-ID), data integration (Ondex) and health informatics (MRC Mitochondrial Disease Cohort). He is currently active within the knowledgeblog project, having been responsible for metadata support (including COinS) and navigational support (for both humans and robots), and is a co-author of Kcite and Mathjax-latex.
</p>
<p><sup>43</sup>Allyson Lister worked for 6 years at the EBI in Cambridge, developing and producing the UniProt/TrEMBL protein database. She is now focusing on the use of ontologies for the semantic integration of systems biology data in her role at CISBAN at Newcastle University. Both at the EBI and at Newcastle University, she developed structured data formats including UniProt/TrEMBL and SBML. She has also been an early adopter of blog technology as a mechanism for communicating both her own and others&#8217; primary research. Since 2006, she has co-authored a number of posts with other bloggers in the community and has been invited to be a guest author for both the ISCB news and the BioSharing blog. She has published papers highlighting the importance of social networking and live blogging to bioinformatics. </p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1896 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/04/webprints-%e2%80%93-eprints-within-knowledgeblog/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Feedback on a failed JISC grant</title>
		<link>http://www.russet.org.uk/blog/2011/04/feedback-on-a-failed-jisc-grant/</link>
		<comments>http://www.russet.org.uk/blog/2011/04/feedback-on-a-failed-jisc-grant/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 11:49:03 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Grants]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1894</guid>
		<description><![CDATA[Paola Marchionni of JISC has given her permission to reproduce the feedback from the peer-review of my last JISC grant, which sadly failed. I want to publish it here as part of my desire for open science, rather than as an opportunity to reply which, perhaps unfortunately, the JISC process does not otherwise allow. I [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1894">
<p><a name="preamble"></a> 
<p>Paola Marchionni of JISC has given her permission to reproduce the feedback from the peer-review of my <a href="http://www.russet.org.uk/blog/2011/03/1882/">last</a> JISC grant, which sadly failed. I want to publish it here as part of my desire for open science, rather than as an opportunity to reply which, perhaps unfortunately, the JISC process does not otherwise allow.</p>
<p>I am a little surprised by some of the comments, to be honest. The main criticism was more expected, though; it essentially says &#8220;it&#8217;s not crowd-sourcing if you pay people to develop content&#8221;. You have to try these things, but I did think that actually paying for content might be considered a little revolutionary. Ah, well, better luck next time.</p>
<blockquote><p>Markers felt the form of this proposal was &#8220;robust&#8221;, however there wasn&#8217;t enough clarity on the deliverables and especially on how the value of what was being produced would be assessed down stream. They felt there was also some lack of information on how the currently JISC funded K-Blog project, due for completion in July 2011, related to this project and what the impact on its team would be, which seems to be the same team as the one proposed for this project.</p>
<p>The main concerns, however, were around whether this could really qualify as a crowdsourcing or community project &#8211; it was felt it was more about disclosing data than community engagement &#8211; also considering that the authors of the articles would be paid. There were some doubts about the sustainability of the project beyond the 7 months duration of the funding, as lack of funding would prevent more articles being created and metadata added by the team. One marker also felt that a risk analysis should have taken into account the risk of disparate communities not being aware of the content and using and engaging with it. A more clear identification of the various communities the project aimed to reach and a more targeted strategy for engaging with such communities would have been useful.</p>
<p>Finally, another issue that was raised was that there wasn&#8217;t sufficient information on how the partnership with Manchester University would work, either formally or informally, and the dissemination plans could have been stronger, as they relied mainly on the role of K-Blog.</p>
<p align="right"> &#8212; Paola Marchionni </p>
</blockquote>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1894 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/04/feedback-on-a-failed-jisc-grant/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Knowledge in Biology</title>
		<link>http://www.russet.org.uk/blog/2011/03/1882/</link>
		<comments>http://www.russet.org.uk/blog/2011/03/1882/#comments</comments>
		<pubDate>Tue, 15 Mar 2011 15:32:43 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Grants]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/2011/03/1882/</guid>
		<description><![CDATA[About This is the full text of a grant called &#8220;Knowledge in Biology&#8221; that we submitted to JISC as a follow-up to our knowledgeblog grant. Unfortunately, this grant was not accepted. This blog post is the direct output of Word; apologies if the conversion is imperfect. Outline Project Description: Many disciplines [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1882">
<p><span style="color:black; font-family:Arial"><strong>About
</strong></span></p>
<p><span style="font-family:Arial"><span style="color:black">This is the full text of a grant called &#8220;Knowledge in Biology&#8221; that we submitted to JISC as a follow-up to our <a href="http://www.russet.org.uk/blog/2010/08/a-new-grant-for-knowledgeblog">knowledgeblog</a> grant. Unfortunately, this grant was not accepted. This blog post is the direct output of Word; apologies if the conversion is imperfect. 
</span></span></p>
<p><span style="color:black; font-family:Arial"><strong>Outline Project Description:
</strong></span></p>
<p><span style="font-family:Arial">Many disciplines within the sciences are knowledge-rich; of these, biology is an extreme example. In order to make advances, biologists need to be able to access knowledge from both their own and related communities in an easily digestible form. However, the publishing of this knowledge does not fit well with existing scientific communities, as it is often not regarded as &#8220;research based&#8221; &#8211; rather, it is a stored body of grey literature, often not publicly available. In the Knowledge in Biology project, we will engage with disparate communities in disciplines that engage with biologists, as well as with the community of biologists themselves. We will generate substantial content describing how &#8220;Knowledge in Biology&#8221; is both produced and consumed in the pursuit of new discoveries, by commissioning the authorship of this content directly from the funding for this project.
</span></p>
<p><span style="font-family:Arial">We will leverage the output of the JISC-funded Knowledge Blog platform as a tool for coordination, publication and dissemination of this content. The result will be a publicly accessible, high-impact resource of short, readable and accessible articles describing how to gather, manipulate and synthesise knowledge in biology. This will be of significant value in supporting the multidisciplinary research that is necessary for advances in modern biomedicine.</span></p>
<h3>1. Introduction
</h3>
<p><span style="font-family:Arial; font-size:10pt"><sup><a name="OLE_LINK5"></a>1</sup>This document describes a proposal for a project within the JISC &#8220;e-Content&#8221; programme call. 
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>2</sup>Modern biology is a rich, complex, multi-disciplinary field. In particular, practitioners need knowledge about how to access, organise and structure knowledge itself. As a result, members of the <strong>community</strong> often need to cross the <strong>boundaries</strong> of traditional societal structures within research. By definition, this is not well supported by the more formal structures that scientists use for the publication and <strong>dissemination</strong> of knowledge. So, while the information exists, it is not accessible: it is <strong>hidden</strong> from the community on the desks and hard drives of individuals. 
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>3</sup>One of the difficulties with migrating this community-based knowledge from <strong>grey</strong> literature to a more openly accessible, archived and citable form is the lack of a formal reward structure. Although scientists may engage in this form of activity from a sense of public duty, this form of documentation is not critical for their career advancement or for gaining academic credibility, so it is rarely made a priority. While technological advances have made publication of this material straightforward, the social structure of science has not supported it. As a result, a large body of knowledge about how biologists conduct their work is simply <strong>lost</strong> to the <strong>community</strong>, and considerable time and effort is spent recreating this knowledge, only for it to be lost again.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>4</sup>We plan to circumvent this social barrier using a novel approach &#8211; we will directly <strong>commission</strong> the authoring and reviewing of articles embodying this content. As the knowledge will often be readily available to individual members of the community, and we are aiming for articles that are smaller and less complex than formal research publications, it will be possible to generate a <strong>substantial</strong> body of content at relatively low cost.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>5</sup>An ideal mechanism for publishing this knowledge has already been <strong>funded by JISC</strong> as part of the &#8220;Managing Research Data&#8221; programme. This is the Knowledge Blog project: a lightweight publication tool, based around a commodity blogging engine, that integrates well with scientists&#8217; existing work practices. This &#8216;Knowledge in Biology&#8217; project (KiB) will build on the work of Knowledge Blog, to the benefit of both. This project will gain a technological underpinning <strong>at little cost</strong>, since Knowledge Blog already exists and will require only a small increase in resources to manage the additional content and traffic; Knowledge Blog, in turn, will gain substantial content and greatly increased visibility.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>6</sup>The KiB project will provide a small amount of funding for the management and commissioning of articles, but the majority of the funds will be spent in individually small amounts, <strong>crowd-sourcing</strong> the development of a novel digital content resource and engaging the <strong>community</strong> of biomedical researchers as both authors and reviewers. The <strong>content</strong> will address key issues relating to knowledge in biology, such as data standards, linked data, knowledge in synthetic biology and statistical approaches to knowledge, as well as &#8220;softer&#8221; issues such as the use of Web 2.0, the social web and the blogosphere as tools for the biomedical researcher.
</span></p>
<h3>2.1 WP1 &#8211; Knowledge Blog (k-blog) maintenance and support
</h3>
<p><span style="font-family:Arial; font-size:10pt"><sup>7</sup>The primary purpose of this proposal is to generate significant quantities of digital, community-developed content. The k-blog platform already exists, having been funded under a previous JISC call. We are not, therefore, proposing to make significant enhancements to either the process or the software in the course of this project. However, the additional load placed on the platform will require a small amount of administrative maintenance work.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>8</sup>In addition, we will need to provide support to the users of the platform; while k-blog is relatively easy to use, issues do arise with authoring, with formatting or with exceptional requests (for example, multimedia documents).
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>9</sup>For articles to be properly <strong>citable</strong> and maintainable, manual intervention is required to supplement the text with machine-readable metadata, including <strong>DOI</strong> assignment. This improves archiving and discovery, which increases the value of the resource. As part of WP1, we will annotate documents with this <strong>metadata</strong>, ensuring consistency and relieving the authors of this burden.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>10</sup>We will install and refine a licensing plugin for the k-blog platform, which clearly displays the licence information for each article, based on the author&#8217;s selection.
</span></p>
<h3>2.2 WP2 &#8211; Management of publication process 
</h3>
<p><span style="font-family:Arial; font-size:10pt"><sup>11</sup>Articles in KiB will be produced both by crowd-sourcing and by the in-house team (WP4). Our aim is to bootstrap the KiB k-blog so that it reaches a critical mass of articles that will attract both readers and further authors. We will commission articles from specified expert authors, with the incentive of a small payment. To receive this payment, each contributor will be required both to submit an article and to review another article.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>12</sup>In preparation for this work, we have compiled a list of topics for KiB and identified potential authors for each. We have clustered the topics around themes in KiB: the role of semantics in biology; the representation of knowledge in ontologies, terminologies and vocabularies; data integration to create knowledge resources; data and knowledge standards; knowledge technologies such as RDF, Linked Data and OWL; text mining; and case studies and applications of knowledge in biology. These clusters, and others, will become the categories of the KiB k-blog. As the letters of support indicate, a significant number of authors have already promised to write an article on one of these topics. We will seek as wide a selection of authors as possible, guided by our <strong>advisory committee</strong> (see Section 2.8), to help give the KiB k-blog a balanced view of knowledge in biology. A significant part of this WP will be commissioning these articles and discussing this new, community-sourced digital content with the authors.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>13</sup>This process will need managing: requests for particular articles (WP2.1); negotiation of topic and scope (WP2.2); management of the author-guided review process (WP2.3); and the payment of authors. This activity will help ensure that the core of the KiB k-blog is of sufficient quality to attract readers who will comment on and contribute articles, as well as simply read and learn.
</span></p>
<h3>2.3 WP3 &#8211; Outreach and Community Engagement
</h3>
<p><span style="font-family:Arial; font-size:10pt"><strong><sup>14</sup>Outreach</strong> and <strong>community engagement</strong> are intrinsic to this project. A high-quality, organised resource, freely available on the web, will attract readers; likewise, a widely read resource will be attractive to authors as a place to publish, particularly when supported by funding as part of WP2. Our main form of outreach, therefore, is the use of a rapid publication framework that is available on the web, <strong>archived</strong> by the British Library and indexed for searching by Google.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>15</sup>However, this process can be augmented. All content will be available under a Creative Commons licence, allowing it to be reused, with attribution, outside the KiB environment. We will maintain active &#8220;Social Web&#8221; streams through Twitter. We will solicit articles on the use of Twitter and the blogosphere from members of the scientific blogging community; as well as generating content, this will leverage their existing readership, raising awareness of KiB as a resource for both readers and authors. We will maintain a well-advertised mailing list for requests for, or offers of, new articles, whether commissioned or not.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>16</sup>Finally, we will advertise the resource through the usual academic channels of papers and poster presentations. Where possible, we will also propose micro-workshops (also known as Birds of a Feather sessions) at suitable meetings and unconferences.
</span></p>
<h3>2.4 WP4 &#8211; &#8216;In-house&#8217; article authoring
</h3>
<p><span style="font-family:Arial; font-size:10pt"><sup>17</sup>The project staff will contribute a significant number of articles to the KiB k-blog: Stevens will write 20 articles, Lord 10 and Swan 10 (WP4.1). Both Lord and Stevens have already contributed articles to the Ontogenesis k-blog and will extend this work beyond Ontogenesis into the wider KiB topics. These will include articles on tips for modelling in OWL; using ontologies with linked data; converting data to RDF and linked data; online knowledge resources; using ontologies in over-representation analysis of microarray data; and integration strategies, among others. Some of these in-house articles will act as glue that draws together many of the other articles. For example, an article on the role of knowledge in biology will set out the need for the k-blog and act as a <strong>pathfinder</strong> to the other articles. Where appropriate, we will use tools such as &#8220;Anthologize&#8221; and &#8220;Web Trails&#8221; to facilitate these aggregation activities. In-house articles will be reviewed (WP4.2) by an external reviewer, potentially drawn from the pool of contributors sourced in WP2.
</span></p>
<h3>2.5 WP5 &#8211; Project Management and JISC Requirements
</h3>
<p><span style="font-family:Arial; font-size:10pt"><sup>18</sup>The project will be managed through weekly teleconferences, to ensure that all aspects are proceeding according to the project plan. In addition, as part of WP5, we will fulfil the legal requirements for collaboration agreements and the formal reporting requirements of JISC.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><sup>19</sup>To ensure maximum community and public engagement with this project, all appropriate documents will be posted on the k-blog platform in addition to the locations specified by JISC, except where information is withheld under normal Freedom of Information (FOI) rules.
</span></p>
<p><span style="font-size:10pt"><span style="font-family:Arial"><sup>20</sup>Finally, we will gather and <strong>collate statistics</strong> on the use of these articles as measures of impact: directly, in terms of page views from the underlying k-blog platform; indirectly, from incoming links (both those reported via trackbacks and those discovered using web search tools) and comments; and through secondary indicators such as Twitter and email communications. These statistics will also be made <strong>publicly</strong> available where appropriate.</span>
</span></p>
<h3>2.6 Timetable
</h3>
<div>
<table style="border-collapse:collapse" border="0">
<colgroup>
<col style="width:124px"/>
<col style="width:72px"/>
<col style="width:80px"/>
<col style="width:62px"/>
<col style="width:281px"/></colgroup>
<tbody valign="top">
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  solid white 1.0pt; border-left:  solid white 1.0pt; border-bottom:  solid white 3.0pt; border-right:  solid white 1.0pt">
<p><span style="font-family:Arial">Work package</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  solid white 1.0pt; border-left:  none; border-bottom:  solid white 3.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">Start</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  solid white 1.0pt; border-left:  none; border-bottom:  solid white 3.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">End</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  solid white 1.0pt; border-left:  none; border-bottom:  solid white 3.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">Staff</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  solid white 1.0pt; border-left:  none; border-bottom:  solid white 3.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">Notes</span> </p>
</td>
</tr>
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid white 1.0pt; border-bottom:  solid white 0.75pt; border-right:  solid white 3.0pt">
<p><span style="font-family:Arial">WP1</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/3/2011</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">30/9/2011</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">DS, PL</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">Maintenance of k-blog infrastructure</span> </p>
</td>
</tr>
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid white 1.0pt; border-bottom:  solid white 0.75pt; border-right:  solid white 3.0pt">
<p><span style="font-family:Arial">WP2</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/3/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">31/7/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt"> </td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 1.0pt"> </td>
</tr>
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid white 1.0pt; border-bottom:  solid white 0.75pt; border-right:  solid white 3.0pt">
<p><span style="font-family:Arial">WP2.1</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/3/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/4/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">ALL</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">Crowdsourcing of articles</span></p>
</td>
</tr>
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid white 1.0pt; border-bottom:  solid white 0.75pt; border-right:  solid white 3.0pt">
<p><span style="font-family:Arial">WP2.2</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/4/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">31/7/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">ALL</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">Content negotiation and creation</span></p>
</td>
</tr>
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid white 1.0pt; border-bottom:  solid white 0.75pt; border-right:  solid white 3.0pt">
<p><span style="font-family:Arial">WP2.3</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/4/2011</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">30/9/2011</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">ALL</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">Articles reviewed and published</span></p>
</td>
</tr>
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid white 1.0pt; border-bottom:  solid white 0.75pt; border-right:  solid white 3.0pt">
<p><span style="font-family:Arial">WP3</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/3/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">30/9/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">ALL</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">Outreach and engagement</span></p>
</td>
</tr>
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid white 1.0pt; border-bottom:  solid white 0.75pt; border-right:  solid white 3.0pt">
<p><span style="font-family:Arial">WP4</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/3/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">30/9/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt"> </td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt"> </td>
</tr>
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid white 1.0pt; border-bottom:  solid white 0.75pt; border-right:  solid white 3.0pt">
<p><span style="font-family:Arial">WP4.1</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/3/2011</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">31/7/2011</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">ALL</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 0.75pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">In-house content generation</span></p>
</td>
</tr>
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid white 1.0pt; border-bottom:  solid white 0.75pt; border-right:  solid white 3.0pt">
<p><span style="font-family:Arial">WP4.2</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/4/2011</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">30/9/2011</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">ALL</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">Review and publication of in-house articles</span></p>
</td>
</tr>
<tr>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid white 1.0pt; border-bottom:  solid white 1.0pt; border-right:  solid white 3.0pt">
<p><span style="font-family:Arial">WP5</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">1/3/2011</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">30/9/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 0.75pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">PL</span> </p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid white 1.0pt; border-right:  solid white 1.0pt">
<p><span style="color:black; font-family:Arial; font-size:10pt">Project management and JISC compliance</span> </p>
</td>
</tr>
</tbody>
</table>
</div>
<h3>2.7 Deliverables
</h3>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>21</sup>A high-quality body of <strong>content</strong>, consisting of a series of articles by <strong>multiple</strong> authors on different topics within the theme of &#8220;Knowledge in Biology&#8221;. Forty of these articles will be written in-house, and a further 200 will be commissioned for a consultancy payment. We anticipate that many more will come from enthusiastic, <strong>crowd-sourced</strong> authors who engage with the process.
</span></p>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>22</sup>A website, based on the k-blog platform, that delivers this content. 
</span></p>
<h3>2.8 Project management arrangements
</h3>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>23</sup>The project will be managed from Newcastle University, with Dr Lord having primary management responsibility, including:
</span></p>
<ul>
<li><span style="color:black; font-family:Arial; font-size:10pt">Developing project management plans;</span></li>
<li><span style="color:black; font-family:Arial; font-size:10pt">Ensuring that the project work package objectives are met;</span></li>
<li><span style="color:black; font-family:Arial; font-size:10pt">Prioritising and reconciling conflicting opportunities;</span></li>
<li><span style="color:black; font-family:Arial; font-size:10pt">Reporting to, and collaborating with, the JISC Programme Manager;</span></li>
<li><span style="color:black; font-family:Arial; font-size:10pt">Disseminating community content.</span></li>
</ul>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>24</sup>Project progress will be evaluated at short, scheduled weekly &#8220;stand-up&#8221; meetings, held face-to-face, via Skype or by phone as appropriate. Unscheduled communication will take place primarily via a public mailing list, ensuring maximum visibility and openness; user consultation will also take place via a public mailing list. Close tracking of requests for content and of payments to authors is essential, and transparent procedures will be put in place for this. All staff are involved in several other projects and duties (research, research support, teaching and training), and are responsible for managing these independent workloads. All have experience with the k-blog platform and process.
</span></p>
<p><span style="font-family:Arial; font-size:10pt"><span style="color:black"><sup>25</sup>We have formed a small, unpaid advisory committee of recognised experts in the field. They will be invited to give feedback on the topics covered at 2, 4 and 6 months into the project; this will help to ensure even and representative coverage of the area that is not overly biased by the particular interests of the project staff. </span>Mark Musen (Stanford), Chris Rawlings (BBSRC Rothamsted) and David Shotton (Oxford) have all agreed to serve on this advisory committee.
</span></p>
<h3>2.9 Risks
</h3>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>26</sup>Staff risk – as with all projects, loss of staff could have a negative impact on this project; however, all staff are on permanent contracts and have long histories in research, so this is unlikely. Additionally, the nature of the workload means that any member of staff could cover duties relating to sourcing and generating community content, limiting the impact should a single person leave.
</span></p>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>27</sup>Lack of community engagement – the strength of this proposal depends on contributions from many different authors, generating new content that is currently unavailable. There is a risk that the community will not wish to contribute. We have limited this risk by offering to pay contributors at consultancy rates, an unusual reward within academic research. Funds will only be committed once content has been submitted, so if authors do not deliver, these funds will be reallocated. Should we still find it hard to solicit contributions, we will increase the rate per article.
</span></p>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>28</sup>Technology dependencies &#8211; content will be disseminated in the form of k-blogs, so there is a dependency on the k-blog platform, which is already suitably developed and packaged. However, the k-blog platform is a publishing framework only; it is not essential for the authoring of articles, which limits the scope of the risk. Content could be published independently of the k-blog platform with only a small loss of features, and could be relocated elsewhere at any time without losing its value. Given the archiving arrangements described under Sustainability (Section 2.11), archives of the original KiB content will always be available.
</span></p>
<h3>2.10 IPR position
</h3>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>29</sup>It is essential that content is released with as few restrictions as possible on re-use and re-purposing, but authors must be allowed to retain credit for the original work, or they are unlikely to contribute. Project members agree to release their work under a <strong>Creative Commons</strong> Attribution-NonCommercial-ShareAlike 3.0 Unported License (CC BY-NC-SA), which allows re-use and modification for non-commercial purposes, with attribution, provided that derivative works are shared under the same licence. This is in line with the <strong>JISC Model Licence</strong>. Authors invited to submit articles will be able to choose their own Creative Commons licence, but will be strongly encouraged to use as permissive a licence as possible; this choice accommodates differing institutional policies on published content. Public domain submissions will also be accepted, to accommodate US government employees; these submissions will not be commissioned.
</span></p>
<h3>2.11 Sustainability 
</h3>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>30</sup>To ensure that the online resources persist beyond the end of the project, documents produced by project staff and KiB contributors will be publicly available and clearly licensed. The k-blog site and its sub-domains are already archived by the <strong>UK Web Archive</strong>, in which JISC is an active partner. The Digital Curation Centre will be asked to advise on strategies for long-term database archiving.
</span></p>
<h3>2.12 Staff Recruitment
</h3>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>31</sup>All staff are already in post.
</span></p>
<h3>3 Impact
</h3>
<p><span style="font-family:Arial; font-size:10pt"><span style="color:black"><sup>32</sup>Our key beneficiaries are the community of researchers working to develop knowledge in biology; specifically, groups involved in data standards, linked data, knowledge in synthetic biology and statistical analysis of biological data. The needs of this community are clearly demonstrated by our Ontogenesis experiment, which currently receives 1000 page views per month for a small number of articles. Simple question-and-answer websites such as <a href="http://biostar.stackexchange.com/">http://biostar.stackexchange.com/</a> receive over 2,000 page views per week; however, there is a gap between this and more formal knowledge.
</span></span></p>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>33</sup>We will generate <strong>statistical</strong> information from the k-blog platform as a clear metric of impact; for freely available, reusable and web-delivered content, indicators such as page views are well recognised and are the main form of <strong>impact assessment</strong>. Both natively and through tools such as Google Analytics, the k-blog platform can provide comprehensive and detailed feedback on access to individual articles. We will also exploit secondary impact measures, including mentions on Twitter, tracked through suitable hashtags; comments and trackbacks to articles on KiB; and, finally, links to KiB as reported by web search. 
</span></p>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>34</sup>We will seek to increase impact through a number of activities in addition to normal academic channels. First, we will invite contributions from well-known members of the scientific blogging community, which should bring in a secondary readership. Second, we will invite contributions on relevant topics of<strong> recent public interest</strong>. Third, we will monitor article popularity; for areas that prove to be of interest or are <strong>controversial</strong>, we will seek to commission additional content. 
</span></p>
<h3>4 Partnership and dissemination
</h3>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>35</sup>Internal engagement with core project members, and engagement with the wider community of researchers crowd-sourced to supply content, will be via the mailing list once initial approaches have been made. The plans for content generation are outlined further in WP3 and WP4. Content generation will allow further interaction with more disparate groups (content consumers), who will be encouraged to engage through the k-blog process and the project mailing lists. The advisory committee will ensure that our engagement with the content-producing community is representative of that community. The nature of the k-blog process means that dissemination is intrinsic to content generation. 
</span></p>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>36</sup>Project members already work on the existing JISC-funded Knowledge Blog grant in the &#8220;Managing Research Data&#8221; programme. We will approach individuals funded by this and other programmes, requesting articles describing the value of these projects to biologists. We will, of course, also be pleased if JISC programme managers wish to contribute articles to this knowledge in biology resource.</span></p>
<h3>6 Previous experience of the Project Team
</h3>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>37</sup>Dr Phillip Lord is a Lecturer in Computing Science at Newcastle University. He has a PhD in yeast genetics from the University of Edinburgh, after which he moved into bioinformatics. He is well known for his work on ontologies in biology, as well as his contributions to eScience, beginning with his role as an RA on the myGrid project. Since his move to Newcastle, he has been an investigator on three more eScience projects (CARMEN, ONDEX and InstantSOAP), as well as maintaining an active engagement in standards development (OBI, MIGS, MIBBI) and publishing on the fundamentals of ontology design. He was an active participant in the Ontogenesis network, and is currently leading the JISC-funded Knowledge Blog project. He is an active blogger and developer.
</span></p>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>38</sup>Dr Robert Stevens is a Reader in Bioinformatics in the Bio and Health Informatics group at the University of Manchester. His main areas of research are the development and use of semantics within the life sciences, blended with the use of e-Science platforms to gather and manage the data and knowledge of the life sciences. He was PI on the Ontogenesis network that ran the meetings for the first Knowledge Blog. He is or has been a co-investigator on the myGrid and myExperiment grants, which will provide both content and technical input to this project. As well as the JISC-funded myExperiment project, Stevens was an investigator on the JISC-funded CO-ODE project that developed Protégé 4. Building on this, Stevens has led the OWL training activities at Manchester that have fed directly into the Ontogenesis Knowledge Blog. Stevens currently leads content development for the JISC Knowledge Blog grant.
</span></p>
<p><span style="color:black; font-family:Arial; font-size:10pt"><sup>39</sup>Dr Daniel Swan has a PhD in developmental biology and continued to work in developmental biology as a post-doctoral researcher before moving into bioinformatics in 2001. Subsequent positions included working for Bart&#8217;s and the London Genome Centre and the Centre for Ecology and Hydrology in informatics-driven roles dealing with large, distributed biological datasets generated by large user communities. Currently the manager of the Newcastle University Bioinformatics Support Unit, he leads a small team helping biological researchers to generate, capture, store and analyse their digital data. His interdisciplinary background gives him a grounding in both computer and biological sciences, and he is comfortable working on CS-focused projects (CARMEN, InstantSOAP, Bio-Linux). Most recently, he has been involved in the JISC Knowledge Blog grant, providing technical support and engagement with the microarray community.
</span></p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1882 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/03/1882/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Separating the Content from the Medium</title>
		<link>http://www.russet.org.uk/blog/2011/02/separating-the-content-from-the-medium/</link>
		<comments>http://www.russet.org.uk/blog/2011/02/separating-the-content-from-the-medium/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 17:21:55 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1872</guid>
		<description><![CDATA[I was entertained by a couple of articles recently, one from PLoS Blogs and one from Ed Yong, both bemoaning the low social status of bloggers, at least in some people&#8217;s minds. As the front page of the PLoS blog says: Blogging is just one of the outlets science journalists use. It&#8217;s about time we [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1872">
<p> <a name="preamble"></a> 
<p>I was entertained by a couple of articles recently, one from <a href="http://blogs.plos.org/speakeasyscience/2011/02/16/in-defense-of-science-blogs-yes-again/">PLoS Blogs</a> and one from <a href="http://edyong.posterous.com/i-think-you-have-all-you-need-for-a-blog">Ed Yong</a>, both bemoaning the low social status of bloggers, at least in some people&#8217;s minds. As the front page of the PLoS blog says:</p>
<blockquote><p>Blogging is just one of the outlets science journalists use. It&#8217;s about time we separate the person from the medium.</p>
<p align="right">
</blockquote>
<p>Of course, I agree with this. There is some excellent material floating around the blogosphere. But at the same time, there is a subtle irony in all of this. Both of these authors, I think, make a similar confusion about the medium. For instance,</p>
<blockquote><p>my point is that the world of science blogging is populated with some of the best journalists I know.</p>
<p align="right"> <em>PLoS Blogs</em><br /> &#8212; Deborah Blum </p>
</blockquote>
<p>At the moment, within science, blogging is still seen as an appropriate place for journalism about science or, in some cases, for scientists describing their personal experience of science. I don&#8217;t denigrate this in any way, but I think to some extent it misses the point. Science blogging should be about scientists. Many of us now use blogging as part of doing science itself; take <a href="http://themindwobbles.wordpress.com">Allyson Lister&#8217;s</a> excellent and extensive meeting or <a href="http://themindwobbles.wordpress.com/2010/04/29/henning-hermjakob-psicquic-and-envision/">seminar</a> notes. Or <a href="http://blog.fuzzierlogic.com/">Simon Cockell&#8217;s</a> <a href="http://blog.fuzzierlogic.com/archives/425">experience</a> sharing. Or my own move to simply blogging my <a href="http://www.russet.org.uk/blog/2010/07/realism-and-science/">papers</a> and <a href="http://www.russet.org.uk/blog/2010/08/a-new-grant-for-knowledgeblog/">grants</a>. And the occasional technological <a href="http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/">rant</a>.</p>
<p>The blog is not the point here, it is just the tool that we are using to advance our science. This is also the point of my <a href="http://knowledgeblog.org/">knowledgeblog</a> project; it is not about adding to the blogosphere, it is not about using WordPress for science. It is about better, faster, cheaper scientific communication.</p>
<p>Ironically, it would help to solve <a href="http://edyong.posterous.com/">Ed Yong&#8217;s</a> problem as well. In future, maybe he won&#8217;t have to ask for the paper, because it will be on the web, with all the data, for all the world to see.</p>
<p>After this diversion into journalism, this blog will now resume normal service, as a place to describe my science.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1872 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2011/02/separating-the-content-from-the-medium/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Realism is Wrong</title>
		<link>http://www.russet.org.uk/blog/2010/09/why-realism-is-wrong/</link>
		<comments>http://www.russet.org.uk/blog/2010/09/why-realism-is-wrong/#comments</comments>
		<pubDate>Tue, 07 Sep 2010 03:19:46 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1779</guid>
		<description><![CDATA[Following the publication of a number of papers, Gary Merrill, Michel Dumontier and Robert Hoehndorf (also as PDF) and myself (also on PLoS One), there has been an enormous amount of discussion on what is realism in ontology building, and whether it is appropriate for use in scientific ontology building. As I have documented previously, I [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1779">

<p>Following the publication of a number of papers, by <a href="http://iospress.metapress.com/content/j3324564p5l33863/?p=c9a7d6a6826845258807084d2693dbad&amp;pi=0">Gary Merrill</a>, by <a href="http://portal.acm.org/citation.cfm?id=1804755">Michel Dumontier and Robert Hoehndorf</a> (also as <a href="http://leechuck.de/pub/real.pdf">PDF</a>) and by <a href="http://www.russet.org.uk/blog/2010/07/realism-and-science/">myself</a> (also in <a href="http://dx.plos.org/10.1371/journal.pone.0012258">PLoS One</a>), there has been an enormous amount of discussion about what realism is in ontology building, and whether it is appropriate for use in scientific ontology building. As I have documented <a href="http://www.russet.org.uk/blog/2010/03/leaving-bfo-discuss/">previously</a>, I have now left the BFO-discuss mailing list and, more recently, OBO-discuss, as I felt that these discussions had reached a finishing point. In this post, I want to spell out clearly my reasons for thinking that realism is not appropriate. I want to avoid reiterating the positions in my paper and earlier postings, and also to provide a direct answer to <a href="http://ontogeek.wordpress.com/2010/04/24/realism-really/">David Sutherland</a>, who has posted about why he is a realist.</p>
<hr /> 
<h2><a name="_what_is_realism"></a>What is realism?</h2>
<p><a name="philosophy"></a>Sadly, I need to start with a philosophical digression. At heart, I am not interested in philosophy, nor, I guess, are many in the bio-ontologies community. Those in this camp can safely skip to the <a href="#pragmatic">next section</a>.</p>
<p>Fundamentally, realism is a metaphysical interpretation of the ontology. How are we to interpret the relationship between, for example, the ontology term <tt>Human</tt> and the things that exist in the real world? Realism asserts that the ontology term refers to a Universal, which exists in its own right, but not separately from the instances to which it refers.</p>
<p>Personally, I do not find these assertions of reality or truth very helpful. David Sutherland <a href="http://ontogeek.wordpress.com/2010/04/24/realism-really/">suggests</a> that:</p>
<blockquote><p>One possible reason is a failure of nerve. Many people become quite nervous at talk of truth and reality.</p>
<p align="right"> &#8212; David Sutherland </p>
</blockquote>
<p>In my case, this is true, and it stems from my history. Like many people learning science, when I first heard of Mendel&#8217;s laws, or the exceptionally weird behaviour of light, my initial response was that they were not real, just part of the mathematical model that describes the experimental results. Later on, though, I realised that I had the same worries about other concepts. When I was first told that a table holding a weight was exerting a force on the weight, I didn&#8217;t believe it; after all, when I support a weight it costs me effort to do so, but the table was just sitting there. <a name="youth"></a> Many years before this, I didn&#8217;t believe the idea that I was surrounded by invisible things called gases, although I did realise that it was a good way of explaining the wind. Eventually, however, I became so used to manipulating forces and genes in the mathematical equations of physics and genetics that I just stopped worrying about it.</p>
<p>In his <a href="http://iospress.metapress.com/content/j3324564p5l33863/?p=c9a7d6a6826845258807084d2693dbad&amp;pi=0">paper</a>, Gary Merrill argues that, in practice, we don&#8217;t need a metaphysical interpretation anyway. I tend to agree. Consider this quote:</p>
<blockquote><p>The next question was &#8211; what makes planets go around the sun? At the time of Kepler some people answered this problem by saying that there were angels behind them beating their wings and pushing the planets around an orbit. As you will see, the answer is not very far from the truth. The only difference is that the angels sit in a different direction and their wings push inward.</p>
<p align="right"> <em>Character of Physical Law</em><br /> &#8212; Richard Feynman </p>
</blockquote>
<p>Personally, I like to speak of models of data, rather than representations of reality. I find that talking of models reminds me that it is my job not to support models but to break them. I do not see the point of rebadging commonly used terms such as model with more complex ones such as &#8220;representation of reality&#8221; (this rebadging is a theme of realism to which I will return). But the bottom line is that it doesn&#8217;t really matter. The statement that \(g \propto 1/r^2\) is the same as \(F_{\text{wings of angels}} \propto 1/r^2\). So long as we agree that the angels behave in a precise, predictable way, there is no deep reason to distinguish between the two, except for simple pragmatism: &#8220;gravity&#8221; is shorter and easier to say than &#8220;the wings of angels&#8221;.</p>
<hr /> 
<h2><a name="_what_realism_is_not"></a>What realism is not</h2>
<p>Realism has chosen its name wisely. Most scientists believe in reality so, when faced with realism vs conceptualism, their gut feeling is that the former will be right. They believe in a mind-independent reality; therefore, conceptualism must be wrong. Others have argued convincingly that this is an inaccurate interpretation of conceptualism, so I will not repeat the discussion here, but will instead look at a more specific interpretation: that realism means building ontologies on the basis of experimental evidence. This conflation of &#8220;evidence-based ontologies&#8221; with realism can be seen in a post by <a href="http://ontogeek.wordpress.com/2010/04/24/realism-really/">David Sutherland</a>:</p>
<blockquote><p>The results of those inferences will be judged by how they match reality. An inference that is demonstrably false indicates a problem with the initial assertions (or with the inference mechanism).</p>
<p align="right"> &#8212; David Sutherland </p>
</blockquote>
<p>Similarly, <a href="http://sourceforge.net/mailarchive/message.php?msg_name=C89D4D14.114E0%25judith.blake%40jax.org">Judy Blake</a> makes the same conflation:</p>
<blockquote><p>I strongly support the realist approach that facilitates the use of the ontology for science discovery. We represent in the ontologies what we know with some degree of certainty.</p>
<p align="right"> &#8212; Judy Blake </p>
</blockquote>
<p>Of course, both of these positions are reasonable&#8201;&#8212;&#8201;we should judge ontologies by how well their inferences fit our experimental data and, further, for reference ontologies, we should represent knowledge for which we have very good evidence. But this is not realism. This can be shown with a straightforward argument.</p>
<p>While the definition of &#8220;science&#8221; is open to question, a reasonable working definition would be that &#8220;Science is the interpretation of experimental data&#8221;. The idea that anyone who is not a realist believes that we should not base ontologies on our experimental data, or on what we know, is therefore either uncharitable or wrong. It also, however, undermines the notion that realism is a useful methodology. If science is about modelling experimental data, while realism is a methodology for building ontologies based on experimental data, then &#8220;realism-based scientific ontology&#8221; is tautological; &#8220;realism-based&#8221; adds nothing at all to the statement, except to make it longer. In short, returning to the earlier theme, we have rebadged &#8220;scientific ontology&#8221; as &#8220;realism-based ontology&#8221;.</p>
<p>Believing in reality does not make you a realist. Believing that ontologies should be based on evidence does not make you a realist; it just means you are a scientist.</p>
<hr /> 
<h2><a name="pragmatic"></a>What are the pragmatic implications of realism?</h2>
<p>One of the difficulties in addressing the pragmatic implications of realism is that many of the conclusions drawn do not seem to stem from the underlying philosophy. This makes it hard to judge what the implications of realism are in a given situation. In the end, the only option is to look at how realism has been practised in, er, reality, ignoring the philosophical underpinning. I&#8217;ve taken this approach here.</p>
<p>The first time that I heard of realism was at ISMB 2004 in Glasgow. One theme that came out here was the notion that all ontologies should be single inheritance, because in reality things can only be a kind of one other thing. BFO follows this, and this position was supported in many discussions on BFO-discuss. Ironically, though, with terms such as &#8220;Object Part&#8221;, any ontology that uses BFO is hard pressed to do likewise. I was a little surprised to be asked if I understood the strategy of asserting single inheritance and inferring the rest; surprised because a) this strategy is the normalisation pattern from Alan Rector, with whom I have worked for many years, and b) it represents a complete change of position. Normalisation results in a polyhierarchy&#8201;&#8212;&#8201;that some subsumption is inferred and some asserted is an engineering decision, not a question of underlying philosophy.</p>
<p>That realism is, apparently, capable of supporting such a shift is rather worrying. This cannot be put down to falsifiability&#8201;&#8212;&#8201;the notion that ontologies can be wrong and can change&#8201;&#8212;&#8201;as this is a change at the metaphysical level. It suggests that, in practice, realism is disconnected from its philosophical underpinning. It also suggests that realism is capable of justifying two quite different positions&#8201;&#8212;&#8201;in short, it suggests that realism actually has very little explanatory power. Currently, the realist answer to this is that the asserted relationships represent universals; but as there is no clear assay for what this means, I feel this doesn&#8217;t help. My own experience is that determining a privileged axis of inheritance is not, in most cases, possible. Ontologies are fundamentally multiply-inherited; normalisation is a simple engineering decision, which shifts the load of maintaining this from the human to the reasoner.</p>
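<p>To make the normalisation pattern concrete, here is a toy sketch in Python. Everything in it is hypothetical: the class names (<tt>Protein</tt>, <tt>Kinase</tt>, <tt>Enzyme</tt>), the <tt>has_function</tt> property, and the naive matching rule, which compares restrictions as exact (property, filler) pairs. A real ontology would be written in OWL and classified by a description logic reasoner; this only illustrates the division of labour, in which the human asserts a single-inheritance tree plus a few definitions, and the extra parents are computed.</p>

```python
"""Toy illustration of Rector-style normalisation (hypothetical example).

Primitive classes form a strict single-inheritance tree; defined classes are
a genus plus necessary-and-sufficient restrictions. A deliberately naive
structural check infers the extra parents. This is NOT an OWL reasoner:
restrictions are matched as exact (property, filler) pairs only.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Restriction = Tuple[str, str]  # (property, filler)


@dataclass
class Primitive:
    name: str
    parent: Optional[str]  # exactly one asserted parent; None for the root
    restrictions: Set[Restriction] = field(default_factory=set)


@dataclass
class Defined:
    name: str
    genus: str  # the primitive class that the definition refines
    restrictions: Set[Restriction]


def ancestors(name: str, prims: Dict[str, Primitive]) -> List[str]:
    """The class itself followed by its asserted ancestors, up to the root."""
    chain = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = prims[current].parent
    return chain


def inherited_restrictions(name: str, prims: Dict[str, Primitive]) -> Set[Restriction]:
    """Restrictions asserted on the class or on any of its ancestors."""
    found: Set[Restriction] = set()
    for anc in ancestors(name, prims):
        found |= prims[anc].restrictions
    return found


def classify(prims: Dict[str, Primitive], defs: List[Defined]) -> Dict[str, List[str]]:
    """Asserted parent plus every defined class whose definition is satisfied."""
    result = {}
    for name, prim in prims.items():
        lineage = ancestors(name, prims)
        restrictions = inherited_restrictions(name, prims)
        asserted = [prim.parent] if prim.parent is not None else []
        inferred = [d.name for d in defs
                    if d.genus in lineage and d.restrictions <= restrictions]
        result[name] = asserted + inferred
    return result


prims = {
    "Protein": Primitive("Protein", None),
    "Kinase": Primitive("Kinase", "Protein", {("has_function", "catalysis")}),
    "Transporter": Primitive("Transporter", "Protein", {("has_function", "transport")}),
}
defs = [Defined("Enzyme", "Protein", {("has_function", "catalysis")})]

for cls, parents in classify(prims, defs).items():
    print(cls, "->", parents)
```

<p>Only one parent is ever asserted per class, yet <tt>Kinase</tt> ends up under both <tt>Protein</tt> and <tt>Enzyme</tt>; nobody has to maintain the second link by hand.</p>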
<p>Another long-held tenet of realism is the assertion that the use of <tt>not</tt> represents bad ontology. Statements such as <tt>Fly all has_part not Wing</tt> are asserting a relationship with entities that don&#8217;t exist. However, many people find this sort of modelling useful. It was, therefore, a surprise to find that, following a lot of careful thought, realism does allow <tt>Fly lacks Wing</tt>. But winding <tt>not</tt> into the relationship in this way has a number of problems. First, it requires an alteration at the logical level of the ontology; the relationship has to be between the instance and the universal purely to satisfy realism, because the universal really exists. In doing so, a special-case instance-universal relationship is required. Second, the semantics of this relationship are now hidden, rather than being explicit in the ontological layer; the reader has to understand that some relationships are effectively positive, and some are negative. It&#8217;s unclear why it is necessary to jump over these hurdles, when it would have been far simpler to just use a <tt>not</tt> construction.</p>
<p>So, does realism produce good ontology? I have already spoken about this in my <a href="http://www.russet.org.uk/blog/2010/07/realism-and-science/">paper</a>, but it seems fairly clear that it does not. Rather like myself as a <a href="#youth">youth</a>, BFO is mass-centric: waves, energy, force, entropy all have no place. It makes unnecessary and meaningless distinctions&#8201;&#8212;&#8201;site and spatial region (one is a region of space with respect to an observer, one is an absolute region of space). It also encompasses some outright howlers, including the fact that a spatial region cannot have a length.</p>
<p>Does realism encourage good practice? Again, I think in many cases it does not. Firstly, it elevates &#8220;reality&#8221; above all else; so any distinction that can be made should be made, because that is reality, right? Taken to the extreme, this results in overly complex ontologies, suffering from analysis paralysis. Just because we can make a distinction does not mean that we should, unless there is a good use case, and a clear reason why this distinction adds usefulness to the ontology. It also results in the use of overly complex, philosophical language, which is hard for those outside a small clique to understand; I do, now, understand the definitions in BFO, but in many ways I wish that I didn&#8217;t. As a trivial example, consider the modification of the standard definition form from &#8220;A is a B that has R&#8221; to &#8220;A =def B that has R&#8221;, made to enable the distinction between defined and primitive classes. This reduces the readability of definitions. Readability is important; we should be willing to compromise precision in its favour.</p>
<p>Likewise, I worry when I see definitions such as</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.3 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt><b><font color="#0000FF">Class</font></b>: planned_process

<font color="#990000">SubClassOf:</font>
       realizes some (is_concretization_of some ('plan specification'
            <b><font color="#000080">and</font></b> has_part some 'objective specification'))</tt></pre>
</td>
</tr>
</table>
<p>or even</p>
<table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10">
<tr>
<td><!-- Generator: GNU source-highlight 3.1.3 by Lorenzo Bettini http://www.lorenzobettini.it http://www.gnu.org/software/src-highlite -->
<pre><tt><b><font color="#0000FF">Class</font></b>: glucose_tolerance_test

<font color="#990000">SubClassOf:</font>
         assay,
         has_specified_output some ('information content entity'
                              <b><font color="#000080">and</font></b> is_proxy_for some 'insulin resistance'),
         realizes some (is_concretization_of some 'independent variable specification'),
         realizes some (is_concretization_of some 'time series design'),
         achieves_planned_objective some 'biological feature identification objective',
         achieves_planned_objective some 'assay objective',
         has_part some ('data transformation'
                        <b><font color="#000080">and</font></b> has_specified_input some 'measurement datum'
                        <b><font color="#000080">and</font></b> has_specified_output some graph),
         has_part some ('administering substance in vivo'
                        <b><font color="#000080">and</font></b> has_specified_input some glucose)</tt></pre>
</td>
</tr>
</table>
<p>The distinctions being made here, and the properties <tt>realizes</tt> and <tt>is_concretization_of</tt>, stem from realism, and more specifically from the <tt>generically dependent continuant</tt> (GDC). With its mass-centric bias, BFO 1.0 couldn&#8217;t represent many entities, such as information, a book or this blog post. So GDC was added. But a dependent continuant is a thing that exists dependent on another, that comes into existence with the other, and disappears again with it. GDC shares none of these characteristics. A book does not appear when it is first printed, nor does it disappear when the paper breaks down, or the ink fades. But it was not possible to add something like an immaterial continuant, because it had to depend on some mass. The convoluted nature of the ontology here exists to satisfy the requirements of realism, not those of the ontologists, developers or users.</p>
<hr /> 
<h2><a name="_the_alternative"></a>The alternative</h2>
<p>So, what alternatives are there? I offer no alternative metaphysics because, as described <a href="#philosophy">earlier</a>, I neither care about it, nor feel that a metaphysical interpretation is necessary. We are building ontologies in biomedicine for many reasons&#8201;&#8212;&#8201;but mostly they revolve around one thing&#8201;&#8212;&#8201;we need a structure to hold our knowledge, our theories and hypotheses, which is computationally amenable because there are too many to handle by hand. It&#8217;s an engineering task, and this is what I care about.</p>
<p>Ontology building, I would argue, is a hybrid, sitting somewhere between software engineering and statistical modelling. We need to borrow from the best of these worlds, to produce a good engineering methodology.</p>
<p>Actually, we already have borrowed from software engineering; OBO, for example, advises mailing-lists, trackers, version control, releasing early and often, tight user feedback. All of these stem directly from the agile techniques that have come to the fore in the last decade; all of these have been part of ontology building since well before realism appeared on the scene.</p>
<p>I think we need to take more account of use cases, or their lightweight manifestation, &#8220;user stories&#8221;. Realism, and the philosophical reflection that it inspires, seems to me to have more in common with the waterfall methodologies of an earlier era; thinking carefully early on to avoid having to fix things later sounds like a good idea, but history suggests that in many cases the thinking simply delays the point at which you discover you have to fix things anyway.</p>
<p>But agile software methodologies do not have all the answers; ontologies are not software. The key difference is that ontologies lack test frameworks. While it is sometimes possible to automatically <a href="http://dx.doi.org/10.1093/bioinformatics/btl208">test</a> our ontology against the experimental data, in most cases it is not. I think this is where we need to borrow more from statistics. For instance, one rule from statistical modelling is: do not add a new variable to a model, even if it increases the goodness of fit to the data, unless the increase is statistically significant. In ontological terms, this can be translated as: just because you can make a distinction does not mean you should.</p>
<p>In his 2005 <a href="http://dx.doi.org/10.1016/j.jbi.2005.08.005">paper</a>, Ingvar Johansson talks about the fallacy of mixing use and mention. As an example, he presents this (now changed) section of GO:</p>
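<p>As an illustration of the statistical rule, the sketch below compares an intercept-only linear model against one with a single extra variable, and keeps the variable only if a likelihood-ratio test says the improvement in fit is significant. The choice of test (Gaussian errors, chi-squared with one degree of freedom), the 0.05 threshold and the data are all assumptions made for the example; an F-test or an information criterion would serve the same purpose.</p>

```python
"""Keep a new variable only if it significantly improves the fit (sketch).

Compares an intercept-only model with a simple linear regression using a
likelihood-ratio test under Gaussian errors. The data below are invented.
"""
import math
from typing import Sequence, Tuple


def rss_intercept_only(y: Sequence[float]) -> float:
    """Residual sum of squares when predicting every y by the mean."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def rss_with_variable(x: Sequence[float], y: Sequence[float]) -> float:
    """Residual sum of squares of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def likelihood_ratio_test(x, y) -> Tuple[float, float]:
    """Return (statistic, p-value) for adding x to the intercept-only model."""
    n = len(y)
    ratio = rss_intercept_only(y) / rss_with_variable(x, y)
    statistic = max(0.0, n * math.log(ratio))  # guard against rounding below 0
    # One extra parameter, so compare with chi-squared on 1 degree of freedom:
    # P(chi2_1 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def keep_variable(x, y, alpha: float = 0.05) -> bool:
    """A better fit is not enough: the improvement must be significant."""
    _, p_value = likelihood_ratio_test(x, y)
    return p_value < alpha


x = list(range(1, 11))
y_related = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2]
y_unrelated = [5.0, 4.8, 5.3, 4.9, 5.1, 5.2, 4.7, 5.0, 5.1, 4.9]

print("related:", keep_variable(x, y_related))
print("unrelated:", keep_variable(x, y_unrelated))
```

<p>The extra variable lowers the residual error slightly even for the unrelated series, but only the related series clears the significance bar. In ontological terms, the variable plays the role of a new distinction: it earns its place only if it explains something the simpler model does not.</p>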
<table border="0" bgcolor="#e8e8e8" width="100%" style="margin:0.2em 0;"> 
<tr>
<td style="padding:0.5em;">
<pre style="margin:0; padding:0;">Gene_Ontology
  part_of
Biological process
  is_a
physiological process</pre>
</td>
</tr>
</table>
<p>The problem with this is that &#8220;biological process&#8221; is overloaded, referring both to a biological process and to the ontology term biological process. The link was originally put in place for engineering reasons; I used it, for instance, in the work for my own <a href="http://bioinformatics.oupjournals.org/content/19/10/1275.abstract?ijkey=35y5NKFGceWLQ&amp;keytype=ref">paper</a> from 2002. I knew that semantic similarity (how similarly two genes are annotated) correlates with sequence similarity; the question was whether this works better if we consider all of GO, or the three aspects independently. The answer is the latter; in short, <tt>Biological Process part_of Gene Ontology</tt> has no explanatory power. So, is this an example of realism demonstrating an ontological problem? Sadly not. Consider this slightly changed ontology:</p>
<table border="0" bgcolor="#e8e8e8" width="100%" style="margin:0.2em 0;"> 
<tr>
<td style="padding:0.5em;">
<pre style="margin:0; padding:0;">Universe
  part_of
Biological process
  is_a
physiological process</pre>
</td>
</tr>
</table>
<p>According to realism, this simple rebadging of the top-level term has fixed the problem, because all biological processes really are part of the universe. But computationally, we have the same ontology, so we still have a term with no explanatory power. In short, the uses and the use cases of our ontology define the best ontology; the experimental data is only a start.</p>
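<p>The point about explanatory power can be checked with a small sketch. It uses Resnik&#8217;s information-content measure as a representative semantic similarity (an assumption; it is not a claim about the exact measures used in the paper), treats <tt>is_a</tt> and <tt>part_of</tt> edges alike when collecting ancestors, and uses made-up terms and gene annotations. A root that subsumes every annotated term has probability 1 and therefore information content 0, so it can never raise a similarity score; renaming it from <tt>Gene_Ontology</tt> to <tt>Universe</tt>, or dropping it altogether, leaves every score unchanged.</p>

```python
"""Resnik-style semantic similarity on a toy ontology (illustrative sketch).

The terms, edges and gene annotations are hypothetical; is_a and part_of
edges are both followed when collecting ancestors.
"""
import math


def ancestors(term, parents):
    """Return the term together with everything reachable via parent edges."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities(parents, annotations):
    """p(t): fraction of all annotations made to t or to any descendant of t."""
    counts = {}
    total = 0
    for terms in annotations.values():
        for term in terms:
            total += 1
            for anc in ancestors(term, parents):
                counts[anc] = counts.get(anc, 0) + 1
    return {term: count / total for term, count in counts.items()}


def resnik(gene_a, gene_b, parents, annotations):
    """Maximum information content, -log p(t), over shared ancestors."""
    probs = term_probabilities(parents, annotations)
    best = 0.0
    for ta in annotations[gene_a]:
        for tb in annotations[gene_b]:
            for shared in ancestors(ta, parents) & ancestors(tb, parents):
                best = max(best, -math.log(probs[shared]))
    return best


def rename(parents, old, new):
    """Relabel one term wherever it appears as a parent."""
    return {child: [new if p == old else p for p in ps]
            for child, ps in parents.items()}


go_rooted = {
    "biological_process": ["Gene_Ontology"],  # the part_of link at issue
    "molecular_function": ["Gene_Ontology"],
    "physiological_process": ["biological_process"],
    "metabolism": ["physiological_process"],
    "catalytic_activity": ["molecular_function"],
}
universe_rooted = rename(go_rooted, "Gene_Ontology", "Universe")
unrooted = {t: [p for p in ps if p != "Gene_Ontology"] for t, ps in go_rooted.items()}

annotations = {
    "geneA": {"metabolism"},
    "geneB": {"physiological_process"},
    "geneC": {"catalytic_activity"},
}

for label, onto in [("Gene_Ontology root", go_rooted),
                    ("Universe root", universe_rooted),
                    ("no root", unrooted)]:
    print(label,
          round(resnik("geneA", "geneB", onto, annotations), 3),
          round(resnik("geneA", "geneC", onto, annotations), 3))
```

<p>All three lines report the same pair of scores: the within-aspect pair shares an informative ancestor, while the cross-aspect pair shares only the root, which contributes nothing whatever it is called.</p>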
<hr /> 
<h2><a name="_conclusion"></a>Conclusion</h2>
<p>I tend to agree with <a href="http://sourceforge.net/mailarchive/message.php?msg_name=4C77B0D1.2070506%40ebi.ac.uk">Nicolas Le Novere</a> that this:</p>
<blockquote><p>is an endless discussion because this is specifically the fundamental divergence between two schools of thoughts, both respectable, and both consistent, but irreconcilable.</p>
<p align="right"> &#8212; Nicolas Le Novere </p>
</blockquote>
<p>I have written this post as an answer to <a href="http://ontogeek.wordpress.com/2010/04/24/realism-really/">David Sutherland</a>, as a supplement to my <a href="http://dx.plos.org/10.1371/journal.pone.0012258">paper</a> and, most importantly, as a way to remove myself from the discussion. I think, now, that with three papers on the issue, I can move on with what I want to do: use ontologies to help with the analysis of our data, and to increase our understanding of biology.</p>
<p>I do not expect that the significant momentum that realism has built up will be broken, but I do hope that it will cease to be advanced as proven best practice, or considered the only correct way forward. If this is achieved, then it will help to avoid the unfortunate situation that <a href="http://sourceforge.net/mailarchive/message.php?msg_name=20100827133117.27DE45B003B%40mweb1.acsu.buffalo.edu">some</a> actually want: a fork in the community. I think that this would be a pity; in general, I tend to prefer OBO&#8217;s stated <a href="http://www.obofoundry.org/crit.shtml">principle</a> that &#8220;we would strive for community acceptance [&#8230;] rather than encouraging rivalry&#8221;.</p>
<p>There is so much agreement between the various sides of this argument, and it is on these points of agreement that we should build: the practical, pragmatic engineering decisions that we see in much of OBO and GO, and in the original ten <a href="http://www.obofoundry.org/crit.shtml">principles</a> of OBO.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1779 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2010/09/why-realism-is-wrong/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Latex to WordPress</title>
		<link>http://www.russet.org.uk/blog/2010/08/latex-to-wordpress/</link>
		<comments>http://www.russet.org.uk/blog/2010/08/latex-to-wordpress/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 15:34:34 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1740</guid>
		<description><![CDATA[LaTeX to WordPress Phillip Lord This post describes the process of posting to WordPress from a LaTeX source file, using tools generated as part of the Knowledgeblog project. 1 Introduction About a month ago, we managed to get funding from JISC for knowledgeblog; the idea is to turn a blog platform from something for light [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1740">
<div class="titlepage"> 
<h1>LaTeX to WordPress</h1>
<p>Phillip Lord</p>
</div>
<div class="abstract"> This post describes the process of posting to WordPress from a LaTeX source file, using tools generated as part of the Knowledgeblog project. </div>
<h1 id="sec:introduction">1 Introduction</h1>
<p>About a month ago, we managed to get funding from <a href="http://www.jisc.ac.uk">JISC</a> for <a href="http://www.knowledgeblog.org">knowledgeblog</a>; the idea is to turn a blog platform from something for light commentary into a framework for serious scientific publication. One of the key requirements for this is to fit in with people’s existing working practices, and for this we need a good document creation environment. This means Word and LaTeX. I’ve been working mostly on the latter, and this post is the first outcome. It’s generated entirely automatically from LaTeX. This is an advance on my paper on <a href="http://www.russet.org.uk/blog/2010/07/realism-and-science/">realism</a>, which was semi-automatically converted, with some hand editing of the HTML. </p>
<p>At the moment, the tool-chain is a little bit clunky, but it will improve! This is not meant to be an announcement that all is ready; it is just an early alpha release and a proof of principle. </p>
<h1 id="a0000000002">2 Implementation</h1>
<p>The tool-chain uses three pieces of software: </p>
<dl class="description">  
<dt>latextowordpress: </dt>
<dd>
<p>This package, which I have written, uses <a href="http://plastex.sourceforge.net">plasTeX</a> to parse the LaTeX and render it into HTML. Most of the work is performed by plasTeX out-of-the-box, although with a non-default configuration. Math mode, however, is treated separately, rather than using plasTeX’s default image-rendering approach. </p>
</dd>
<dt>blogpost:</dt>
<dd>
<p><a href="http://www.methods.co.nz/asciidoc/#_blogpost_weblog_client">blogpost</a> is used to actually post the generated HTML onto the web. The HTML can also be cut-and-pasted directly into WordPress, but blogpost is easier for me, as it’s the tool I usually use anyway (normally with asciidoc source). Blogpost is unmodified. </p>
</dd>
<dt>mathjax-latex: </dt>
<dd>
<p>This is a WordPress plugin, which I have written, that uses <a href="http://www.mathjax.org">MathJax</a> to render math mode from the original LaTeX in the browser. The plugin just injects the MathJax JavaScript headers into a post on demand (i.e. only into posts with math mode in them). </p>
</dd>
</dl>
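<p>The on-demand injection in mathjax-latex boils down to one question: does the post contain any delimited math? The plugin itself is PHP; the following is only a hypothetical Python sketch of that check, assuming the backslash-parenthesis (inline) and backslash-bracket (display) delimiters that MathJax recognises by default for TeX input.</p>

```python
import re

BACKSLASH = chr(92)  # written this way so the listing is not itself typeset by MathJax

# Inline and display delimiters; assumed to be MathJax's TeX defaults.
DELIMITERS = [(BACKSLASH + "(", BACKSLASH + ")"),
              (BACKSLASH + "[", BACKSLASH + "]")]

# Non-greedy match from an opening delimiter to the nearest matching close.
MATH = re.compile(
    "|".join(re.escape(o) + ".+?" + re.escape(c) for o, c in DELIMITERS),
    re.DOTALL,
)

def needs_mathjax(post_html):
    """True if the post contains at least one delimited math expression."""
    return MATH.search(post_html) is not None
```

<p>Only posts for which this returns true would get the MathJax script added, so pages without equations pay no loading cost.</p>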
<p>Currently, this is all held together with some dodgy makefiles; this will be improved in time. </p>
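<p>The posting step goes over WordPress’s XML-RPC interface, which blogpost, like most desktop blogging clients, talks to. As a rough illustration, assuming the <tt>metaWeblog.newPost</tt> method that WordPress supports, the request can be built with Python’s standard <tt>xmlrpc.client</tt>. This is a sketch rather than blogpost’s own code, and the endpoint and credentials are placeholders.</p>

```python
import xmlrpc.client

def new_post_request(html, title, username, password, blog_id=1, publish=True):
    """Serialise a metaWeblog.newPost call without sending it.

    Signature: metaWeblog.newPost(blogid, username, password, struct, publish),
    where struct carries the title and the post body ("description").
    """
    struct = {"title": title, "description": html}
    return xmlrpc.client.dumps(
        (blog_id, username, password, struct, publish),
        methodname="metaWeblog.newPost",
    )

# To actually post (placeholder endpoint and credentials):
# server = xmlrpc.client.ServerProxy("https://blog.example.org/xmlrpc.php")
# post_id = server.metaWeblog.newPost(1, "user", "secret",
#                                     {"title": "...", "description": "..."}, True)
```

<p>Building the payload separately from sending it makes it easy to inspect exactly what the generated HTML looks like before it goes live.</p>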
<p>The first and last of these tools are available from <a href="http://services.knowledgeblog.org/download/">knowledgeblog</a>. I’ve tested them on Ubuntu 10.04 and they are in alpha. Comments are welcome at <a href="mailto:knowledgeblog-discuss@knowledgeblog.org">knowledgeblog-discuss</a>. </p>
<h1 id="a0000000003">3 Key Features</h1>
<p>At the moment, I haven’t fully explored which features of LaTeX are well supported. However, all the structural elements (sections, lists), bibliographies and links via the <a href="http://www.tug.org/applications/hyperref/manual.html">hyperref</a> package seem to work well. </p>
<p>Math-mode rendering works well. I’ve been using one famous equation, \(E=mc^2\), as my main test, but more complex examples also work. This one is from <a href="http://www.mathjax.org">MathJax</a>: \(J_\alpha (x) = \sum _{m=0}^\infty \frac{(-1)^ m}{m! \,  \Gamma (m + \alpha + 1)}{\left({\frac{x}{2}}\right)}^{2 m + \alpha }\). </p>
<p>I’ve also made a few tweaks for common idioms. For example, the less-than symbol is written in math mode in LaTeX but rendered directly in HTML: &lt;. </p>
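<p>A tweak of this kind can be sketched as a small post-processing step. The following Python is hypothetical, not latextowordpress’s actual code: it replaces a math-mode run containing nothing but a single relation symbol with the equivalent HTML entity, and leaves real equations for MathJax.</p>

```python
import html
import re

BACKSLASH = chr(92)  # avoids writing the math delimiters literally in this listing

# A math-mode run holding only one relation symbol, such as the less-than sign.
BARE_SYMBOL = re.compile(
    re.escape(BACKSLASH + "(") + r"\s*([<>=])\s*" + re.escape(BACKSLASH + ")")
)

def inline_bare_symbols(text):
    """Render lone relation symbols as HTML entities instead of MathJax."""
    return BARE_SYMBOL.sub(lambda m: html.escape(m.group(1)), text)
```

<p>Anything with more content between the delimiters, such as a full equation, does not match and passes through unchanged.</p>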
<h1 id="sec:future">4 Future Work</h1>
<p>There are many things left to do yet. The process needs to be made smoother, with a single tool to hook the current tool-chain together; it would also be good to attach a PDF generated from the LaTeX. Currently, titles are set independently (which is why this post appears to have two titles). The MathJax plugin needs configuration options (it overwrites wp-latex functionality at the moment). And there is significant testing to do to see which advanced features (figures, critically!) work and which don’t. Still, it’s good to see that most of the tools that I needed to get this working already existed. With luck, most of the other tools we need will be as good. </p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1740 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2010/08/latex-to-wordpress/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The problem with institutional repositories</title>
		<link>http://www.russet.org.uk/blog/2010/08/the-problem-with-institutional-repositories/</link>
		<comments>http://www.russet.org.uk/blog/2010/08/the-problem-with-institutional-repositories/#comments</comments>
		<pubDate>Mon, 09 Aug 2010 11:09:50 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1737</guid>
		<description><![CDATA[I don&#8217;t normally use my blog to engage in conversations the way that some people do. I already spend enough time on mailing lists, so using the blog seems redundant for this. However, I will change the habit of a life-time this once, because of an interesting discussion on institutional repositories, which I have previously [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1737">
<p>I don&#8217;t normally use my blog to engage in conversations the way that some people do. I already spend enough time on mailing lists, so using the blog seems redundant for this. However, I will change the habit of a lifetime this once, because of an interesting discussion on <a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2527">institutional repositories</a>, which I have <a href="http://www.russet.org.uk/blog/2007/06/institutional-and-subject-archives/">previously</a> written about myself.</p>
<p>To me, the difficulty with institutional repositories is this. First, they are a resource. Then, someone says, this is good, everyone should do this. Then, someone else says, hey, this is great, we could use this for our RAE (REF, whatever) return.</p>
<p>Now, you have to deposit things in your IR. But people object, on various &#8220;data is mine&#8221; grounds, so perhaps they make the IR non-public. The data model gets tweaked with various additional data (which school, who your line manager is) necessary for RAE. At the same time, your co-authors also have to deposit into their IR. And, if you move, you have to type your entire back catalogue into various repositories for your new institution.</p>
<p>Currently, I am supposed to deposit papers in various IRs, including at university and school level, as well as adding bibliographic information to various databases. And then, of course, there are project wikis. And the funders want the information in various databases. All of which is very time-consuming, and produces highly duplicated and often error-prone data. In short, it&#8217;s a bad thing.</p>
<p>The irony is that, if you google for any of my papers, the main source from which they are scraped is my website. I set this up myself many years ago now; it&#8217;s a simple bibtex to HTML thing (actually not so simple nowadays; it grew over time). So, the simplest and most straightforward solution also turns out to be the best. The most important thing is this: the bibtex files are the ones that I use for my own work, for citing myself (which, like any good scientist, I do as often as possible, even when the citation is largely <a href="http://www.russet.org.uk/blog/">irrelevant</a>). The website is what I use, when on the road, to get the PDF of my own papers; if I want to give a reference to someone, I&#8217;ll email a link to my website. So, I keep it up to date, because it&#8217;s to my benefit to do so.</p>
<p>We need a few simple, easy-to-use standards for bibliographic data. They have to be simple, because they need to fit in with people&#8217;s current work practices; this means they need to be supported in a heterogeneous environment, by many different tools. And they won&#8217;t be, if the standards are hard to develop against.</p>
<p>For data, of course, the issues are somewhat different, mostly because data needs more structure than human-readable information, and because the data is often large. However, two issues remain: first, we still need to fit with people&#8217;s working practices; second, with data, engaging in the institutional football we see with bibliographic data will still be a bad thing.</p>
<p>Again, simple data standards are what we need. After that, people will choose whatever they choose; the data standard will be enough to bring it all together in the best way that we can.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1737 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2010/08/the-problem-with-institutional-repositories/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A new grant for Knowledgeblog</title>
		<link>http://www.russet.org.uk/blog/2010/08/a-new-grant-for-knowledgeblog/</link>
		<comments>http://www.russet.org.uk/blog/2010/08/a-new-grant-for-knowledgeblog/#comments</comments>
		<pubDate>Mon, 02 Aug 2010 14:06:36 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Grants]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1729</guid>
		<description><![CDATA[  I&#8217;m very pleased that our grant for knowledgeblog has been accepted by JISC. I shall follow the tradition that I set with my last post, of publishing all my primary scientific output on this blog. In this case, I&#8217;m using Word, which like the latex that I used last time isn&#8217;t perfect. Still improving [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1729">
<p><span style="font-family:Arial">I&#8217;m very pleased that our grant for <a href="http://www.knowledgeblog.org">knowledgeblog</a> has been accepted by JISC. I shall follow the tradition that I set with my <a href="http://www.russet.org.uk/blog/2010/07/realism-and-science/">last</a> post, of publishing all my primary scientific output on this blog. In this case, I&#8217;m using Word, which, like the LaTeX that I used last time, isn&#8217;t perfect. Still, improving this process is part of the knowledgeblog proposal, so this post is also tackling a key deliverable for the grant!
</span></p>
<p><span style="font-family:Arial">The main content for this post is also available on the <a href="http://knowledgeblog.org/category/all">knowledgeblog</a> events blog.</span></p>
<p><span style="font-family:Arial"><strong>Outline Project Description
</strong></span></p>
<p><span style="font-family:Arial; font-size:11pt">The project extends existing blogging tools for use as a lightweight, semantically linked publication environment. This enables researchers to create a hub in the linked-data environment, which we call <em>knowledge blogs</em> or <em>k-blogs</em>. K-blogs are convenient and straightforward for authors to use, integrating into researchers&#8217; existing work practices and tools. They provide readers with distributed feedback and commenting mechanisms. We will support three communities (microarray, public health and workflow), providing immediate benefit, in addition to the long-term benefit of the platform as a whole. Additionally, this will enable a user-centric development approach, while showcasing the platform as the basis for next-generation research publishing.</span></p>
<h1><span style="font-family:Arial; font-size:11pt">1. Introduction</span></h1>
<p><span style="font-family:Arial"><sup>1</sup>This document describes a proposal for a project within the JISC &#8220;Managing research Data&#8221; call. Data comes in many forms, from raw statistics, to highly structured databases, through to textual reports; natural language, although hard to search and manage, is still the richest form of representation; data in the form of reports and publications are the central hub around which all other data sit. This project, therefore, will provide a lightweight, yet extensible, framework for scientific publishing, incorporating a software-supported peer-review process. Bi-directional links will be maintained both between publications and to other forms of data, using semantic markup to enhance the meaning of these links. We will also customize this framework for three communities which, as well as being directly useful, will provide real-world requirements. The project will largely develop &#8220;glue&#8221; between existing, widely-used, open-source software systems, ensuring its sustainability and usefulness past the end of the funding.<br/>
		</span></p>
<h1><span style="font-family:Arial; font-size:11pt">2. Fit to Programme Objectives and Project Outline 
</span></h1>
<p><span style="font-family:Arial"><br/><sup>2</sup>The project call identifies the <strong>complexity</strong> and <strong>hybrid</strong> nature of the UK research data environment; despite this, one central focal point remains: most researchers spend considerable amounts of time discussing their data in the form of &#8220;paper&#8221; publications. For some more theoretical disciplines, such as parts of computer science, the paper is the sole output; in others, such as biology, datasets are associated with papers and the <strong>barriers</strong> between &#8220;publication&#8221; and &#8220;data&#8221; are breaking down. Most data sources in biology are rich in <strong><em>annotation</em></strong>: text that supports and explains the raw data. It is normally the annotation, not the raw data, which defines the quality of the resource. In these cases, <strong>text</strong> is an <strong>intrinsic</strong> part of the <strong>data</strong>. <br/><br/><sup>3</sup>However, the conventional publication process has changed relatively little; web technologies have largely been used as a distribution mechanism. Publications are still <strong>expensive</strong>, either at subscription or at publication time depending on the business model of the publisher, and involve considerable, time-consuming interactions between author and publisher, often relating to display and presentation issues. This is in stark contrast to, for example, the biological data centres, where both raw and annotated data are often made available <strong>within hours </strong>of their generation.<br/><br/><sup>4</sup>This situation is unfortunate because it limits the ability of researchers to customise their publication process for the requirements of their own discipline. 
As demonstrated by Shotton et al. and Rousay et al., it is possible to add considerable value, both enhancing the paper for the reader and providing <strong>direct and semantically enhanced links</strong> to underlying data. The cost of the existing process, however, makes this form of publication unlikely for some data; for example, few scientists publish papers about negative results, resulting in an acknowledged publication bias. As a result, it is <strong>hard</strong> for the semantically enhanced publication to take its place as the central hub for a <strong>linked data</strong> environment as envisioned by Coles and Frey, linking to and between research datasets, and the published knowledge about these datasets. <br/><br/><sup>5</sup>In the last decade, the blog has become a common, web-based publication framework. There are now numerous off-the-shelf tools and platforms for managing blogs, providing a high degree of functionality. Many scientists blog about their work, about other published work (research blogging) or &#8220;live blog&#8221; about conferences and talks as they happen. In this case, the researcher is in charge of their own publication environment, can extend it to their requirements, and publication happens immediately. However, the blog has not yet become a standard means of publication for <strong>primary research output</strong>.<br/><br/><sup>6</sup>Recently, as part of the EPSRC-funded Ontogenesis network (ref), we trialled the <strong><em>Knowledge Blog</em></strong> process, in this case aimed at producing an educational resource describing many aspects of ontology development and usage, which might previously have been published in book form. 
We have shown that, with this technology base, it is possible to replicate many of the features of the open peer-review, scientific book publication process; following two small meetings, we have written around 20 articles, and the website maintains around 1000 post reads per month (not simple hits!). To achieve this, we used only two features of the blog: trackbacks (bidirectional links) and categories (hierarchical keywords); although we used the WordPress blogging software, these features are supported by most other systems. We call these articles <strong><em>k-blogs</em></strong>.<br/><br/><sup>7</sup>Currently, however, the k-blog process is not fully supported by blog software alone, nor does blog software fully support the referencing, advanced linking and provenance needed specifically for research publications. For this project, we propose to provide extensions to support data-rich publications, deeply and semantically linked to other k-blogs and to other forms of data repository. Therefore, the project addresses the objectives and aims of the call through four main workpackages.<br/><br/>1) A documented <strong>k-blog process </strong>(WP1.1) describing different levels of peer-review suitable for different forms of research data, and an implementation (WP1.2) of this process, the <strong>k-blog platform</strong>, based around open-source, off-the-shelf software.<br/><br/>2) Extensions to the k-blog platform supporting <strong>linking</strong>. This includes full support for referencing, including COinS metadata on posts (WP2.1), client-side and permanently linked versions (WP2.2) and bidirectional links (WP2.3) to other data sets. We will add <strong>semantics</strong> to these links using the Citation Ontology (CiTO) (WP2.4).<br/><br/>3) Support for three specialist environments: <strong>microarray</strong> (WP3.1), <strong>healthcare</strong> (WP3.2) and <strong>workflows</strong> (WP3.3). 
Each is useful in its own right, and together they showcase the extensibility of the framework.<br/><br/>4) <strong>Documentation</strong> and <strong>tooling</strong> to integrate the k-blog process into scientists&#8217; existing working practices and tools; scientists will be able to publish from Word, OpenOffice, Google Docs or LaTeX (WP4.1). We will add tooling and documentation, as WP4.2, to support the use of reference management tools such as EndNote, Mendeley or Zotero, making use of deliverables from WP2.<br/>
		</span></p>
<h1><span style="font-family:Arial; font-size:11pt">3. Quality of proposal and Robustness of Workplan
</span></h1>
<h2><span style="font-family:Arial">3.1 WP1: Knowledge Blog Process</span></h2>
<p><span style="font-family:Arial"><br/><br/><sup>8</sup>In this project, we aim to develop a light-weight publication framework, including the desirable aspects of the formal <strong>peer-review process</strong>. However, different forms of scientific publication require different levels of peer-review. For example, for http://ontogenesis.knowledgeblog.org, we require two reviews from an editorial board, assessing quality, as appropriate for an educational resource. However, for http://process.knowledgeblog.org, which is intended to contain informal &#8220;how-to&#8221; and request-for-comment documents, a much lighter-weight, single editorial review assessing scope alone is more appropriate. Deliverable <strong>WP1.1</strong> will consist of <strong>documentation</strong> describing, both formally and informally, a number of <strong>levels</strong> for the knowledge blog process, and how these can be achieved using a blog. These documents will, themselves, be published on http://process.knowledgeblog.org.<br/><br/><sup>9</sup>These processes will be <strong>implemented</strong> as Deliverable <strong>WP1.2</strong>, comprising <strong>freely available</strong> and widely used pieces of software, with additional &#8220;glue&#8221;. The basic publication framework will use WordPress 3 (WoP), an open-source, multi-site, multi-author blogging system used to provide the hosted blog service at http://www.wordpress.com. While we have found that WoP supports many aspects of this process, particularly from the readers&#8217; perspective, a significant degree of &#8220;book-keeping&#8221; is required from authors, reviewers and editors. Readers know whether a paper has been reviewed or not, but authors have to remember for themselves who is reviewing the paper. Therefore, we will use a &#8220;ticket system&#8221;, specifically Request Tracker 3 (RT) (http://bestpractical.com/rt/). 
Both WoP and RT are <strong>extensible</strong> with plugins and will be extended and adapted to reflect the k-blog levels of WP1.1.<br/><br/><sup>10</sup>We will use this extensibility to provide a light-weight integration. RT operates as an email response system; by <strong>extending WoP</strong> to send <strong>email</strong> on submission of new papers, we can provide both an integration point and the main point of interaction for authors, reviewers and editors. To provide editorial and reviewer functionality, tickets can be moved between queues; extensions to RT will use standard blogging <strong>XML-RPC</strong> calls to feed back to WoP by, for example, re-categorising papers once accepted. OpenID (http://openid.net) will be used to integrate the user accounts between the two systems. WoP already supports this fully, while RT supports it in skeleton form.<br/><br/><sup>11</sup>Although we will provide an implementation of the <strong>k-blog</strong> process, it will be described sufficiently generically to support complete and independent implementation. 
</span></p>
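<p><span style="font-family:Arial">As an illustration of the RT-to-WoP feedback described above, the following Python sketch builds the XML-RPC request that could re-categorise a paper when its ticket moves queue. It assumes the standard <tt>metaWeblog.editPost</tt> method and re-sends the existing title and body alongside the new category; the queue and category names are hypothetical, and the real extensions would live inside RT rather than in stand-alone code like this.</span></p>

```python
import xmlrpc.client

# Hypothetical mapping from RT queues to WordPress categories.
QUEUE_TO_CATEGORY = {
    "submitted": "Under review",
    "accepted": "Accepted",
}

def recategorise_request(post_id, queue, title, html, username, password):
    """Serialise a metaWeblog.editPost call moving a post into the category
    that matches its ticket's queue.

    Signature: metaWeblog.editPost(postid, username, password, struct, publish).
    """
    struct = {
        "title": title,
        "description": html,
        "categories": [QUEUE_TO_CATEGORY[queue]],
    }
    return xmlrpc.client.dumps(
        (post_id, username, password, struct, True),
        methodname="metaWeblog.editPost",
    )
```

<p><span style="font-family:Arial">Keeping the queue-to-category mapping in one table means that adding a new k-blog level only requires a new entry, not new integration code.</span></p>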
<h2><span style="font-family:Arial">3.2 WP2: References and Metadata</span></h2>
<p><span style="font-family:Arial"><sup>12</sup>For k-blogs to become an integral part of the scientific record, they must fully support the semantic and linked data environment. Although WoP supports standard <strong>URI-based linking</strong> to resources, and bidirectional &#8220;trackback&#8221; linking to other resources, it lacks the complete referencing functionality that research communities need. This is a rare example of functionality that is not already provided by WoP or an associated plugin. Deliverable <strong>WP2.1</strong> will fulfil this need; we will support the insertion of at least <strong>DOI</strong>s and <strong>PubMed ID</strong>s (PMID), which will be resolved to full human-readable reference lists for display, using APIs provided by CrossRef and NCBI eUtils respectively. To fully support computational agents wishing to access the same information, references will also carry <strong>COinS</strong> metadata, embedded into the display HTML.
</span></p>
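<p><span style="font-family:Arial">For illustration, COinS metadata is an OpenURL ContextObject serialised as key/value pairs and placed in the <tt>title</tt> attribute of an empty <tt>span</tt> with class <tt>Z3988</tt>. A minimal Python sketch of building that string for a journal article identified by a DOI follows; the choice of fields is an assumption, and a full implementation would add authors, volume and pages.</span></p>

```python
import html
from urllib.parse import urlencode

def coins_kev(doi, article_title, journal, year):
    """OpenURL (Z39.88-2004) key/encoded-value string for a journal article."""
    return urlencode([
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft_id", "info:doi/" + doi),
        ("rft.atitle", article_title),
        ("rft.jtitle", journal),
        ("rft.date", str(year)),
    ])

def coins_title_attribute(kev):
    """Escape the string (notably each '&') for use inside an HTML attribute."""
    return html.escape(kev, quote=True)
```

<p><span style="font-family:Arial">Because the DOI has already been resolved for display, the same CrossRef response could supply every field here, so the human-readable and machine-readable references never disagree.</span></p>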
<p><span style="font-family:Arial">K-blog posts will also require outward-facing metadata that describes the resources they provide in a standards-compliant manner. The Open Archives Initiative (OAI) provides standards that aim to facilitate the efficient dissemination of content. Specifically, the Object Reuse and Exchange specification (<strong>OAI-ORE</strong>) is a standard for the description and exchange of compound digital objects (such as a WoP post or page). The WordPress OAI-ORE plugin provides link header elements that implement this specification.<br/><br/><sup>13</sup>Our initial investigations into the k-blog process showed that WoP support for versioning and provenance is lacking; the k-blog process involves updating papers after submission but before final acceptance. While WoP stores all these <strong>versions</strong>, they are currently visible only to authors or editors, through the administration interface. Whilst existing plugins for WoP already provide some of this functionality, Deliverable <strong>WP2.2</strong> will expose these versions to readers, along with a defined permalink scheme for access to all versions, providing full <strong>provenance</strong>. <br/><br/><sup>14</sup>WoP supports <strong>bi-directional</strong> links in the form of trackbacks; this is mediated by XML-RPC calls between resources when a link is made. This will support linking to data where, for example, the data is another <strong>k-blog</strong>; however, general data resources may lack support for this process. Therefore, as Deliverable <strong>WP2.3</strong>, we will provide a trackback proxy, hosted on the http://knowledgeblog.org server, storing and presenting these links for resources that cannot directly process trackbacks.<br/><br/><sup>15</sup>To complete this work package, we will add semantics to the links using CiTO, as Deliverable <strong>WP2.4</strong>. 
Therefore, as well as enabling easier data linking and provenance, we will also enable addition of meaning to these links.
</span></p>
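<p><span style="font-family:Arial">To make the proxy idea concrete, here is a minimal Python sketch of its receiving side. Strictly, a TrackBack ping is a form-encoded HTTP POST carrying <tt>url</tt> (required) and optionally <tt>title</tt>, <tt>excerpt</tt> and <tt>blog_name</tt>, answered with a small XML document whose <tt>error</tt> element is 0 on success; pingbacks are the XML-RPC variant. The in-memory store and the function name are hypothetical, and HTTP handling is left out.</span></p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Hypothetical in-memory store: target resource URI -> list of incoming pings.
LINKS = {}

def handle_ping(target, body):
    """Record a TrackBack ping on behalf of a resource that cannot accept one.

    body is the form-encoded POST body; 'url' is required, while 'title',
    'excerpt' and 'blog_name' are optional. Returns the XML reply (without
    the XML declaration), where error 0 means success.
    """
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    reply = ET.Element("response")
    error = ET.SubElement(reply, "error")
    if "url" in fields:
        error.text = "0"
        LINKS.setdefault(target, []).append(fields)
    else:
        error.text = "1"
        ET.SubElement(reply, "message").text = "Missing required url parameter"
    return ET.tostring(reply, encoding="unicode")
```

<p><span style="font-family:Arial">The stored pings are what the proxy would then present alongside the data resource, giving it the same backlinks a k-blog post receives natively.</span></p>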
<h2><span style="font-family:Arial">3.3 WP3 &#8211; Specialist Environments</span></h2>
<p><span style="font-family:Arial"><br/><br/><sup>16</sup>The k-blog platform and process are designed to be flexible and adaptable to the needs of specialist environments. We will use three main use cases to ensure <strong>real-world</strong> applicability of the software, as well as <strong>fulfilling</strong> the immediate <strong>needs</strong> of these communities.<br/><br/><sup>17</sup>For Deliverable <strong>WP3.1</strong>, we will add features to support the microarray community. Currently, the microarray community is well served in terms of <strong>metadata</strong> capture (MIAME) and <strong>deposition</strong> in public repositories (ArrayExpress, GEO). As part of WP2, we will support <strong>linking</strong> to these datasets through stable URIs. However, these resources deal only with data generation. Post-processing and analysis are largely captured at the publication stage, often in supplementary material.<br/><br/><sup>18</sup>A substantial amount of this analysis uses BioConductor: a widely used, open-source platform for statistical microarray analysis based on the R statistical programming language. We will extend k-blog with <strong>specific support for R</strong> and BioConductor. Authors will be able to directly embed code into k-blog papers, along with the figures that result; as a result, reviewers and readers will be able to see a <strong>computationally precise description </strong>of methods and replicate the generation of figures should they choose.<br/><br/><sup>19</sup>Finally, we will investigate the possibility of publication to a k-blog using only R code and references to public databases, in a process similar to Sweave: figures will be generated on the server, providing guarantees of correctness and precise provenance. 
The limited scope of this call means this part of WP3.1 will be proof-of-principle only.<br/><br/><sup>20</sup>For <strong>WP3.2</strong>, we will focus on the <strong>public health community</strong> (PHC): a key workforce in delivering quality and effective healthcare by providing timely and accurate public health intelligence (PHI). PHI is a varied environment performing statistical analyses: producing information figures, diagrams and reports to communicate results to the wider health community. However, the PHC operates in small groups with little knowledge networking. The main aim of the k-blog is to improve the availability of health information, data and knowledge, to inform decisions for health protection and care standards as supported by the Quality, Innovation, Productivity and Prevention (QIPP) initiative. The NWeHealth <em>e-Lab</em> project, hosted at The University of Manchester, provides an environment to bring together <em>research objects</em> into a single location. As elsewhere, textual data forms the key hub that links together all the other forms of knowledge. By <strong>linking to e-Lab</strong>
			<em>research objects</em> from a k-blog, this link will be made explicit, available, interpretable and directly valuable to the PHC; as a result, WP3.2 is synergistic with the rest of the proposal. This community also brings a set of access-control requirements. To support these, we will use existing WoP facilities, providing a simple, easy-to-use, three-level access model.
</span></p>
<p>
 </p>
<p><span style="font-family:Arial"><sup>20</sup>For WP3.3, we will generate k-blog <strong>content</strong> about <strong>Taverna</strong> workflows and methods for building them. Workflows are now a popular way of realizing computational analyses, and an important form of <strong>data</strong> in their own right. The <strong>JISC-funded myExperiment</strong> project is widely used to disseminate the workflows themselves. Knowledge about the issues surrounding workflows is, however, more difficult to produce and disseminate. A k-blog, with its ability to produce short, targeted articles as the need arises and as resources for writing become available, suits the need for Taverna workflow documentation. We will seek k-blog articles on Taverna topics such as the basics of workflow design, how to choose among a set of similar services when producing a workflow, and the testing of workflows. We will implement a lightweight mechanism, using <strong>trackbacks</strong>, to link between the k-blog and myExperiment. 
</span></p>
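<p><span style="font-family:Arial">As a rough illustration of how light-weight such a mechanism can be, the Python sketch below builds a TrackBack ping: an HTTP POST of form-encoded <code>url</code>, <code>title</code>, <code>excerpt</code> and <code>blog_name</code> fields, to which the receiving server replies with a small XML document whose <code>error</code> element is 0 on success. It assumes that myExperiment would expose a TrackBack ping URL for each workflow; that assumption, the example URLs and the function names are all hypothetical, and the request is only built here, not sent.</span></p>

```python
"""Hypothetical sketch: a TrackBack ping from a k-blog article to a
workflow page. The ping URL is an assumption; nothing is sent."""
from urllib.parse import urlencode
from urllib.request import Request
import xml.etree.ElementTree as ET


def build_trackback_request(ping_url, article_url, title, excerpt, blog_name):
    """Build (but do not send) a TrackBack ping: an HTTP POST whose body
    carries the form-encoded url, title, excerpt and blog_name fields."""
    body = urlencode({
        "url": article_url,
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }).encode("utf-8")
    return Request(
        ping_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


def trackback_succeeded(response_xml):
    """Parse the server's XML reply; an error element of 0 means success,
    and a missing error element is treated as failure."""
    root = ET.fromstring(response_xml)
    return root.findtext("error", default="1").strip() == "0"


# A client would send the request with urllib.request.urlopen(req) and
# pass the response body to trackback_succeeded().
```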
<p>
 </p>
<p><span style="font-family:Arial"><sup>21</sup>As part of <strong>WP3</strong>, we will also hold four workshops, at 3-month intervals, each focusing on one particular k-blog and community. These workshops will follow the format previously trialled as part of the Ontogenesis network, and will serve several purposes: requirements gathering and feedback for us, education for the community, and the development of content that demonstrates the process to the general readership.  
</span></p>
<p>
 </p>
<p>
<h2><span style="font-family:Arial">3.4 WP4 &#8211; Integration with Existing Working Practices</span></h2>
<p><span style="font-family:Arial"><sup>22</sup>For the k-blog process to be <strong>acceptable</strong> to <strong>communities</strong> such as those described in WP3, it must fit with existing working practices. Researchers mostly write documents using a word processor. Fortunately, as the <strong>k-blog</strong> platform is based on the <strong>widely-used</strong> WoP, which in turn offers a <strong>widely-supported</strong> API, this style of working can be readily integrated. It is already possible to author using Word (2007 onward), OpenOffice, Google Docs and LaTeX with integrated or existing technologies, as demonstrated by our previous work at http://ontogenesis.knowledgeblog.org. For Deliverable <strong>WP4.1</strong>, we will develop user-oriented documentation describing these tools. This documentation will also describe clearly how to present and organise papers in a way that is optimized for the <strong>k-blog</strong> process. While we expect this documentation to take a significant time to produce, as it will be refined in response to user feedback, it is important to note that a k-blog is already <strong>useful</strong> and <strong>possible</strong>.
</span></p>
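<p><span style="font-family:Arial">Much of this integration rests on the XML-RPC publishing API that WoP exposes, the kind of interface blog-aware editors use to post remotely. The Python sketch below, assuming the standard MetaWeblog method <code>metaWeblog.newPost</code> (blog id, user name, password, post struct, publish flag), builds the request body such a tool would send; the blog id, credentials, endpoint and function name are placeholders, and nothing is sent over the network.</span></p>

```python
"""Sketch: the XML-RPC request an authoring tool might send to post a
draft to a WoP-based k-blog. All values are placeholders."""
import xmlrpc.client


def build_new_post_call(blog_id, username, password, title, html_body, publish=False):
    """Return the XML-RPC request body for metaWeblog.newPost.

    MetaWeblog uses 'title' for the post title and 'description' for
    the HTML body; publish=False asks for a draft rather than a live post."""
    post = {"title": title, "description": html_body}
    return xmlrpc.client.dumps(
        (blog_id, username, password, post, publish),
        methodname="metaWeblog.newPost",
    )


# To actually send it, a client would use a ServerProxy, for example:
#   server = xmlrpc.client.ServerProxy("https://example.org/xmlrpc.php")
#   post_id = server.metaWeblog.newPost(1, "author", "secret",
#                                       {"title": "Draft", "description": "..."}, False)
```

Building the payload separately from sending it keeps the sketch testable offline; a real tool would also need to handle authentication failures reported by the server.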
<p style="background: white"><span style="color:black; font-family:Arial">To take maximal advantage of the linking technologies developed in WP2, we will need to integrate with existing technologies for referencing. As Deliverable <strong>WP4.2</strong>, we will add tooling to enable the use of bibliographic tools such as EndNote, Mendeley, Zotero or BibTeX to insert references that <strong>k-blog</strong> can translate directly. For the most part, this should consist of &#8220;styles&#8221; that modify the in-text citation, since the reference plugin of <strong>WP2.1</strong> will generate the reference lists. As with other deliverables, this tooling will include substantial documentation, developed using the <strong>k-blog</strong> process. 
</span></p>
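<p><span style="font-family:Arial">To make this division of labour concrete: a citation &#8220;style&#8221; only needs to leave a machine-readable marker in the text, after which the platform can number the in-text citations and assemble the reference list. The Python sketch below assumes a hypothetical <code>{cite:identifier}</code> marker (the identifier might be a DOI); this is not the syntax of any existing tool or plugin. Repeated citations of the same identifier reuse its number.</span></p>

```python
"""Sketch: resolving hypothetical {cite:identifier} markers into numbered
in-text citations plus an ordered list of cited identifiers."""
import re

# Hypothetical marker syntax emitted by a bibliographic tool's "style".
CITE_MARKER = re.compile(r"\{cite:([^}]+)\}")


def number_citations(text):
    """Replace each marker with [n], numbering identifiers in order of
    first appearance, and return (rewritten_text, ordered_identifiers)."""
    order = []

    def replace(match):
        ident = match.group(1).strip()
        if ident not in order:
            order.append(ident)
        return f"[{order.index(ident) + 1}]"

    return CITE_MARKER.sub(replace, text), order


# The ordered identifiers would then be handed to whatever component
# generates the reference list (for example, by looking up each DOI).
```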
<h2><span style="font-family:Arial; font-size:11pt">4. Project Timeline
</span></h2>
<p style="background: white">
 </p>
<div>
<table style="border-collapse:collapse" border="0">
<colgroup>
<col style="width:91px"/>
<col style="width:90px"/>
<col style="width:94px"/>
<col style="width:66px"/>
<col style="width:302px"/></colgroup>
<tbody valign="top">
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  solid 0.5pt; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial"><strong>Name</strong></span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  solid 0.5pt; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial"><strong>Start</strong></span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  solid 0.5pt; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial"><strong>End</strong></span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  solid 0.5pt; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial"><strong>Staff</strong></span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  solid 0.5pt; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial"><strong>Notes</strong></span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">    WP 1</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">02/08/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/10/2010</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt"> </td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt"> </td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 1.1</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">02/08/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">31/08/2010</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">All</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">A documented k-blog process</span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 1.2</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">01/09/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/10/2010</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">DS,SC</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">Implementation with off-the-shelf software</span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">    WP 2</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">01/11/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/04/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt"> </td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt"> </td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 2.1</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">01/11/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">26/02/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">SC</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">COinS metadata on posts</span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 2.2</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">01/11/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">29/01/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">SC</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">Client-side, permanently linked versions</span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 2.3</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">03/01/2011</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">26/02/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">DS</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">Bi-directional links to other datasets</span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 2.4</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">01/03/2011</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/04/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">PL</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">Semantic linking with CITO</span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">    WP 3</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">01/11/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/07/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt"> </td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt"> </td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 3.1</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">01/11/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/07/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">DS</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">Specialist environment – Microarrays</span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 3.2</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">01/11/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/07/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">GM</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">Specialist environment &#8211; Healthcare</span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 3.3</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">01/11/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/07/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">RS</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">Specialist environment – Workflows</span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">    WP 4</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">02/08/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/06/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt"> </td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt"> </td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 4.1</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">02/08/2010</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/04/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">GM,DS</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">Authoring documentation and tools</span></p>
</td>
</tr>
<tr style="height: 20px">
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  solid 0.5pt; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">      WP 4.2</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">02/05/2011</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p style="text-align: right"><span style="font-family:Arial">30/06/2011</span></p>
</td>
<td style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">GM,SC</span></p>
</td>
<td vAlign="bottom" style="padding-left: 7px; padding-right: 7px; border-top:  none; border-left:  none; border-bottom:  solid 0.5pt; border-right:  solid 0.5pt">
<p><span style="font-family:Arial">Referencing documentation and tools</span></p>
</td>
</tr>
</tbody>
</table>
</div>
<p style="background: white">
 </p>
<h2><span style="font-family:Arial; font-size:11pt">5. Project Management Arrangements
</span></h2>
<p><span style="font-family:Arial"><sup>23</sup>The project will be managed from Newcastle University; the <strong>primary management</strong> will be by Dr Lord, who will be responsible for:
</span></p>
<ul>
<li>
<div style="background: white"><span style="font-family:Arial">Developing Project Management Plans;
</span></div>
</li>
<li>
<div style="background: white"><span style="font-family:Arial">Ensuring that the Project technical objectives are met;
</span></div>
</li>
<li>
<div style="background: white"><span style="font-family:Arial">Prioritising and reconciling conflicting opportunities;
</span></div>
</li>
<li>
<div style="background: white"><span style="font-family:Arial">Reporting to, and collaborating with, the JISC Programme Manager;
</span></div>
</li>
<li>
<div style="background: white"><span style="font-family:Arial">Dissemination of the k-blog platform.
</span></div>
</li>
</ul>
<p><span style="font-family:Arial">Project progress will be evaluated through <strong>scheduled</strong>, short, &#8220;<strong>stand-up</strong>&#8221; meetings on a weekly basis, conducted face-to-face, via Skype or by phone as appropriate. Although most project staff are co-located, primary <strong>unscheduled</strong> communication will be via a <strong>public mailing list</strong>, ensuring maximum visibility and openness. <strong>User consultation</strong> will likewise be via a <strong>public mailing list</strong>, as well as through a &#8220;<strong>dogfooding</strong>&#8221; k-blog. All project staff have been handpicked; as outlined elsewhere, they are highly experienced and self-directed. All are associated with several other projects and duties (research, research support, teaching and training), and are responsible for managing these independent workloads.  
</span></p>
<p>
 </p>
<h2><span style="font-family:Arial; font-size:11pt">5.1 Risks
</span></h2>
<p><span style="font-family:Arial"><sup>24</sup>Staff Risk – as with all projects, loss of staff could have a negative impact on this project; however, all staff are on permanent contracts and have long histories in research, so this is less likely. Additionally, by dividing the work between five individuals, we limit the risk should a single person leave. 
</span></p>
<p><span style="font-family:Arial">WoP3 and other dependencies – the project depends on other software, most notably WoP, a new version of which (3.0) is now in beta; however, WoP is widely supported, and the other software is replaceable. 
</span></p>
<p><span style="font-family:Arial">Standards Shifting – the project depends on a number of standards, and these may change. In this project, we will <strong>NOT</strong> support standards, but rather use those that support us. Where standards change rapidly, their implementation will be delayed (until they stabilize) or dropped. None of the standards described here is critical to the success of the project.
</span></p>
<p>
 </p>
<h2><span style="font-family:Arial; font-size:11pt">5.2 IPR Position
</span></h2>
<p><span style="font-family:Arial"><sup>25</sup>All code will be developed under open source licences. WoP and RT are licensed under the GPL, so code linking to these will be licensed likewise. Code that is separable will be released under the LGPL. Code will remain the copyright of the respective institutions or authors. Any documentation produced by project staff relating to the project will be licensed under the Creative Commons Attribution licence. Licensing of individual k-blogs will be delegated, but permissive licences will be encouraged. 
</span></p>
<p>
 </p>
<h2><span style="font-family:Arial; font-size:11pt">5.3 Sustainability
</span></h2>
<p><span style="font-family:Arial"><sup>26</sup>This project is largely based around the innovative <strong>use of existing</strong> software. As such, the sustainability of most of the technology base depends not on project members but on large companies with established and proven business models. The <strong>k-blog</strong> process will be cleanly separated from its implementation, ensuring only weak dependencies on the underlying software. Where we produce software &#8220;glue&#8221;, we will use public and widely supported APIs where possible, ensuring that components are replaceable. All code, including historical versions, will be publicly available. Documents produced by project staff will be publicly available and clearly licensed, and so will be archived through internet &#8220;cloud&#8221; resources; we are also seeking explicit support for archiving from the British Library. 
</span></p>
<p>
 </p>
<h2><span style="font-family:Arial; font-size:11pt">5.4 Staff Recruitment
</span></h2>
<p><span style="font-family:Arial"><sup>27</sup>All staff are already in post. 
</span></p>
<p>
 </p>
<h2><span style="font-family:Arial; font-size:11pt">5.5 Key Beneficiaries
</span></h2>
<p><span style="font-family:Arial"><sup>28</sup>Our key beneficiaries are the <strong>public health</strong>, <strong>microarray</strong> and <strong>workflow</strong> communities; as the k-blog process is based around commodity software, these groups can use the <strong>basic</strong> environment from the first day of the project to generate and share content. As the project progresses, so will the process, the software to support it and the documentation to explain it; at all stages, the k-blog process fulfils a <strong>clear and immediate need</strong>. While we are specifically targeting these communities, the k-blog process and platform are sufficiently <strong>generic</strong> that they can support a <strong>wide range</strong> of research activities.
</span></p>
<p><span style="font-family:Arial">Although presented here as a single platform, the process and components are <strong>separable</strong> and can benefit communities independently. In particular, the tools and documentation from WP2 and WP4 will find use within the research blogging community, for whom the lack of tools for referencing is a particular difficulty. Finally, the statement of a peer-review process, and its implementation within RT, will be applicable to any peer-review environment regardless of the form of publication, including material published using wikis or other Content Management Systems. 
</span></p>
<p>
 </p>
<h2><span style="font-family:Arial; font-size:11pt">5.6 Engagement with Community
</span></h2>
<p><span style="font-family:Arial"><sup>29</sup>We consider mechanisms for engagement with four kinds of community. Engagement with our core <strong>content generating</strong> community is an intrinsic part of this proposal, as described in <strong>WP3</strong>. Further interaction with more disparate groups will be maintained through personal contacts; each of the five individuals named in this proposal is experienced and embedded in different communities (health care, microarray, ontology, proteomics). Engagement with our core <strong>content consuming</strong> community is, again, an intrinsic part of the proposal; all project communications will be via an open mailing list or k-blog. Project members are active users of Web 2.0 social technologies; our initial trials as part of Ontogenesis showed this approach to be a highly effective form of dissemination, requiring minimal effort. Engagement with <strong>software users</strong> will be via the website and direct interaction. All software will be released or advertised via the normal channels (website, versioning and mailing list), including a Debian package repository for those wishing to set up their own server. Finally, <strong>developer communities</strong> will not be specifically targeted, but our open source, continually integrated development plan will be attractive to them, and we will accept suitably licensed contributions.  
</span></p>
<p><span style="font-family:Arial"><sup>30</sup>All communities will benefit from the open and agile development methodology we will adopt; changes to the environment will be integrated and released rapidly, ensuring continual improvement and facilitating rapid feedback cycles. 
</span></p>
<p>
 </p>
<h2><span style="font-family:Arial; font-size:11pt">6. Previous Experience and Project Team
</span></h2>
<p>
 </p>
<p><span style="font-family:Arial"><sup>31</sup><strong>Dr. Phillip Lord</strong> is a Lecturer in Computing Science at Newcastle University. He has a PhD in yeast genetics from the University of Edinburgh, after which he moved into bioinformatics. He is well known for his work on ontologies in biology, as well as his contributions to eScience, beginning with his role as an RA on the myGrid project. Since his move to Newcastle, he has been an investigator on three more eScience projects (CARMEN, ONDEX and InstantSOAP), as well as maintaining an active engagement in standards development (OBI, MIGS, MIBBI) and publishing on the fundamentals of ontology design. He was an active participant in the Ontogenesis network, and developed the initial idea for knowledge blogs as part of this. He is an active blogger and developer.
</span></p>
<p><span style="font-family:Arial"><sup>32</sup><strong>Dr. Georgina Moulton</strong> is an Education and Development Fellow at The University of Manchester. Since 2005 her main roles have been to co-ordinate the development and delivery of multi-disciplinary bio/health informatics education programmes, and to facilitate the engagement of biological and health communities in a variety of bio and health informatics research projects (<em>e.g.,</em> ONDEX, Obesity e-Lab). For 3 years, Georgina was the EPSRC-funded Ontogenesis Network Manager; in this role she co-ordinated the activities of the network, expanded it by facilitating the development of new activities, and was involved in the trial k-blog process. More recently her work includes the development and delivery, in conjunction with NHS partners, of an education and development programme tailored to the needs of North West public health analysts and the wider healthcare workforce.
</span></p>
<p><span style="font-family:Arial"><sup>33</sup><strong>Dr. Daniel Swan</strong> has a PhD in developmental biology and continued to work in developmental biology as a post-doctoral researcher before moving into bioinformatics in 2001. Subsequent positions included working for Bart&#8217;s and the London Genome Centre and the Centre for Ecology and Hydrology, in informatics-driven roles dealing with large, distributed biological datasets generated by large user communities. Currently the manager of the Newcastle University Bioinformatics Support Unit, he leads a small team helping biological researchers to generate, capture, store and analyse their digital data. His interdisciplinary background gives him a grounding in both computer and biological sciences; he is comfortable working on CS-focused projects (CARMEN, InstantSOAP, Bio-Linux) as well as acting in a research capacity analysing high-throughput data.
</span></p>
<p><span style="font-family:Arial"><sup>34</sup><strong>Dr. Simon Cockell</strong> has a PhD in Genetics from Leicester University, and refocused on bioinformatics with a Master&#8217;s degree from Leeds in 2005. From there he moved to Newcastle and the Bioinformatics Support Unit. Since coming to Newcastle, Simon has worked on a range of projects involving large-scale analyses (AptaMEMS-ID), data integration (ONDEX) and health informatics (MRC Mitochondrial Disease Cohort).
</span></p>
<p><span style="font-family:Arial"><sup>35</sup><strong>Dr. Robert Stevens</strong> is a senior lecturer in Bioinformatics in the Bio and Health Informatics group at the University of Manchester. His main areas of research are in the development and use of semantics within the life sciences. This is blended with the use of e-Science platforms to gather and manage the data and knowledge of the life sciences. He was PI on the Ontogenesis network, which ran the meetings for the first k-blog. He is, or has been, a co-investigator on the myGrid and myExperiment grants, which will provide both content and technical input to this project. As well as the JISC-funded myExperiment project, Stevens was an investigator on the JISC-funded CO-ODE project that developed Protégé 4. On the back of this, Stevens has led the OWL training activities at Manchester, which have fed directly into the Ontogenesis k-blog. This range of experience makes Stevens an ideal partner to lead the development of content within this project.
</span></p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1729 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2010/08/a-new-grant-for-knowledgeblog/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Realism and Science</title>
		<link>http://www.russet.org.uk/blog/2010/07/realism-and-science/</link>
		<comments>http://www.russet.org.uk/blog/2010/07/realism-and-science/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 15:28:42 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Ontology]]></category>
		<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1713</guid>
		<description><![CDATA[This post carries the text of a paper accepted for PLoS One (now published). I publish it here as a pre-print because of the recent discussion on OBO discuss about realism. I have converted this from the original latex, which isn&#8217;t perfect. Apologies for errors. The [PDF] is available here. Adding a little reality to [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1713">
<p>This post carries the text of a paper accepted for PLoS One (now <a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0012258">published</a>). I publish it here
as a pre-print because of the recent discussion about realism on the OBO-discuss mailing list.
I have converted this from the original LaTeX; the conversion isn&#8217;t perfect, so apologies
for any errors.</p>
<p>The <a href="http://homepages.cs.ncl.ac.uk/phillip.lord/download/publications/realism_and_science.pdf">[PDF]</a> 
is available here. </p>
<div>
<p><big class="xlarge"><b class="bfseries">Adding a little reality to
  building ontologies for biology</b><br /></big> Phillip Lord and Robert
  Stevens<br /> School of Computing Science<br /> Claremont Road<br />
  Newcastle University<br /> Newcastle-upon-Tyne, UK<br />
  <a href="mailto:phillip.lord@newcastle.ac.uk">phillip.lord@newcastle.ac.uk</a><br />
  School of Computer Science<br /> The University of Manchester<br /> Oxford
  Road<br /> Manchester, UK<br />
  <a href="mailto:robert.stevens@manchester.ac.uk">robert.stevens@manchester.ac.uk</a></p>
</div>
<h1 id="a0000000002">Abstract</h1>
<p><b class="bfseries">Background:</b> Many areas of biology are open to
mathematical and computational modelling. The application of discrete, logical
formalisms defines the field of biomedical ontologies. Ontologies have been
put to many uses in bioinformatics. The most widespread is for the description of
entities about which data have been collected, allowing integration and
analysis across multiple resources. There are now over 60 ontologies in active
use, increasingly developed as large, international collaborations.</p>
<p>There are, however, many opinions on how ontologies should be authored;
that is, what is appropriate for representation. Recently, a common opinion
has been the &ldquo;realist&rdquo; approach that places restrictions upon the
style of modelling considered to be appropriate.</p>
<p><b class="bfseries">Methodology/Principal Findings:</b> Here, we use a
number of case studies for describing the results of biological experiments.
We investigate the ways in which these could be represented using both realist
and non-realist approaches; we consider the limitations and advantages of each
of these models.</p>
<p><b class="bfseries">Conclusions/Significance:</b> From our analysis, we
conclude that while realist principles may enable straightforward modelling
for some topics, there are crucial aspects of science and the phenomena it
studies that do not fit into this approach; realism appears to be
over-simplistic which, perversely, results in overly complex ontological
models. We suggest that it is impossible to avoid compromise in modelling
ontology; a clearer understanding of these compromises will better enable
appropriate modelling, fulfilling the many needs for discrete mathematical
models within computational biology.</p>
<h1 id="a0000000003">Introduction</h1>
<p>Ontologies are now widely used for describing and enhancing biological
resources and biological data, largely following on from the success of the
Gene
Ontology&nbsp;<span class="cite">[<a href="#Ashburner2000">1</a>]</span>.
Ontologies have been used for many purposes, from schema integration to value
reconciliation to query
interfaces&nbsp;<span class="cite">[<a href="#handbook2">2</a>]</span>.
Ontologies have also become a cornerstone of computational biology and
bioinformatics. As computationally amenable artifacts they are, themselves, a
direct part of computational biology; many computational biologists are
involved in their production and maintenance. Many more use ontologies to
summarise their data, often by looking for
over-representation&nbsp;<span class="cite">[<a href="#Zeeberg2003">3</a>]</span>,
as the basis for drawing computational inferences about
data&nbsp;<span class="cite">[<a href="#Wolstencroft2006">4</a>]</span>,
or as the basis for determining semantic
similarity&nbsp;<span class="cite">[<a href="#Lord2003">5</a>]</span>.
Even those not making direct computational use of ontologies are likely to
come into contact with them, for example, when preparing annotation as part of
their data
release&nbsp;<span class="cite">[<a href="#Whetzel2006a">6</a>]</span>.</p>
<p>It is, therefore, of vital interest to computational biologists that
ontologies for use within biomedicine are fit for purpose. One effort that
aims to increase the quality of the ontologies available within biomedicine is
the &ldquo;OBO
Foundry&rdquo;&nbsp;<span class="cite">[<a href="#Smith2007">7</a>]</span>.
The main tool that it uses for this is &ldquo;an evolving set of shared
principles governing ontology development&rdquo;. The initial eleven
principles of the OBO
Foundry&nbsp;<span class="cite">[<a href="#OBOFoundry2006">8</a>]</span>
were largely concerned with what might be termed &lsquo;good engineering
practice&rsquo; (ontologies must, for example, be openly available, with a
common syntax, well documented, and used). These principles have later been
joined by a further
eleven&nbsp;<span class="cite">[<a href="#OBOFoundry2008">9</a>]</span>;
these include principles such as &ldquo;textual definitions will use the
genus-species form&rdquo;, &ldquo;Use of Basic Formal Ontology&rdquo; and, the
somewhat quixotic, &ldquo;terms [&hellip;] should correspond to instances in
reality&rdquo;. These stem not from engineering practice, but from a
perspective called <i class="itshape">realism</i>.</p>
<p>The many different uses for ontologies that we have described are reflected
in different understandings and methodologies about how and what to represent
in an ontology. Over the last few years, for many uses the paradigm has moved
from &ldquo;a conceptualization of the application domain&rdquo; toward
&ldquo;a description of the key entities in reality&rdquo;; it is this latter
approach that defines
realism&nbsp;<span class="cite">[<a href="#Johansson2006">10</a>]</span>.
This approach to ontology is typified by the Basic Formal Ontology (BFO); a
small upper-ontology for use within science in general and biomedical ontology
building in
particular&nbsp;<span class="cite">[<a href="#Grenon2004">11</a>]</span>.</p>
<p>There has been significant discussion regarding the possibility of
representing <em>only</em> &ldquo;real entities&rdquo; in computational
ontologies&nbsp;<span class="cite">[<a href="#smith2004beyond">12</a>]</span>.
Likewise, there has been significant discussion about the philosophy
surrounding realism and the role of ontology in its
representation&nbsp;<span class="cite">[<a href="#Johansson2006">10</a>]</span>.
While some argue that it is possible to represent <em>only</em>
reality when making a domain description, there has been little
discussion on whether it is necessarily desirable to do so.</p>
<p>In this paper, we consider the implications that realism has for the
choices that are open to the ontologist while they are modelling their domain
of interest. In particular, we consider the implications that this has for the
computational capabilities of any resultant ontology, in terms of its ability
to represent scientific knowledge in a computationally amenable form, as well
as the ability to perform automated inference or statistics over this
knowledge. We suggest that the application of realism results in ontologies
that are over-complex, awkward or limited; as such, realism falls far short of
its aim of increasing the fitness-for-purpose of ontologies. This approach,
therefore, is unlikely to fulfil the needs of computational biologists who
form a substantial part of both the user and developer community for
bio-ontologies.</p>
<h1 id="a0000000004">Methods</h1>
<p>In this paper, we take the approach of a number of worked exemplars; this
is a complementary approach to an in-depth consideration of the modelling
decisions for a particular area or particular ontology, which we have used
previously&nbsp;<span class="cite">[<a href="#Lord2009">13</a>]</span>,
as it allows broader conclusions about the general principles of ontology
development. For each section, as well as the main exemplars, a number of
related examples are briefly discussed, to reinforce that the issues raised
are, indeed, general.</p>
<p>The exemplars have been selected by several criteria. First, the main
exemplars are all taken from within biomedicine; this is also true for the
majority of the related examples. Second, we have chosen exemplars that
provide as wide a coverage of biology as possible. Third, for practical
reasons, we have chosen exemplars where the underlying science is relatively
basic to much of biology and is likely to be immediately clear to the reader
without significant explanation.</p>
<p>We have chosen exemplars requiring as little knowledge of specific
ontologies as possible. We refer to only three. The first is BFO (see
&ldquo;What is Realism?&rdquo;), which is a canonical example of a realist
ontology. BFO is described as a cross-domain, upper-ontology; as a result,
most terms fail the criteria given above; they are of poor biomedical
relevance, and are not basic science or immediately clear. We have, therefore,
also used PATO
(see <a href="http://obofoundry.org/wiki/index.php/PATO:Main_Page">http://obofoundry.org/wiki/index.php/PATO:Main_Page</a>);
this defines &ldquo;qualities&rdquo; that we might consider attributes of
other entities; so, the authors of this paper have a height, weight and shape,
all of which are considered to be qualities of the authors. Finally, we use
the relationship
ontology&nbsp;<span class="cite">[<a href="#Smith2005">14</a>]</span>;
this describes the relations between entities. So, for example, the height of
the author <em>inheres_in</em> the author.</p>
<p>As discussed in this and other
works&nbsp;<span class="cite">[<a href="#Russell1946">15</a>, <a href="#Merrill2010">16</a>]</span>,
&ldquo;realism&rdquo; is itself poorly defined. Where this lack of definition
makes the consequences of realism hard to determine, we have taken the
practical course of showing the consequences as they play out in practice; to
an extent, therefore, these three ontologies are not only exemplars for
realism, but define it, as it is currently practiced. In short, for this
paper, when we say &ldquo;realism&rdquo;, we largely mean &ldquo;realism as
practiced by BFO&rdquo;. We do not claim, in this paper, to address all the
philosophical perspectives that through time carried the name
&ldquo;realism&rdquo;.</p>
<h1 id="a0000000005">Results</h1>
<h2 id="a0000000006">What is Realism?</h2>
<p>Building ontologies based on reality is obviously appealing to most
scientists; after all, the study of <em>reality</em> to determine its behaviour
and laws is the goal of scientists. A brief consideration, however, shows that
this notion cannot define a methodology for the building of ontologies.</p>
<p>Within the context of science &ldquo;reality&rdquo; would normally be taken
to mean our experimental or observational data; but the statement that science
(ontologies) should be based on experimental or observational data is a truism
and, as such, has no explanatory power. The &ldquo;real&rdquo; in realism
refers, in fact, to the belief that the categories that we can use to divide
entities are, themselves, real.</p>
<p>This distinction stems from an old argument in philosophy: realism
versus conceptualism. Both sides of the argument agree that the world
we can perceive, and as scientists, experiment on, is mind-independent. The
conceptualist, however, argues that the categories, which they
term <em>concepts</em>, are a product of social agreement. Conversely, the
realist argues that these categories, which they term <em>universals</em>, are
themselves real; that is, mind-independent in their own right, like the
entities they describe.</p>
<p>This distinction may seem fairly confusing; as
Russell&nbsp;<span class="cite">[<a href="#Russell1946">15</a>]</span>
says &ldquo;if I have failed to make Aristotle&rsquo;s theory of universals
clear, that is (I maintain) because it is not clear&rdquo;. In fact, there is
a third possibility that is a more empirical view&mdash;that is, if categories
(or other models) help in describing and predicting experimental data, then
they are useful regardless of whether they are real or
otherwise&nbsp;<span class="cite">[<a href="#Dumontier2010">17</a>]</span>.
As an example, the Mendelian notion of segregating units of inheritance was
defined and useful many years before a complete mechanistic description of
their cause was available. In this context, we note that there is no commonly
used term to express this form of category; most commonly,
&ldquo;concept&rdquo; is used.</p>
<p>For a field with a core activity of providing definitions, there is
surprisingly little agreement on the meaning of the word
&ldquo;ontology&rdquo;; as there have been many papers on the topic, we
consider just a few that reflect the distinction between these approaches.
Probably the most commonly cited
definition&nbsp;<span class="cite">[<a href="#Gruber1992">18</a>]</span>
describes an ontology as &ldquo;a specification of a conceptualization&rdquo;.
This definition emphasises the formality (i.e. logical and, therefore,
computationally amenable) aspect to ontology development.</p>
<p>This is countered with a realist definition; while the requirements from
Gruber&rsquo;s definition&mdash;a formal specification&mdash;are necessary,
realist ontologies add the requirement that &ldquo;the nodes and edges
correspond not to concepts but, rather, to entities in
reality&rdquo;&nbsp;<span class="cite">[<a href="#Ceusters2006">19</a>]</span>.</p>
<p>What does &ldquo;reality&rdquo; in this context actually mean? Definitions
such as &ldquo;that which exists&rdquo; are strangely circular, leaving open the
question of what &ldquo;exists&rdquo; means.
Smith&nbsp;<span class="cite">[<a href="#smith2004beyond">12</a>]</span>
adds the proviso that reality is &ldquo;captured in scientific laws&rdquo;.
Being a scientific law is not, strictly, enough, as some laws are later shown
to be wrong; but a scientific law is the current best attempt at describing
reality, and the possibility that it is wrong does not make an ontology
non-realist. For a realist ontology, the
nodes are &ldquo;universals&rdquo;&mdash;entities in reality&mdash;rather than
concepts; at least one particular must exist for every universal.</p>
<p>This still leaves the difficulty of applying the realist definition in
practice. Most scientists will happily accept, for example, that a cell is
real, as it is an entity that can be observed, interacted with and manipulated.
However, concepts such as
&ldquo;function&rdquo;&nbsp;<span class="cite">[<a href="#Lord2009">13</a>]</span>
have raised more
discussion&nbsp;<span class="cite">[<a href="#Shrager2003">20</a>]</span>;
is this &ldquo;real&rdquo; or just a word biologists use as a point of
reference? While the definition involving &ldquo;entities in reality&rdquo;
may be of philosophical interest, it is hard to turn into a specific assay:
how can we test whether a particular concept is also a universal? Instead of a
clear assay for existence, realism offers direction about what concepts are
NOT reality, rather than those that are. For example, and perhaps
ironically given this negative practical definition of reality, a statement
such as:</p>
<pre>
  Dog is_a not Cat
</pre>
<p>is not held to be a statement about reality as it is a logically
constructed example of subsumption (an <tt>is_a</tt> relationship); there is
no real universal containing particular <tt>not Cat</tt>s in existence.
Likewise,</p>
<pre>
  Dog is_a (Dog or Cat)
</pre>
<p>is not held to be a statement about reality, as the existence of particular <tt>Dog</tt>s and <tt>Cat</tt>s does not
mean that there are any particular <tt>Dog or Cat</tt>s (examples modified
from&nbsp;<span class="cite">[<a href="#smith2004beyond">12</a>]</span>).</p>
<p>This is not meant to provide a complete introduction to
&ldquo;realism&rdquo;, but to provide a grounding for the discussion that
follows; we will consider the issues raised by realism, throughout the paper.
A more philosophical treatment of realism is given by
Merrill&nbsp;<span class="cite">[<a href="#Merrill2010">16</a>]</span>.
It is useful to note
Gruber&rsquo;s&nbsp;<span class="cite">[<a href="#Gruber1992">18</a>]</span>
statement that &ldquo;[&hellip;] it [a computational ontology] is certainly a
different sense of the word than its use in philosophy&rdquo;. In this paper,
we are concerned with ontologies as computational artefacts.</p>
<p>To summarise, a realist approach to ontology says that the categories or
universals into which objects or particulars fall have an existence in their
own right. It is these universals and <em>only</em> these universals that a
realist approach says should be the nodes within an ontology. In this paper we
examine whether this approach is an adequate means to provide an account for
the data produced by biomedicine.</p>
<h2 id="a0000000007">Models that represent reality</h2>
<p>In this section, we suggest that many universals have a range of
representations. In some cases, the choice of representation may be obvious,
such as length which has a natural scientific representation in SI units. In
many cases, however, there is no clear set of criteria for choosing between
representations. We consider the way that one quality, <em>colour</em>, could
be represented ontologically.</p>
<p>Colour is a complex phenomenon. The colour of an object or other phenomenon
arises, in part, from that object and, in part, from the eye that perceives
it.</p>
<p>A representation of the physical reality would be an account of the
reflection, transmission and perception of light by an organism. Such an
account of the reality of light and its perception might cover the following
facts: Chlorophyll is green in reflection and red in transmission; a flower
petal appears white to a human, but has UV stripes to a bee; the plant leaf
and the algae appear green to humans, but have different reflection spectra
because their chlorophyll co-ordinate to their Mg<sup>2+</sup> ion in
different ways.</p>
<p>There have been a number of different attempts to represent the
complexities of colour numerically, for a number of different purposes. These
are models that allow us to describe colour, without having to deal with the
underlying physics or reality of colour. Probably the best known of these are
RGB (Red, Green, Blue) or HSV (Hue, Saturation, Value), both of which are
additive colour models appropriate for describing colour on a display screen.
CMYK (Cyan, Magenta, Yellow and Black) is a subtractive colour model and
commonly used for printing.</p>
<p>Collectively these representation schemes are known
as <i class="itshape">colour models</i>. That none of these schemes has become
predominant reflects both their different uses and the preferences of
different user groups.</p>
<p>For the ontology builder, this leaves us with a difficult choice:</p>
<ol class="enumerate">
<li>
<p>We bless one of the colour models, substituting the model for the
    underlying physics and do not describe the others.</p>
</li>
<li>
<p>We describe all of the colour models, but do not describe that they are
    part of a colour model.</p>
</li>
<li>
<p>We explicitly describe the reality of the physics, biology and the
    relationship to the different colour models, reflecting the practice of
    describing colour in much of science.</p>
</li>
</ol>
<p>Currently, considering the PATO ontology, which is documented as being
built according to realist principles, the first approach has been taken,
using the HSV scheme. So, PATO has a term <b class="bfseries">Color Hue</b>
(PATO:15) that is defined as:</p>
<blockquote class="quote"><p>
  <i class="itshape">&ldquo;A chromatic scalar-circular quality inhering in an
  object that manifests in an observer by virtue of the dominant wavelength of
  the visible light; may be subject to fiat divisions, typically into 7 or 8
  spectra.&rdquo;</i>
</p></blockquote>
<p>Using this model, PATO describes <b class="bfseries">red</b> (PATO:322) as:</p>
<blockquote class="quote"><p>
  <i class="itshape">&ldquo;A color hue with high wavelength of the long-wave
  end of the visible spectrum, evoked in the human observer by radiant energy
  with wavelengths of approximately 630 to 750 nanometers.&rdquo;</i>
</p></blockquote>
<p>This modelling approach has a number of limitations.</p>
<ul class="itemize">
<li>
<p>The decision to choose one colour model or the other is arbitrary.
    While there are reasonable justifications for the use of HSV as opposed
    to, for example, RGB, there is no <i class="itshape">a priori</i>
    justification for use of an additive colour model as opposed to a
    subtractive model. Both are valid, for different usage; in general,
    reflective colour is more common in biology (e.g. pigmentation) than
    emitted colour (e.g. fluorescence) which would suggest that subtractive
    models are more generally applicable, but a full treatment requires
    both.</p>
</li>
<li>
<p>There are no terms which can be used to express data described
    according to other colour models, necessitating a transformation between
    the different models into the officially &ldquo;blessed&rdquo; version
    during application of the ontology. These transformations may be lossy and
    not fully reversible.</p>
</li>
</ul>
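<p>The point about lossy transformations can be made concrete. The following sketch uses the simple textbook conversion between RGB and CMYK; this formula is an illustrative assumption (real print workflows rely on device colour profiles), and the function names are hypothetical. Because CMYK has four components for three degrees of freedom, distinct CMYK values can map to the same RGB value, so the conversion cannot be reversed.</p>

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB to CMYK conversion, all channels in [0, 1].

    Illustrative textbook formula only; it ignores inks and device profiles.
    """
    k = 1.0 - max(r, g, b)
    if k == 1.0:
        # Pure black: C, M and Y are undetermined, so 0 is chosen by convention.
        return (0.0, 0.0, 0.0, 1.0)
    return ((1.0 - r - k) / (1.0 - k),
            (1.0 - g - k) / (1.0 - k),
            (1.0 - b - k) / (1.0 - k),
            k)


def cmyk_to_rgb(c, m, y, k):
    """Naive CMYK to RGB conversion, all channels in [0, 1]."""
    return ((1.0 - c) * (1.0 - k), (1.0 - m) * (1.0 - k), (1.0 - y) * (1.0 - k))


# Two different CMYK descriptions collapse onto the same RGB value ...
print(cmyk_to_rgb(1.0, 1.0, 1.0, 0.0))  # (0.0, 0.0, 0.0)
print(cmyk_to_rgb(0.0, 0.0, 0.0, 1.0))  # (0.0, 0.0, 0.0)
# ... so converting back cannot tell which one was originally recorded.
print(rgb_to_cmyk(0.0, 0.0, 0.0))       # (0.0, 0.0, 0.0, 1.0)
```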
<p>The second approach is also possible. This would allow expression of data
in multiple colour models, however:</p>
<ul class="itemize">
<li>
<p>The ontology would tend to get rather confusing as more colour models
    are added; colour would have children &ldquo;Hue&rdquo;, &ldquo;Red&rdquo;
    and &ldquo;Cyan&rdquo; and seven other sibling terms.</p>
</li>
<li>
<p>It is not clear which terms comprise a colour model: do values for
    &ldquo;Hue&rdquo;, &ldquo;Green&rdquo; and &ldquo;Magenta&rdquo; specify a
    colour?</p>
</li>
<li>
<p>It is not clear whether terms that occur in the other contexts are
    equivalent. Is &ldquo;Red as in RGB&rdquo; the same or different
    as <b class="bfseries">Red</b> (PATO:322)? Is &ldquo;Hue as in HSV&rdquo;
    the same or different from &ldquo;Hue as in HSL&rdquo; (HSL is another
    additive colour model).</p>
</li>
</ul>
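<p>The question of whether terms from different colour models are equivalent can be tested against one concrete implementation. The following sketch uses Python&rsquo;s standard <tt>colorsys</tt> module, which expresses hue as a fraction of a full turn and names the HSL model &ldquo;HLS&rdquo;. For the colour chosen here, hue agrees between the two models while &ldquo;saturation&rdquo; does not; other software may define these quantities differently.</p>

```python
import colorsys

# A light red (salmon pink) in RGB, channels in [0, 1].
rgb = (1.0, 0.5, 0.5)

h_hsv, s_hsv, v = colorsys.rgb_to_hsv(*rgb)
h_hls, l, s_hls = colorsys.rgb_to_hls(*rgb)  # colorsys orders this H, L, S

print(h_hsv, h_hls)  # 0.0 0.0  (the same hue)
print(s_hsv, s_hls)  # 0.5 1.0  (the same word, two different quantities)
```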
<p>The third approach does not suffer from the limitations described. We
suggest from this analysis that it is necessary, if unfortunate, for some
qualities to be explicitly described with multiple representations. To avoid
confusion, the universal quality, colour, would need to be explicitly
described as having multiple valid models. Yet, realism argues that we should
not do this, as colour is real and not a model; moreover, the focus on
realism means that the documentation does not describe the choices that have
been made, nor refer to the relationship between <b class="bfseries">Color
Hue</b> (PATO:15) and &ldquo;Hue as in HSV&rdquo;. In short, realism has
limited our ability to represent colour.</p>
<h3 id="a0000000008">Related Examples</h3>
<p>There are many different examples of this issue; having two or more models
  to describe the same part of reality is common. The distance between two
  markers on a chromosome can be measured using (one of a number of) genetic
  techniques. Some qualities have a bewildering array of different
  measurements associated with them; Wikipedia, for example, lists 13
  different measurements of concentration such as molarity or \(gm^{-3}\).</p>
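<p>Moving between two such measurements may need knowledge that neither measurement carries. As a sketch (the function name is hypothetical), converting a molar concentration into grams per cubic metre requires the molar mass of the solute, here taken to be about 58.44 g/mol for sodium chloride.</p>

```python
def molar_to_grams_per_cubic_metre(molarity, molar_mass):
    """Convert an amount concentration in mol/L into a mass concentration in g/m^3.

    molarity:   concentration in mol/L (M)
    molar_mass: molar mass of the solute in g/mol
    One cubic metre is 1000 litres, hence the factor of 1000.
    """
    return molarity * molar_mass * 1000.0


# 0.1 M sodium chloride
print(molar_to_grams_per_cubic_metre(0.1, 58.44))  # approximately 5844
```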
<p>This issue has been previously recognised. In computing science, explicitly
modelling one model in another is a form of <em>metamodelling</em>. Other,
non-realist, upper-ontologies such as DOLCE use the concept
of <tt class="ttfamily">Quale</tt> to describe a cognitive abstraction (such
as Colour), including those over a physical quality (such as the spectral
properties of reflected
light)&nbsp;<span class="cite">[<a href="#Seyed2009">21</a>]</span>.</p>
<h2 id="a0000000009">Sequences and the Central Dogma</h2>
<p>The central dogma of molecular biology suggests that all genetic
information is encoded in the DNA of a cell, as the ordered nucleotides that
comprise the DNA. RNA is transcribed from this DNA. The RNA molecule also has
a defined order of nucleotides related to the DNA. Finally the RNA is
translated into protein.</p>
<p>Consider an ontology describing these entities. First, the DNA molecule has
a number of properties; as well as physical dimensions (discussed further in
&ldquo;sec:limits-consistency&rdquo;), including a length expressed in metres,
it consists of a number of monomeric units. So, for example, we might say a
DNA molecule with a series of nucleotide residues represented
as <tt class="ttfamily">&lsquo;GATC&rsquo;</tt> <tt class="ttfamily">has&shy;Monomeric&shy;Part</tt> <tt class="ttfamily">4</tt>.</p>
<p>This causes a slight worry from a realist perspective; the number 4 may not
  be a realist universal. There are no instances of 4. In this case, the
  number 4 is being used to describe a part of reality, so this is allowable
  in a realist ontology. Alternatively, we could describe the same reality
  using units (traditionally base-pairs or bp). Therefore,
  the <tt class="ttfamily">DNA
  molecule</tt> <tt class="ttfamily">has&shy;Polymer&shy;Length</tt> 4bp.</p>
<p>Accepting the use of natural numbers in this way also means that we accept
  the use of sets and sequences to describe reality. One definition of 4 is a
  sequence. Stating that the DNA molecule represented with the
  sequence <tt class="ttfamily">&lsquo;GATC&rsquo;</tt> <tt class="ttfamily">has&shy;Polymer&shy;Length</tt> <tt class="ttfamily">4bp</tt>
  is equivalent, therefore, to stating that
  it <tt class="ttfamily">hasSequence</tt> <tt class="ttfamily">&lsquo;NNNN&rsquo;</tt>
  where <tt class="ttfamily">&lsquo;N&rsquo;</tt> is any nucleotide
  residue.</p>
<p>It should be noted, however, that the usefulness of these statements stems
from our <em>implicit</em> knowledge. The number 4 is a natural number,
so <tt class="ttfamily">has&shy;Monomeric&shy;Part</tt> <tt class="ttfamily">4.2</tt>
is not possible. If a new monomer is attached to our DNA molecule, it will
now <tt class="ttfamily">has&shy;Monomeric&shy;Part</tt> <tt class="ttfamily">5</tt>,
because the natural numbers are additive. We understand the operation of
natural numbers as part of our shared, background knowledge, and we can apply
this knowledge here.</p>
<p>Having described that the DNA molecule represented
as <tt class="ttfamily">&lsquo;GATC&rsquo;</tt> <tt class="ttfamily">has&shy;Polymer&shy;Length</tt> <tt class="ttfamily">4</tt>
(or <tt class="ttfamily">hasSequence</tt> <tt class="ttfamily">&lsquo;NNNN&rsquo;</tt>)
we might wish to be more specific about the order of nucleotide residues and
state <tt class="ttfamily">hasSequence</tt> <tt class="ttfamily">&lsquo;GATC&rsquo;</tt>.
The implicit background knowledge we used previously about the natural numbers
still applies here.</p>
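<p>The implicit knowledge described here can be written down explicitly when sequences are handled as data. In the following sketch, which assumes sequences are uppercase strings over A, C, G and T and uses the IUPAC code N for any nucleotide (the helper names are hypothetical), a length of 4 and a match to <tt>&lsquo;NNNN&rsquo;</tt> coincide, <tt>&lsquo;GATC&rsquo;</tt> is a more specific description of the same molecule, and attaching a residue increases the length by one.</p>

```python
import re

# IUPAC nucleotide codes used here: N stands for any of A, C, G or T.
IUPAC_DNA = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def has_sequence(seq, pattern):
    """True if seq matches pattern, reading IUPAC codes in the pattern."""
    regex = "".join(IUPAC_DNA[code] for code in pattern)
    return re.fullmatch(regex, seq) is not None


def has_polymer_length(seq, n):
    return len(seq) == n


def append_monomer(seq, residue):
    """Attach one residue; the new length follows from natural-number addition."""
    if len(residue) != 1 or residue not in "ACGT":
        raise ValueError("expected a single DNA residue")
    return seq + residue


dna = "GATC"
print(has_polymer_length(dna, 4), has_sequence(dna, "NNNN"))  # True True
print(has_sequence(dna, "GATC"))                              # True
longer = append_monomer(dna, "A")
print(len(longer), has_sequence(longer, "NNNN"))              # 5 False
```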
<p>Next consider the process of transcription. The previous discussion about
DNA likewise applies to RNA. The RNA molecule will,
however, <tt class="ttfamily">hasSequence</tt> <tt class="ttfamily">&lsquo;GAUC&rsquo;</tt>,
as RNA uses a different set of bases to DNA. Mathematically, one sequence can
be determined from the other by applying a mapping; the mapping, however, is a
human activity, not a representation of biochemical reality. To describe this,
we have two options:</p>
<ul class="itemize">
<li>
<p>Taking the realist approach, we can continue to rely on
    the <em>implicit</em> knowledge of the biologist, as we have previously
    relied on an implicit understanding of the natural numbers.</p>
</li>
<li>
<p>We can be explicit about the properties of these sequences (in addition
    to those properties they share with the natural numbers). We can talk about non-real-world
    concepts such as alphabets, transformations and how these map to the
    real entities involved.</p>
</li>
</ul>
<p>It should be noted that the former severely limits the ability to describe
the central dogma. The transformation of DNA to RNA sequence is simple, but
the transformation of RNA to protein is more complex. Again, the choice is
between representing reality or representing how we practise science.</p>
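<p>The asymmetry between the two steps can be sketched in Python. The sketch assumes the DNA given is the coding strand, so transcription is a simple substitution of U for T; translation, by contrast, needs a codon table. The table below is deliberately partial (a handful of entries from the standard genetic code), and the function and table names are hypothetical:</p>

```python
# Transcription: assuming the DNA is the coding strand, the transcript has the
# same sequence with uracil (U) in place of thymine (T).
T_TO_U = str.maketrans("T", "U")


def transcribe(coding_strand: str) -> str:
    return coding_strand.translate(T_TO_U)


# Translation needs a 64-entry codon table; this one is deliberately partial
# (entries taken from the standard genetic code; "*" marks a stop codon).
PARTIAL_CODON_TABLE = {
    "AUG": "M", "UUU": "F", "UUC": "F", "GGA": "G", "GGC": "G",
    "AAA": "K", "UAA": "*", "UAG": "*", "UGA": "*",
}


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = PARTIAL_CODON_TABLE[mrna[i:i + 3]]  # KeyError if not in table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(transcribe("GATC"))                     # GAUC
print(translate(transcribe("ATGTTTGGATAA")))  # MFG
```

<p>The first mapping is one symbol to one symbol; the second groups symbols into triplets and needs a lookup table and a stop rule, which is the extra complexity the text refers to.</p>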
<h3 id="a0000000010">Related examples</h3>
<p>The issues relating to sequences are fairly general. In computer science
terms, these are abstract data types. The DNA sequence is a kind of sequence
with special properties (a limited alphabet). Many of the physical quantities
in science have special properties in this way. Consider:</p>
<dl class="description">
<dt>Temperature:</dt>
<dd>
<p>While these look like positive real numbers, temperatures can only
    meaningfully be subtracted from each other, which gives information about
    heat-flow between two bodies. Other operations (addition, multiplication),
    which are useful for real numbers, have little meaning for temperature.</p>
</dd>
<dt>Recombination Distance:</dt>
<dd>
<p>These look like probabilities but are not; they require a transformation
    before they can be added.</p>
</dd>
</dl>
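<p>One way to read &ldquo;abstract data type&rdquo; here is sketched below in Python, under our own assumptions. A hypothetical <tt class="ttfamily">Temperature</tt> type follows the claim above: it supports subtraction, which yields a temperature difference, and deliberately defines no addition or multiplication. For recombination, the sketch swaps in the Haldane mapping function. This is one standard choice, and it assumes no crossover interference. Under it, recombination fractions on adjacent intervals do not add, but the corresponding map distances (in Morgans) do:</p>

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TemperatureDifference:
    """A difference between two temperatures, in kelvin (may be negative)."""
    kelvin: float


@dataclass(frozen=True)
class Temperature:
    """A thermodynamic temperature in kelvin (hypothetical sketch type).

    Only subtraction is defined; there is deliberately no __add__ or
    __mul__, so Temperature + Temperature raises TypeError.
    """
    kelvin: float

    def __post_init__(self):
        if self.kelvin < 0:
            raise ValueError("thermodynamic temperature cannot be negative")

    def __sub__(self, other: "Temperature") -> TemperatureDifference:
        return TemperatureDifference(self.kelvin - other.kelvin)


def haldane_distance(r: float) -> float:
    """Haldane map distance (Morgans) for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -0.5 * math.log(1 - 2 * r)


def haldane_fraction(d: float) -> float:
    """Inverse of haldane_distance: recombination fraction for map distance d."""
    return 0.5 * (1 - math.exp(-2 * d))


print((Temperature(310.0) - Temperature(293.0)).kelvin)  # 17.0

# Fractions of 0.2 and 0.3 on adjacent intervals do not add to 0.5 ...
d_total = haldane_distance(0.2) + haldane_distance(0.3)  # ... but distances add
print(round(haldane_fraction(d_total), 2))  # 0.38
```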
<p>There is a limitation on the ability to use abstract data types within a
given ontology language; in most cases, the expressivity of the language will
not allow arbitrary mathematical relations. Some languages, such as OWL,
provide &ldquo;concrete domains&rdquo;; these provide extension
points within the ontology language where, for example, the special properties
of temperature could be represented; other languages do not. In either case,
there are limitations to these capabilities; for example, the constraints and
behaviour of a concrete domain need to be interpreted with their own semantics
within a reasoner, rather than expressed explicitly within the ontology. It
may make more sense in many circumstances to describe the existence of a
mathematical model, as discussed in &ldquo;<a href="#a0000000013">To go where science has gone before</a>&rdquo;.</p>
<h2 id="a0000000011">The limitations of computers</h2>
<p>Modelling continuous properties is a common problem in ontological
engineering. For example, according to statistics the western world is now
facing an obesity epidemic; in short, many or most of us weigh too much.
Understanding exactly what &ldquo;too much&rdquo; means, however, is not
necessarily simple; a common technique is to use the body mass index
(BMI): the body weight (in kg) divided by the square of the height (in m), which is a continuous
value. The BMI range is split into 4 categories: Obese (&gt;30), Overweight
(&gt;25), Normal (&gt;18.5) and Underweight (&lt;18.5). These categories
represent ranges of the value of BMI.</p>
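<p>A minimal Python sketch of the calculation and the four-way split follows. The text gives the boundaries only as strict inequalities, so the treatment of the boundary values is an assumption. Here each lower bound is inclusive, as in the WHO classification. The function and enum names are also assumptions:</p>

```python
from enum import Enum


class BMICategory(Enum):
    UNDERWEIGHT = "Underweight"
    NORMAL = "Normal"
    OVERWEIGHT = "Overweight"
    OBESE = "Obese"


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> BMICategory:
    """Map a continuous BMI onto the four ranges given in the text.

    Assumption: each lower bound is inclusive (WHO-style), since the text
    states the boundaries only as strict inequalities.
    """
    if value >= 30:
        return BMICategory.OBESE
    if value >= 25:
        return BMICategory.OVERWEIGHT
    if value >= 18.5:
        return BMICategory.NORMAL
    return BMICategory.UNDERWEIGHT


value = bmi(90.0, 1.8)
print(round(value, 1), bmi_category(value).value)  # 27.8 Overweight
```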
<p>This data simplification has many justifications. On an individual basis,
the BMI is not a particularly accurate measure, so the simplification does not
lose much accuracy. It is also easier to describe to patients, for whom a
&ldquo;BMI of 25&rdquo; will be less comprehensible than being
&ldquo;overweight&rdquo;.</p>
<p>Modelling some of this is straightforward. Height and weight are modelled
as properties of the individual. The BMI would therefore appear to be a
property of the individual, as it is calculated from two existing properties.
It would appear, therefore, that the category into which an individual falls
should also be a property of the individual.</p>
<p>Consider next the values of the property. These categories are an
abstraction over the real-world properties. Although height, expressed as an
integer value, uses a non-real-world entity, it is a description of a
part of reality. A range of BMI values, however, does not describe part of
reality in the same sense. There are no instances of BMI &ldquo;Obese&rdquo;.
In a realist ontology, therefore, it is unclear what the relationship is
between BMI Obese and the individual person.</p>
<p>For the statistician or computer scientist, there is an additional
advantage to the simplification; four discrete groups have better
computational properties than a continuous measure. Database queries become
easier to write, and quicker to run. This is also true for the ontology
builder; simplifying the real world may fulfil the needs of an application for
which the ontology is built, while avoiding unnecessary complexity. This is a
widely used method for representing partitions of continuous values, the
appropriately named <em>value
partition</em>&nbsp;<span class="cite">[<a href="#rector2005">22</a>]</span>.</p>
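<p>To illustrate what makes a value partition computationally convenient, the Python sketch below treats each BMI category as a half-open interval. This is a hypothetical representation, not Rector&rsquo;s OWL encoding. The sketch checks that the intervals are pairwise disjoint and jointly cover the whole range, then answers a &ldquo;how many per category&rdquo; query with a simple count. Inclusive lower bounds are assumed, as the text gives only strict inequalities:</p>

```python
import math
from collections import Counter

# Hypothetical encoding: each category is a half-open interval [low, high).
# Inclusive lower bounds are an assumption (the text gives strict inequalities).
BMI_PARTITION = {
    "Underweight": (0.0, 18.5),
    "Normal": (18.5, 25.0),
    "Overweight": (25.0, 30.0),
    "Obese": (30.0, math.inf),
}


def is_value_partition(partition, low, high):
    """True if the intervals are non-empty, pairwise disjoint and exactly cover [low, high)."""
    intervals = sorted(partition.values())
    if any(lo >= hi for lo, hi in intervals):
        return False
    if intervals[0][0] != low or intervals[-1][1] != high:
        return False
    # Each interval must end exactly where the next begins: no gaps, no overlaps.
    return all(a[1] == b[0] for a, b in zip(intervals, intervals[1:]))


def category_of(value, partition=BMI_PARTITION):
    """Return the single category whose interval contains value."""
    return next(name for name, (lo, hi) in partition.items() if lo <= value < hi)


print(is_value_partition(BMI_PARTITION, 0.0, math.inf))  # True

# Once values are partitioned, a "how many per category" query is a simple count.
counts = Counter(category_of(v) for v in [17.2, 22.0, 24.9, 26.5, 31.0])
print(counts["Normal"])  # 2
```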
<p>In the case of BMI there is a pre-existing social agreement toward a set of
categories; however, even in the absence of such an agreement, the ontology
builder might wish to represent a continuous range as a value partition to
decrease the complexity of their ontology. The value partition is useful, but
many of the concepts involved are not realist universals. The choice, then, is
between modelling &ldquo;reality&rdquo; and modelling a simplification that is easier
to use and has better computational properties.</p>
<h3 id="a0000000012">Related Examples</h3>
<p>Taking the two cases separately, there are many examples of pre-existing
simplifications. From medicine, there are so many that they seem to be the norm
rather than the exception: hypo- vs hyperthermic; hypo- vs hypertensive; hypo-
vs hyperglycaemic. In many cases, these ranges have standard interpretations
akin to those of the BMI.</p>
<p>There are likewise a number of constructions or design patterns that reduce
complexity, extend the effective capabilities of the language or simply
provide standard solutions to common
problems&nbsp;<span class="cite">[<a href="#egana2008">23</a>]</span>.</p>
<h2 id="a0000000013">To go where science has gone before</h2>
<p>Many experiments in biomedicine require the measurement of some physical
property of a biological system. Take, for example, the measurement of heart
rate; in standard practice, this is measured in beats per minute, and is
calculated simply by counting beats (\(b\)) over a time period (\(t\))
and dividing one by the other (\(b/t\)). However, what time period is
appropriate? We might choose 60&nbsp;s, but this raises the question: what is the
meaning of heart rate over shorter periods?</p>
<p>Fortunately, there is a standard solution to this problem, which is to
  define heart rate using differential calculus; so heart rate becomes \(db/dt\).</p>
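<p>The dependence on the chosen period can be made concrete with a short Python sketch over hypothetical beat timestamps. The average rate \(b/t\) changes with the window. The reciprocal of each inter-beat interval gives a finite-difference estimate in the spirit of \(db/dt\). The function names and data are illustrative only:</p>

```python
def mean_rate_bpm(beat_times_s, start_s, window_s):
    """Average rate b/t: beats in [start, start + window), scaled to per minute."""
    beats = sum(1 for t in beat_times_s if start_s <= t < start_s + window_s)
    return 60.0 * beats / window_s


def interval_rates_bpm(beat_times_s):
    """Finite-difference estimate of db/dt: one beat per inter-beat interval."""
    return [60.0 / (later - earlier)
            for earlier, later in zip(beat_times_s, beat_times_s[1:])]


# Hypothetical beat times (s): one beat per second, then one every half second.
beats = [0.0, 1.0, 2.0, 3.0, 3.5, 4.0, 4.5, 5.0]
print(mean_rate_bpm(beats, 0.0, 6.0))  # 80.0
print(mean_rate_bpm(beats, 3.0, 2.0))  # 120.0
print(interval_rates_bpm(beats))       # [60.0, 60.0, 60.0, 120.0, 120.0, 120.0, 120.0]
```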
<p>The derivative, \(db/dt\), presents some problems from a realist
perspective. As noted previously (see &ldquo;sec:sequ-centr-dogma&rdquo;), it
is possible to associate real numbers with entities; however, \(db/dt\) is
defined as the limit of \(\Delta b/\Delta t\) as \(\Delta t\) tends to zero, which takes the
indeterminate form \(0/0\). It is not clear whether this quantity is a universal; it is
certainly the case that the expression \(db/dt\) is not a universal, yet
such values, and calculus itself, are a powerful tool within science, and not using
them within ontological models is a severe restriction.</p>
<p>We can describe this ontologically in three ways:</p>
<ul class="itemize">
<li>
<p>We can model the real-world entities involved (beats and time) and
    describe nothing else.</p>
</li>
<li>
<p>We can describe rate in mathematical terms. In this case, we are
    defining the heart rate as a mathematical abstraction.</p>
</li>
<li>
<p>We can model the heart rate as a real-world entity, \(db/dt\) as a
    mathematical entity, and explicitly state that \(db/dt\) is a model of
    heart rate.</p>
</li>
</ul>
<p>These different solutions present different advantages. The first is
  consistent with realism. The second is consistent with the most common
  definition used within science. The third is consistent with both, but it is
  unclear when to use which term (for example, is \(\Delta b/\Delta t\)
  an approximation of \(db/dt\), a quantification of the real-world
  quality, or both?).</p>
<p>In most cases for the description of science, the second option makes most
sense; conflating the mathematical model with the real entity enables us to
use the advantages of two different modelling techniques without introducing
the confusion of the third option.</p>
<h3 id="a0000000014">Related Examples</h3>
<p>There are many related examples from mechanics, electromagnetics or
chemistry; as with value partitions in medicine, so many that they appear to
be the norm. All of these subject areas have direct relevance to biology and,
perhaps even more so, to the equipment used in the practice of biology.</p>
<p>Mechanical examples would include velocity (\(dr/dt\)) and acceleration
(\(d^2r/dt^2\)). Electromagnetic examples would include current (\(dQ/dt\))
and capacitance (\(dQ/dV\)). Chemistry examples would include rate
constants and pH. In biology, population biology, systems biology and
neuroscience make wide use of mathematical models. The lack of a link in
realist ontologies to these mathematical models is not free from consequences
(described further in the <a href="#a0000000017">Discussion</a>).</p>
<p>The more general issue arises not from differential calculus specifically,
but from relating ontologies to pre-existing non-ontological techniques, such as taxonomy
in the Linnaean sense. There have been many discussions about whether species
and higher taxa reflect reality; it is certainly the case that a
number of higher taxa do not reflect
phylogeny&nbsp;<span class="cite">[<a href="#Schulz2008">24</a>]</span>.
Given its uncertain status, should we represent taxonomy as a
quality of an organism, an independent conceptualisation of the biologists or
both?</p>
<h2 id="a0000000015">The limits of consistency</h2>
<p>Physical biological entities such as cells and organisms have an extent in
the real world. This paper&rsquo;s first author, for example, has a height of
around 1.8m; a similar value cannot be applied meaningfully to the electronic
version of this document, although it may apply to the paper that it may be
printed on.</p>
<p>There are a number of different, well-understood mechanisms for
representing physical space. We can use a dimensional or Cartesian model, with
three perpendicular axes, each with a linear scale. We can use a polar model,
expressing extent using angles and a single distance. Modern physics has told
us, however, that all of these are limited models of reality; physics
generally uses a four-dimensional Minkowskian spacetime model, in which the axes
are not independent: motion of the observer along one changes the values measured
along the others. Alternatively, at a quantum level, length is a probability
distribution.</p>
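<p>As a small illustration that these are interchangeable models of the same point, the Python sketch below converts between Cartesian and spherical co-ordinates, the three-dimensional form of the polar model. The angle convention is an assumption: the polar angle \(\theta\) is measured from the \(z\) axis and the azimuth \(\phi\) lies in the \(x\)-\(y\) plane. Other conventions swap the symbols:</p>

```python
import math


def cartesian_to_spherical(x, y, z):
    """Return (r, theta, phi): distance, polar angle from +z, azimuth in the x-y plane."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


def spherical_to_cartesian(r, theta, phi):
    """Inverse conversion, using the same angle convention."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


point = (1.0, 1.0, 1.0)
back = spherical_to_cartesian(*cartesian_to_spherical(*point))
print(all(math.isclose(a, b) for a, b in zip(point, back)))  # True
```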
<p>For the ontology builder, this leaves a difficult choice, the same
choice discussed previously in &ldquo;sec:colo-colo-models&rdquo;: represent
the reality that physicists describe; bless one model and ignore the rest; describe their
components but not their models; or explicitly describe them.</p>
<p>If the ontology builder is to be consistent, then, they should make the
same choice in both cases; if we describe colour models, we should explicitly
describe Minkowskian spacetime, quantum probability distributions, and Cartesian
and polar systems.</p>
<p>There are, however, two important differences from colour models. First,
there is a strong social bias toward Cartesian systems. Secondly, within the
scope of biology and the life sciences, four-dimensional spacetime or quantum
models confuse rather than simplify; the relativistic corrections produce such
small differences that they are statistically meaningless; similarly,
describing a leg as a probability distribution adds little other than
complexity.</p>
<p>This leaves the ontology builder with two options:</p>
<ol class="enumerate">
<li>
<p>We can build an ontology with a consistent relationship to reality. So,
    having decided to represent colour models explicitly, we should also
    explicitly model 3D space, 4D spacetime and the various
    co-ordinate systems that are used to describe these.</p>
</li>
<li>
<p>We can build an ontology with an inconsistent relationship to reality. So,
    we might be explicit about colour models, but arbitrarily bless
    three-dimensional space, using Cartesian co-ordinates.</p>
</li>
</ol>
<p>The compromise here is very straightforward. The first solution retains
its consistency with reality, the second is consistent with usability and usage;
for biomedicine, a 3D cartesian co-ordinate system plus time is likely to be
enough for the foreseeable future and makes life easier in the meantime.</p>
<p>The Newtonian view of the world is the best model in this case: it is good
enough. When building an ontology for biomedicine, it makes most sense to use
this view as it will produce the results required. If, in the future,
biomedicine advances so that relativistic or quantum representations are
necessary, then current ontologies will need refactoring; even then, this
future cost is likely to be offset by gains in the present.</p>
<h3 id="a0000000016">Related examples</h3>
<p>In the choice of units of measurement for scientific purposes, SI units
are to be preferred. It should be noted here that there is a domain
dependency; for an engineering ontology, the use of US customary units
would be inevitable.</p>
<p>For most of biology it is unnecessary to distinguish between the length of
the calendar year and the astronomical year, the latter varying with
the motion of the Earth. There are occasions, however, when
this distinction may be important for data integration in bioinformatics, as
leap years and leap seconds show.</p>
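<p>As a sketch of where the calendar matters in practice, the Python below applies the Gregorian leap-year rule. It shows how a naive &ldquo;365 days per year&rdquo; conversion drifts from the calendar. It also computes the mean Gregorian year, which is slightly longer than the mean tropical year of roughly 365.2422 days. Leap seconds are not modelled, and Python&rsquo;s <tt class="ttfamily">datetime</tt> does not represent them at all. That omission is itself an example of the simplification discussed here:</p>

```python
from datetime import date


def is_gregorian_leap_year(year: int) -> bool:
    """Divisible by 4, except centuries, unless divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A naive "365 days per year" assumption drifts from the calendar.
start, end = date(2000, 1, 1), date(2012, 1, 1)
calendar_days = (end - start).days
naive_days = 365 * (end.year - start.year)
print(calendar_days - naive_days)  # 3 (leap days in 2000, 2004 and 2008)

# The Gregorian calendar's mean year over its 400-year cycle, in days.
cycle_days = sum(366 if is_gregorian_leap_year(y) else 365 for y in range(2000, 2400))
print(cycle_days / 400)  # 365.2425
```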
<p>An ecologist counting the number of trees in a sampling square 100m by
100m will take its area to be 10,000m<sup>2</sup>; the surface is, however,
neither smooth nor a Euclidean plane, so strictly this area is wrong. For
much of ecology, this distinction will not matter. Again, there is a domain
dependency here; whale or bird biologists interested in migration patterns may
well care about the curvature of the Earth.</p>
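<p>The size of the error depends on scale, as this Python sketch suggests. It assumes a spherical Earth of mean radius 6371&nbsp;km, which is itself an approximation. It compares the great-circle (haversine) distance with an equirectangular approximation that treats the area as a locally flat plane. The function names are hypothetical:</p>

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a perfectly spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_km(lat1, lon1, lat2, lon2):
    """Equirectangular approximation: treats the area as a locally flat plane."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


# A roughly 100 m north-south transect: the flat-plane answer is effectively exact.
print(haversine_km(54.0, -1.6, 54.0009, -1.6), planar_km(54.0, -1.6, 54.0009, -1.6))

# A migration-scale distance along the 60 degrees N parallel: roughly 4600 km
# along the great circle versus roughly 5000 km on the flat approximation.
print(haversine_km(60.0, 0.0, 60.0, 90.0), planar_km(60.0, 0.0, 60.0, 90.0))
```

<p>The flat approximation corresponds to the ecologist&rsquo;s quadrat assumption. The gap between the two answers grows with distance, which is why migration studies cannot ignore curvature.</p>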
<h1 id="a0000000017">Discussion</h1>
<p>Realism has been held up as a methodology for &ldquo;good&rdquo;
ontological modelling, and the production of more tightly defined and
consistent ontologies. In this paper, we have discussed five different cases,
with biological examples, that we might wish to model ontologically; for each,
we have presented different models, describing the same underlying science. In
each case, a realist solution is possible, but it either limits the models
produced or makes them awkward.</p>
<p>Building an ontology with a consistent relationship to reality may help to
enable
interoperability&nbsp;<span class="cite">[<a href="#Smith2007">7</a>]</span>
under some circumstances. If, however, it disallows modifications for
computability (see &ldquo;sec:work-around-comp&rdquo;), or requires the arbitrary
blessing of one form of specification over another (see
&ldquo;sec:colo-colo-models&rdquo;), it may have the opposite effect.</p>
<p>Nor are the issues discussed in this paper free from consequences. In
&ldquo;<a href="#a0000000013">To go where science has gone before</a>&rdquo;, we discussed interoperability with
existing scientific models. Mathematics and physics have produced complex,
refined and expressive notation systems, representing a deep understanding of
how numbers and the physical world work. These are, however, not being used in
current ontologies and this results in a lack of precision, errors and
omissions:</p>
<dl class="description">
<dt>Lack of Precision:</dt>
<dd>
<p>Consider the PATO term <b class="bfseries">speed</b> (PATO:8), which is defined
    as:</p>
<blockquote class="quote"><p>
      <i class="itshape">&ldquo;A physical quality inhering in a bearer by
      virtue of the bearer&rsquo;s rate of change of position&rdquo;</i>
    </p></blockquote>
<p>with a synonym of <tt class="ttfamily">velocity</tt>. From this
    definition, we cannot distinguish velocity (a vector quantity) from speed
    (a scalar quantity); indeed, it is not clear which of these
    two <b class="bfseries">speed</b> (PATO:8) is.
    Meanwhile, <b class="bfseries">acceleration</b> (PATO:1028) is defined
    as:</p>
<blockquote class="quote"><p>
      <i class="itshape">&ldquo;&hellip; the rate of change of the
      bearer&rsquo;s velocity in either speed or direction&rdquo;</i>
    </p></blockquote>
<p>which is implicitly a vector quantity, and contradicts the statement
    that speed and velocity are synonyms. The mathematical definitions
    (velocity as \(dr/dt\), speed \(\left|{dr/dt}\right|\),
    acceleration \(d^2r/dt^2\)) are precise, concise and accurate.</p>
</dd>
<dt>Errors:</dt>
<dd>
<p>Similarly, <b class="bfseries">length</b> (PATO:122) is defined as a
    quality; qualities have to inhere in <tt class="ttfamily">Independent
    Continuant</tt>s; as a <tt class="ttfamily">Spatial Region</tt> is a child
    of <tt class="ttfamily">Continuant</tt> but not of <tt class="ttfamily">Independent Continuant</tt>, this means
    that <tt class="ttfamily">Spatial Region</tt>s cannot
    bear <tt class="ttfamily">length</tt>s. In short, in current versions of
    BFO, there is no intuitive way of modelling the length of a region in
    space.</p>
</dd>
<dt>Omissions:</dt>
<dd>
<p>BFO is mass-centric; it is currently unclear where many physical
    entities fit, examples including energy, waves (through a medium) and EM
    radiation. Likewise, it lacks a natural position for numbers (which have no
    particulars), patterns and distributions. Yet, these entities are key to a
    physical description of the world.</p>
</dd>
</dl>
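<p>The distinction that the mathematical definitions draw can be shown with a few lines of Python. This is a sketch using forward differences on a hypothetical sampled two-dimensional track. Velocity is a vector, speed is its scalar magnitude, and acceleration is the rate of change of the velocity vector. A body moving at constant speed while changing direction is therefore still accelerating:</p>

```python
import math


def finite_difference(samples, dt):
    """Forward-difference approximation of a derivative for sampled vectors."""
    return [tuple((b - a) / dt for a, b in zip(p, q))
            for p, q in zip(samples, samples[1:])]


dt = 0.5
positions = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]   # hypothetical 2-D track
velocities = finite_difference(positions, dt)       # vectors: dr/dt
speeds = [math.hypot(*v) for v in velocities]       # scalars: |dr/dt|
accelerations = finite_difference(velocities, dt)   # vectors: d^2r/dt^2
print(velocities)     # [(2.0, 0.0), (0.0, 2.0)]
print(speeds)         # [2.0, 2.0]
print(accelerations)  # [(-4.0, 4.0)]
```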
<p>To our mind, these are indicative of some of the most serious flaws of
realism-based ontology building. It makes little sense to replicate the models
of physics using English instead of a more precise mathematical notation. If
BFO had been built using direct links to a grounded physical model of the
world, it seems likely that these problems would not have arisen.</p>
<p>We have discussed a number of concrete examples where building an ontology
by considering realist concerns has detrimental consequences for the model. We
believe that real-world entities and the relationships between them are
only one consideration among many: simplicity, usability and fitness for purpose
are equally important.</p>
<p>Taken to its most extreme form, realism, it seems to these authors, would
produce models unsuitable for use within science. There is a choice between a
correct account of reality that does not allow the data of science to be
adequately described and a description of reality that takes into account how
science is performed. Fortunately, most &ldquo;realist&rdquo; ontologies are
not really so: PATO&rsquo;s representation of HSV for modelling colour is not a bad
decision; it represents a straightforward, pragmatic approach to ontology
building, where the representation has been chosen on the basis of a use case,
not the entities as they exist in reality. Similarly, BFO uses a 3D plus time
model of reality; it suggests that lengths are properties of the entity alone,
without reference to the observer. This is not a true reflection of reality,
but one which is a good enough approximation for use within the biomedical
sciences; in short, usability and simplicity have been considered to be more
important in the modelling process than the relationship of the model to
reality. In accepting these compromises, BFO has placed itself squarely as a
computational rather than philosophical ontology.</p>
<p>Despite these concerns, realism has made a contribution to the field of
biomedical ontology engineering. By emphasising the importance of real-world
entities and by encouraging a more specific interpretation than the
generalisation of a &ldquo;conceptualisation&rdquo;, realism helps to avoid
the introduction of unnecessary layers of abstraction. A consideration of the
entities in reality may be a part of an ontology engineering process; ontology
builders should have careful and considered reasons for diverging from
modelling in this way, and ontologies should explicitly describe, through
annotations, the terms that do or may diverge from this view. Ontology builders
should, however, be free to make this decision; the acceptance of compromise
with respect to reality will result in simpler and more effective knowledge
artefacts.</p>
<p>Johansson&nbsp;<span class="cite">[<a href="#Johansson2006">10</a>]</span>
when discussing realism asks the rhetorical question: &ldquo;would you like to
be treated for a physiological illness by a <em>(non-realist)</em> physician
who is not sure that there are human bodies?&rdquo; &ndash; (our emphasis). As
scientists, our reply would be that if their survival and success statistics were
the best, we would not care whether they were a realist, a non-realist or a
robot which admitted of no philosophical position at all; also, using a doctor
who was strictly realist and thus cut off from much of the practice of science
(such as determining heart rate) would disturb many patients. As
bioinformaticians, we build ontologies to provide a descriptive and predictive
model of the wealth of experimental data that is now available. In biology,
the job of an ontologist is to describe data such that it can be analysed.
Naturally this entails a description of entities in reality; it also, however,
entails a description of science, and it entails compromise; we overlook this
to our peril. The last 200 years of science show the success and strength of
this position; it is on this groundwork that we should build for the
future.</p>
<div>
<h1>Bibliography</h1>
<dl class="bibliography">
<dt>[<a name="Ashburner2000" id="Ashburner2000">1</a>]</dt>
<dd>
<p>Ashburner M, Ball C, Blake J, Botstein D, Butler H, et&nbsp;al.
      (2000) Gene Ontology: a tool for the unification of biology. The Gene
      Ontology Consortium. Nat Genet 25: 25&ndash;9.</p>
</dd>
<dt>[<a name="handbook2" id="handbook2">2</a>]</dt>
<dd>
<p>Stevens R, Lord P (2008) Application of ontologies in bioinformatics.
      In: Staab S, Studer R, editors, Handbook on Ontologies in Information
      Systems, Springer. Second edition.
      URL <a href="http://www.cs.man.ac.uk/~stevensr/papers/handbook2.pdf">http://www.cs.man.ac.uk/~stevensr/papers/handbook2.pdf</a>.</p>
</dd>
<dt>[<a name="Zeeberg2003" id="Zeeberg2003">3</a>]</dt>
<dd>
<p>Zeeberg B, Feng W, Wang G, Wang M, Fojo A, et&nbsp;al. (2003)
      GoMiner: a resource for biological interpretation of genomic and
      proteomic data. Genome Biol 4: R28.</p>
</dd>
<dt>[<a name="Wolstencroft2006" id="Wolstencroft2006">4</a>]</dt>
<dd>
<p>Wolstencroft K, Lord P, Tabernero L, Brass A, Stevens R (2006)
      Protein classification using ontology classification. Bioinformatics 22:
      e530&ndash;538.</p>
</dd>
<dt>[<a name="Lord2003" id="Lord2003">5</a>]</dt>
<dd>
<p>Lord PW, Stevens RD, Brass A, Goble CA (2003) Investigating semantic
      similarity measures across the gene ontology: the relationship between
      sequence and annotation. Bioinformatics 19: 1275&ndash;1283.</p>
</dd>
<dt>[<a name="Whetzel2006a" id="Whetzel2006a">6</a>]</dt>
<dd>
<p>Whetzel PL, Parkinson H, Causton HC, Fan L, Fostel J, et&nbsp;al.
      (2006) The MGED Ontology: a resource for semantics-based description of
      microarray experiments. Bioinformatics 22: 866&ndash;873.</p>
</dd>
<dt>[<a name="Smith2007" id="Smith2007">7</a>]</dt>
<dd>
<p>Smith B, Ashburner M, Rosse C, Bard J, Bug W, et&nbsp;al. (2007) The
      OBO Foundry: coordinated evolution of ontologies to support biomedical
      data integration. Nat Biotechnol 25: 1251&ndash;1255.</p>
</dd>
<dt>[<a name="OBOFoundry2006" id="OBOFoundry2006">8</a>]</dt>
<dd>
<p>OBO Foundry Consortium (2006). OBO Foundry
      Principles. <a href="http://obofoundry.org/wiki/index.php/OBO_Foundry_Principles">http://obofoundry.org/wiki/index.php/OBO_Foundry_Principles</a>.</p>
</dd>
<dt>[<a name="OBOFoundry2008" id="OBOFoundry2008">9</a>]</dt>
<dd>
<p>OBO Foundry Consortium (2008). OBO Foundry
      Principles. <a href="http://obofoundry.org/wiki/index.php/OBO_Foundry_Principles">http://obofoundry.org/wiki/index.php/OBO_Foundry_Principles</a>.</p>
</dd>
<dt>[<a name="Johansson2006" id="Johansson2006">10</a>]</dt>
<dd>
<p>Johansson I (2006) Bioinformatics and biological reality. J Biomed
      Inform 39: 274&ndash;287.</p>
</dd>
<dt>[<a name="Grenon2004" id="Grenon2004">11</a>]</dt>
<dd>
<p>Grenon P, Smith B, Goldberg L (2004) Biodynamic ontology: applying
      BFO in the biomedical domain. Stud Health Technol Inform 102:
      20&ndash;38.</p>
</dd>
<dt>[<a name="smith2004beyond" id="smith2004beyond">12</a>]</dt>
<dd>
<p>Smith B (2004) Beyond concepts: ontology as reality representation.
      In: Formal ontology in information systems: proceedings of the third
      conference (FOIS-2004). IOS Press, p.&nbsp;73.</p>
</dd>
<dt>[<a name="Lord2009" id="Lord2009">13</a>]</dt>
<dd>
<p>Lord P (2009) An Evolutionary Approach to Function. In:
      Bio-Ontologies 2009: Knowledge in Biology.
      URL <a href="http://hdl.handle.net/10101/npre.2009.3228.1">http://hdl.handle.net/10101/npre.2009.3228.1</a>.</p>
</dd>
<dt>[<a name="Smith2005" id="Smith2005">14</a>]</dt>
<dd>
<p>Smith B, Ceusters W, Klagges B, K&ouml;hler J, Kumar A, et&nbsp;al.
      (2005) Relations in biomedical ontologies. Genome Biol 6: R46.</p>
</dd>
<dt>[<a name="Russell1946" id="Russell1946">15</a>]</dt>
<dd>
<p>Russell B (1946) A History of Western Philosophy. Routledge.</p>
</dd>
<dt>[<a name="Merrill2010" id="Merrill2010">16</a>]</dt>
<dd>
<p>Merrill G (2010) Ontological realism: methodology or misdirection.
      Applied Ontology 5: 79&ndash;108.</p>
</dd>
<dt>[<a name="Dumontier2010" id="Dumontier2010">17</a>]</dt>
<dd>
<p>Dumontier M, Hoehndorf R (2010) Realism for scientific ontologies.
      In: 6th International Conference on Formal Ontology in Information
      Systems.</p>
</dd>
<dt>[<a name="Gruber1992" id="Gruber1992">18</a>]</dt>
<dd>
<p>Gruber T (1992). What is an ontology?
      URL <a href="http://www-ksl.stanford.edu/kst/what-is-an-ontology.html">http://www-ksl.stanford.edu/kst/what-is-an-ontology.html</a>.</p>
</dd>
<dt>[<a name="Ceusters2006" id="Ceusters2006">19</a>]</dt>
<dd>
<p>Ceusters W, Smith B (2006) A realism-based approach to the evolution
      of biomedical ontologies. AMIA Annu Symp Proc : 121&ndash;125.</p>
</dd>
<dt>[<a name="Shrager2003" id="Shrager2003">20</a>]</dt>
<dd>
<p>Shrager J (2003) The fiction of function. Bioinformatics 19:
      1934&ndash;1936.</p>
</dd>
<dt>[<a name="Seyed2009" id="Seyed2009">21</a>]</dt>
<dd>
<p>Seyed AP (2009) BFO/DOLCE Primitive Relation Comparison. In:
      Bio-Ontologies 2009: Knowledge in Biology.</p>
</dd>
<dt>[<a name="rector2005" id="rector2005">22</a>]</dt>
<dd>
<p>Rector A (2005). Representing specified values in OWL: &ldquo;value
      partitions&rdquo; and &ldquo;value sets&rdquo;. W3C Working Group Note.
      URL <a href="http://www.w3.org/TR/swbp-specified-values/">http://www.w3.org/TR/swbp-specified-values/</a>.</p>
</dd>
<dt>[<a name="egana2008" id="egana2008">23</a>]</dt>
<dd>
<p>Egana M, Rector A, Stevens R, Antezana E (2008) Applying Ontology
      Design Patterns in Bio-ontologies, Springer Berlin/Heidelberg. pp.
      7&ndash;16.</p>
</dd>
<dt>[<a name="Schulz2008" id="Schulz2008">24</a>]</dt>
<dd>
<p>Schulz S, Stenzhorn H, Boeker M (2008) The ontology of biological
      taxa. Bioinformatics 24: i313&ndash;i321.</p>
</dd>
</dl>
</div>
</div> <!-- kcite-section 1713 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2010/07/realism-and-science/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Knowledge Blogging</title>
		<link>http://www.russet.org.uk/blog/2010/06/knowledge-blogging/</link>
		<comments>http://www.russet.org.uk/blog/2010/06/knowledge-blogging/#comments</comments>
		<pubDate>Fri, 18 Jun 2010 16:16:36 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1701</guid>
		<description><![CDATA[Some advance on the knowledge blog front this week. Firstly, Simon Cockell and I spent a short while setting up a development and testing environment and wrote our first WordPress plugin, &#8220;Peaches&#8221;, based around the Hello Dolly plugin, but with the lyrics from the Stranglers song instead. We finished this yesterday just before Automattic released WordPress [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1701">
<p>Some advance on the knowledge blog front this week. Firstly, <a href="http://blog.fuzzierlogic.com/">Simon Cockell</a> and I spent a short while setting up a development and testing environment and wrote our first WordPress plugin, &#8220;Peaches&#8221;, based around the Hello Dolly plugin, but with the lyrics from the Stranglers song instead. We finished this yesterday just before Automattic released WordPress 3.0. Hopefully, it will be easy to upgrade. Rather more usefully, I got the very first version of a reference list plugin working. At the moment, it just transforms DOIs into hyperlinks.</p>
<p>And, secondly, I got notification from the British Library that they will be archiving the website. Good news, although there are no archives available yet.</p>
<p>We move forward!</p>
</div> <!-- kcite-section 1701 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2010/06/knowledge-blogging/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Grant for Knowledgeblog</title>
		<link>http://www.russet.org.uk/blog/2010/05/new-grant-for-knowledgeblog/</link>
		<comments>http://www.russet.org.uk/blog/2010/05/new-grant-for-knowledgeblog/#comments</comments>
		<pubDate>Wed, 26 May 2010 14:01:29 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1687</guid>
		<description><![CDATA[It&#8217;s been relatively quiet from me for the last few weeks. One of the reasons for this is that I have been submitting a JISC bid. I&#8217;ve not submitted a JISC bid before, so it was quite a lot of work; it&#8217;s exactly the same as a research council proposal, except for all the bits [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1687">
<p>It&#8217;s been relatively quiet from me for the last few weeks. One of the reasons for this is that I have been submitting a <a href="http://www.jisc.ac.uk/fundingopportunities/funding_calls/2009/12/1409researchdata.aspx">JISC bid</a>. I&#8217;ve not submitted a JISC bid before, so it was quite a lot of work; it&#8217;s exactly the same as a research council proposal, except for all the bits that differ.</p>
<p>The bid, in this case, was for extensions to the <a href="http://www.knowledgeblog.org">Knowledgeblog</a> environment; we want to make sure that it supports research better than it currently does. Our initial experiences were generally good, with a <a href="http://ontogenesis.knowledgeblog.org/647">few naysayers</a>. Additionally, we wanted much better linking to external forms of data: ArrayExpress, Swiss-Prot and the like. And, finally, we wanted to trial this against a set of specific use cases. Critically, I also got tired of writing &#8220;knowledgeblog&#8221; the entire time, so they will now be &#8220;k-blogs&#8221;.</p>
<p>If it gets accepted, we are proposing to develop some additional functionality, often reusing existing software. We really are trying to avoid developing any software that we don&#8217;t have to. The plans include:</p>
<ol type="1"> 
<li> A documented k-blog process, including information on who does what, and how to use various existing tools (Word and LaTeX in particular). </li>
<li> Proper support for referencing&#8201;&#8212;&#8201;authors should be able to drop in a PMID, or DOI and get a reference list and in-text citation automatically. </li>
<li> Support for various kinds of metadata, so that the in-text citations carry semantics from the reader&#8217;s side. </li>
<li> Trackback proxying for those resources which don&#8217;t support trackbacks. </li>
<li> Integration and additional tooling for adding references and cross-links. </li>
</ol>
<p>I&#8217;m hoping that we get the money; if we do, the work will give us a platform on which to build a publishing environment, a place for an educational resource, and, finally, an excellent extension point for playing with semantic forms of publishing. I am not sure what the odds are; I know quite a few other proposals are going in, and there&#8217;s a reasonable chance that George Osborne will cut the money back before it&#8217;s awarded. All I can do now is wait.</p>
<p>I&#8217;ll probably blog the whole proposal in a few days; this gives me a chance to try out the &#8220;blogging from Word&#8221; experience. How exciting.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1687 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2010/05/new-grant-for-knowledgeblog/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Position Available</title>
		<link>http://www.russet.org.uk/blog/2010/05/phd-position-available/</link>
		<comments>http://www.russet.org.uk/blog/2010/05/phd-position-available/#comments</comments>
		<pubDate>Mon, 17 May 2010 15:49:50 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1685</guid>
		<description><![CDATA[I have a new PhD position available; I am looking to extend some work that I was involved with a while ago now, but into a new area of biology. The idea is that we build an ontological model of the mitochondria, and the knowledge that exists about it. We should be able to build [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1685">
<p>I have a new PhD position available; I am looking to <a href="http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/14/e530">extend</a> some work that I was involved with a while ago now, but into a new area of biology. The idea is that we build an ontological model of the mitochondria, and the knowledge that exists about it. We should be able to build a light-weight model that covers many areas of the biology as an entire system. This will be useful both as an integration point (a traditional use for ontologies) and as a way to make predictions and search for inconsistencies in the model. In other words, the ontology should be an integral part of the scientific process; we represent a hypothesis ontologically and then let the reasoner search the data for contradictions.</p>
<p>This is quite exciting, as we did the original work quite a few years ago, and it looked very promising; despite the gap, I still think this could work really well. Since that time, systems biology has gained currency; this work fits, as we aim to look at the mitochondria as a whole. Instead of an in-depth mathematical model of part of the mitochondria, as is common in systems biology, we will have a light-weight logical model of both what we know about the mitochondria <strong>and</strong> how we know it.</p>
<p>Please feel free to distribute this!</p>
<hr /> 
<h2><a name="_phd_studentship_2010"></a>PhD Studentship, 2010</h2>
<p>EPSRC PhD Studentship: &#8220;Building a logical model of biology: the Ontology of Mitochondria&#8221;</p>
<p>For this project, you will use cutting-edge technology designed for the Semantic Web, and apply it to the new field of systems biology. Specifically, you will develop an OWL ontology, a formal, logically specified model, to describe the mitochondria, a subsystem of the cell. You will use this to integrate large amounts of real-world data, to search for inconsistencies and to produce predictions about the underlying biology. From a computing perspective, this will result in insights both about the technology and its scalability; from a systems biology perspective, you will gain an understanding of the value of models which are wider than traditional mathematical models; from a biomedical perspective, you may gain insight into the functioning and behaviour of a medically important system of the cell.</p>
<p>This is a challenging multi-disciplinary project, and applicants are not expected to understand all its aspects at the outset; as a result, it will be of interest to those from a computing science, computational biology or bioinformatics background. Any experience of ontologies, modelling or mitochondrial biology will be an advantage, but is not required. A willingness to learn is critical; students will spend significant time in both a computing science and a biology environment, and will become familiar with both.</p>
<p>You should have either a First or 2.1 in Computing Science, a Biological Science or Mathematics, and a distinction level Masters degree in a related subject. Equivalent experience will also be considered.</p>
<p>Depending on how you meet the EPSRC&#8217;s eligibility criteria, you may be entitled to a full or a partial award. A full award covers tuition fees at the UK/EU rate and an annual stipend of £13,290 (2009/10). A partial award covers fees at the UK/EU rate only.</p>
<p>For further details, please contact Phillip Lord &lt;<a href="mailto:phillip.lord@newcastle.ac.uk">phillip.lord@newcastle.ac.uk</a>&gt;.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1685 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2010/05/phd-position-available/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Second Knowledge Blog Meeting</title>
		<link>http://www.russet.org.uk/blog/2010/04/the-second-knowledge-blog-meeting/</link>
		<comments>http://www.russet.org.uk/blog/2010/04/the-second-knowledge-blog-meeting/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 14:53:14 +0000</pubDate>
		<dc:creator>Phil Lord</dc:creator>
				<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.russet.org.uk/blog/?p=1681</guid>
		<description><![CDATA[I&#8217;m on my way to the second Knowledge Blog meeting. Well, sort of. The first meeting was badged the &#8220;Ontogenesis Tutorial&#8221; meeting; the focus was on developing a tutorial resource for ontologies. Actually, much the same will be true of this meeting, but I&#8217;ve decided that, for this meeting, as well as addressing the reviews [...]]]></description>
			<content:encoded><![CDATA[<div class="kcite-section" kcite-section-id="1681">
<p>I&#8217;m on my way to the second <a href="http://www.knowledgeblog.org">Knowledge Blog</a> meeting. Well, sort of. The <a href="http://www.russet.org.uk/blog/2010/01/the-ontogenesis-tutorial/">first</a> meeting was badged the &#8220;Ontogenesis Tutorial&#8221; meeting; the focus was on developing a tutorial resource for ontologies. Actually, much the same will be true of this meeting, but I&#8217;ve decided that, for this meeting, as well as addressing the reviews for my own article on Ontogenesis, I am going to want to spend some time supporting the process itself. In the first place, this means writing a couple of articles for <a href="http://process.knowledgeblog.org">Process</a>: a new knowledge blog that I am starting for discussion of the process itself.</p>
<p>Since the first meeting, I&#8217;ve had plenty of time to reflect on the general idea of <a href="http://www.knowledgeblog.org">knowledgeblogging</a>. As far as I can see, there is one overwhelming truth about the situation: we got 15 articles in 2 days and, since then, we have been averaging between 500 and 1000 page hits a month. Now, of course, it&#8217;s an open question whether this is at all sustainable; we have no advertising and no financial support. But, still, our most-read article (&#8220;What is an Ontology&#8221;) has had several hundred reads and, bottom line, that is pretty good going for an academic article. We might like to think that the work that we do is important (well, it is!), but in publishing terms we are pretty much a niche market.</p>
<p>On the negative side, we have had articles flooding in and none of those from the last meeting have got any further. Thinking back to <a href="http://en.wikipedia.org/wiki/Nupedia">Nupedia</a>, many moons ago, it&#8217;s obvious that getting an authorship is always going to be a problem.</p>
<p>I&#8217;m also going to have to think of a snappier and shorter name than &#8220;knowledgeblog&#8221;, which is taking far too long to type. So far:</p>
<dl> 
<dt> k-log </dt>
<dd> Simple, straightforward, but already used </dd>
<dt> knowblog </dt>
<dd> Good, but a homophone of &#8220;noblog&#8221;, which is confusing. </dd>
<dt> knoblog </dt>
<dd> Pronounced &#8220;noh-blog&#8221; would be great, but English is not a phonetic language. </dd>
<dt> knob </dt>
<dd> &#8220;KNOwledge Blog&#8221;&#8201;&#8212;&#8201;excellent in many ways, but I realise that the entire world does not share my slightly puerile sense of humour. </dd>
</dl>
<p>Hmmm. Comments welcome. So long as they are not about my puerile sense of humour.</p>
<!-- kcite active, but no citations found -->
</div> <!-- kcite-section 1681 -->]]></content:encoded>
			<wfw:commentRss>http://www.russet.org.uk/blog/2010/04/the-second-knowledge-blog-meeting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

