The pain of foreign names

Our original intention with Greycite [@url:greycite.knowledgeblog.org] was to build a tool which can provide bibliographic metadata for any URL, to support my own kcite referencing tool [@url:knowledgeblog.org/kcite-plugin]. While it still fulfils this function, it also turns out to be a useful, general-purpose tool for investigating the metadata in various web pages, and this reveals some interesting results. We discovered a nice example of this recently while adding RIS support. The paper in question comes from EMBO reports [@doi:10.1038/embor.2013.11]. At first sight, the RIS for this page taken from Greycite looks reasonable. TY - ELEC UR - http://www.nature.com/embor/journal/v14/n3/full/embor201311a.html Y2 - 2013-04-18 12:54:54 TI - The economics of creative research JO - EMBO reports …

Temporary Title

Abstract [kblog-inc server="arxiv"]1303.0213[/kblog-inc] Plain English Summary In this paper, I describe some new software, called Tawny-OWL, that addresses the issue of building ontologies. An ontology is a formal hierarchy, which can be used to describe different parts of the world, including biology, which is my main interest. Building ontologies in any form is hard, but many ontologies are repetitive, having many similar terms. Current ontology building tools tend to require a significant amount of manual intervention. Rather than looking to create new tools, Tawny-OWL is a library written in a full programming language, which helps to redefine the problem of ontology building as one of programming. Instead of building new ontology tools, the hope is that Tawny-OWL will enable on…

Overlays over arXiv

Much has been said about overlay journals [@url:gowers.wordpress.com/2013/01/16/why-ive-also-joined-the-good-guys/]. The idea is simple: the journal essentially becomes a selector, a channel, with the paper itself being hosted elsewhere, such as arXiv. This holds a certain amount of attraction for me; I already post my new papers on arXiv. I have been posting them here also [@url:www.russet.org.uk/blog/1713]. This works well, but is hampered by technology. Mostly I write papers in LaTeX, and I have written tools to make these suitable for WordPress [@url:www.russet.org.uk/blog/1740]; these work well enough to publish an entire thesis [@url:themindwobbles.wordpress.com/2013/01/02/phd-thesis-table-of-contents/]. However, the process of doing this is not slick [@url:themindwobbles.wordpress.com/2…

Archiving of Scientific Material

In this article, I consider the practical issues with archiving of scientific material placed on the web; I will describe the motivation for doing this and the background, and consider the various mechanisms for doing so. As part of our work on knowledgeblog [@url:knowledgeblog.org], we have been investigating ways of injecting the formal technical aspects of the scientific publication process into this form of publication. The reasons for this are myriad: if the scientist can control the form, they can innovate in their presentation how they choose; the publication process itself becomes very simple and straightforward (as opposed to the authoring, which is as hard as it ever was). Finally, it means that scientists can publish as they go, as I have done and am doing on my work with Tawny-OWL.…

Open Access Response to HEFCE

HEFCE is currently asking for feedback on the role of Open Access in the next REF. While I have a number of technical suggestions, I think that the biggest and best contribution that HEFCE could make to the next REF is to state publicly that all journal/conference/venue metadata be removed from papers before they are sent for review. It is time that we stopped judging books by their cover. It would be a fantastic contribution if HEFCE could take a lead on this. This is my full response. Expectations for Open Access I feel that one key issue is missing from this document. Scientists still have problems in some areas (including mine of computing science) in that the "high-impact" journals or conferences often provide no or prohibitively expensive open access options. In…

The evil a space can do

Recently, I was contacted by a Kcite [@url:knowledgeblog.org/kcite-plugin] user who had found an interesting problem. They had cut and pasted a DOI from an American Society of Microbiology article [webcite], and then used this in a blog post. But it was not working. The user actually did identify the problem, which was a strange character in the DOI. So, I decided to investigate a bit further. Looking at the source for the page, the DOI appears mostly fine; it is not formatted according to CrossRef display guidelines [@url:www.crossref.org/02publishers/doi_display_guidelines.html], but they are hardly alone in this. <span class="slug-doi">10.1128/​AAC.01664-10 </span> However, looking a bit further at the binary of this source, we see this: 00006260: 2…
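The kind of check described above can be sketched in a few lines of Python. This is only an illustration, not how Kcite handles DOIs: the function names `invisible_characters` and `clean_doi` are hypothetical, and the sample string assumes the stray character is a zero-width space (U+200B), which the truncated excerpt does not confirm.

```python
import unicodedata


def invisible_characters(text):
    """List (index, code point, Unicode name) for format and control characters."""
    found = []
    for index, char in enumerate(text):
        # Cf covers format characters such as the zero-width space;
        # Cc covers control characters.
        if unicodedata.category(char) in ("Cf", "Cc"):
            name = unicodedata.name(char, "UNKNOWN")
            found.append((index, f"U+{ord(char):04X}", name))
    return found


def clean_doi(text):
    """Drop invisible characters and surrounding whitespace from a pasted DOI."""
    bad = {index for index, _, _ in invisible_characters(text)}
    kept = (char for index, char in enumerate(text) if index not in bad)
    return "".join(kept).strip()


# Assumption: the pasted DOI hides a zero-width space after the slash.
pasted = "10.1128/\u200bAAC.01664-10 "
print(invisible_characters(pasted))  # [(8, 'U+200B', 'ZERO WIDTH SPACE')]
print(clean_doi(pasted))             # 10.1128/AAC.01664-10
```

Reporting the offending code point, rather than silently stripping it, makes it easier to tell a user why a DOI that looks correct on screen fails to resolve.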

Why Metadata Must be Useful

Adding metadata to an article could be done by many people. This could be the author, and in the ideal world, this would be the author. They know most about the content and are best placed to put the most knowledge into it. But, we have to answer the question, why would they do this? We have previously argued that semantic metadata must be useful to the people who are producing it [@url:www.russet.org.uk/blog/2054]. For this, we need tools that extract and consume this metadata. I discovered a nice example of this recently while reading an interesting paper from Yimei Zhu and Rob Proctor [@url:www.escholar.manchester.ac.uk/uk-ac-man-scw:187789] investigating how PhD students use various tools to communicate. I was interested in citing this paper. The paper can be found on the web at the Manchester…

Splitting a Mercurial Repository

The Mercurial repository for KnowledgeBlog [@url:knowledgeblog.org] has been starting to show the strain for a while now. Firstly, when it was created we were all new to Mercurial; for instance, it contains the trunk directory, which is really a Subversion metaphor. The second problem is that it is a single large repository, which maps to the development directory on my hard drive; there is now a lot of experimental software on my hard drive which I don't want in a public environment, so I am now faced with either an enormous .hgignore or more "untracked" files than tracked. Not ideal. At the same time, I have more recently moved mostly toward using git; actually, I still think Mercurial is nicer than git; the interface to the commands is cleaner, and the functionality is not that d…

Is Peer Review the Future?

Today, I received an email from a journal, asking me if I would review a paper. The paper in question is by, among others, Iddo Friedberg, and can be read on arXiv [@url:arxiv.org/abs/1301.1740]. I've known Iddo Friedberg for a while; he was an earlier user of my semantic similarity work [@url:dx.doi.org/10.1093/bioinformatics/btg153] for protein function prediction [@url:dx.doi.org/10.1110/ps.062158406] and was also the editor for our paper on realism in ontology development [@url:dx.doi.org/10.1371/journal.pone.0012258]. I would have liked to review this paper, and I feel a little bad because I know these things are important for the careers of the scientists. So, why did I decline? Well, nice and simple; the page charges are just too high. There is no real justification for this as it ca…

Testing Times for Tawny

Tawny OWL, my library for building ontologies [@url:www.russet.org.uk/blog/2254], is now reaching a nice stage of maturity; it is possible to build ontologies, reason over them and so forth. We have already started to use the programmable nature of Tawny, trivially with disjoints [@url:www.russet.org.uk/blog/2275] as well as allowing the ontology developer to choose the identifiers that they use to interact with the concepts [@url:www.russet.org.uk/blog/2303]. However, I wanted to explore further the usefulness of a programmatic environment. One standard facility present in most languages is a test harness, and Clojure is no exception in this regard. Tawny already comes with a set of predicates for testing superclasses, both asserted and inferred, which provides a good basis for unit testin…