Remembering the World as it used to be

2013-01-11

I have been working on a Clojure library for developing OWL ontologies [@url:www.russet.org.uk/blog/2214] There have been two significant advances with this library recently. First, I have changed its name from clojure-owl to tawny-owl. I was never really happy with the original name; I think it is bad practice to name something after the language it uses (even partly, as the many jlibraries attest), and there was several other libraries around for manipulating OWL in clojure, albeit in different ways. "Tawny" is simple and straight-forward and memorable, I think. At the same time, I moved to Github because I can now just updated readme.md, rather than having to update a separate website.

Perhaps, more importantly, I have put in new code for handling change to external ontologies, which is particularly important for external libraries.

Throughout the development of tawny-owl, I have focused on provided an environment that is easy to use for the developer; so, classes, properties and other entities are represented as lisp symbols [@url:www.russet.org.uk/blog/2254] This works well and produces very attractive looking code in, for example, my version of the Pizza ontology. I have also written code so that ontologies only available as OWL files can be treated as first-class citizens: very easy in a highly dynamic language like Lisp.

However, it causes problems when combined with an ontology such as OBI [@url:dx.doi.org/10.1186/2041-1480-1-S1-S7] The difficulty here is that OBI uses semantics-free identifiers [@url:www.russet.org.uk/blog/2040] While there are some good reasons for this, would result in Clojure of the form:

(defclass OBI:0034322
     :subclass OBI:0034321)

Clearly, this is not good, and something that I want to avoid. So, instead, we apply a transform function to OBI when importing it; basically, this munges the rdfs:label annotation, turning it into something that is a legal Clojure symbol.

  :transform
  ;; fix the space problem
  (fn [e]
    (clojure.string/replace
     ;; with luck these will always be literals, so we can do this
     ;; although not true in general
     (.getLiteral
      ;; get the value of the annotation
      (.getValue
       (first
        ;; filter for annotations which are labels
        ;; is lazy, so doesn't eval all
        (filter
         #(.. % (getProperty) (isLabel))
         ;; get the annotations
         (.getAnnotations e
                          (owl.owl/get-current-jontology))))))
     #"[ /]" "_"
     ))

All well and good. However, there is a problem. The label in OBI has two characteristics. First, it is human readable, which is good, and the reason why we are using it. Second, however, is does not carry formal semantics; the developers are free to change these labels when ever they like. Of course, any ontology that I build against by tawnyized version of OBI will break, because the label has changed. This is not a problem for a GUI like protege, because, perhaps ironically, GUIs are not WYSIWYG --- what you see is actually a view of the underlying datamodel. So, protege shows you the label, but actually you are manipulating the URI. A dependency can change their labels, and when Protege reloads it, this is what the developer will see.

With code, on the other hand, there is no separation at all. If the label changes, I will have to update anything that refers to this, which seems a substantial problem. However, I have now managed to work around this. My new library memorise saves all the mappings into a file, then restores them when OBI is loaded. Any old labels that no longer exist but which point to an IRI that still does exist are generated as duplicate symbols pointing to the same OWL object; however, I have done this in a way that they will emit warnings both when loading, and during use, with a description of the new symbol name. This data would also make automatic upgrading possible, of course, using Clojure to perform a big search and replace on the source code. I think that this is a nicer solution than the denormalisation [@url:www.russet.org.uk/blog/1470] or "colour cube" solution [@url:www.russet.org.uk/blog/2040] that I previously suggested for Manchester syntax. It also shows off the advantage of using a programming language, rather than a static format; I, or any other user of the library, can just add this, as I choose, without having to wait for standardisation process, and tool support to catch up.

This will still leave a secondary problem; it is dependent on the IRI which for pre-release versions of OBI is not fixed, as documented. Of course, this problem could go away, if OBI used a tool like URIGen, or alternatively if OBI released more regularly. Still, the data should also allow a reverse lookup --- finding out what IRI a label now has.

I think these are the main tools that are needed to build against an external resource. The 0.5 version of Tawny is now available on Clojars and Github.