An Exercise in Irrelevance - Programming OWL

I have been struggling for a while with OWL development environments. While Protege provides a nice GUI based system, this has the limitations of many such systems; it allows you to do what the authors intended, but not all of the things that you might wish.

It is partly for this reason that I have been developing my own OWL Manchester syntax mode for Emacs (n.d.a) I lose a lot from Protege, but then I also gain the ability to manipulate large numbers of classes at once, as well as easy access to versioning. These things are useful.

Still, the environment is lacking in many ways; recently, while building an ontology for karyotypes (n.d.b) I wanted a more programmatic environment. A trivial example, for instance, comes from the human chromosomes; there are 22 autosomes in all. These can easily be expressed in OWL with 22 classes (plus X and Y). The problem is that all of these classes are likely to be very similar, which produces a code duplication problem. Of course, this is not a new problem; OPPL — the ontology pre-processor language was created at least in part for this purpose (Aranguren, Stevens, and Antezana 2009)

The main problem with OPPL, however, is that is a Domain Specific Language; while this makes it well adapted to its task, it also means that it lacks many basic features of a “real” programming language. Another possibility is to use the OWL API (Bechhofer, Volz, and Lord 2003) (I am actually on this paper, but I publicly acknowledge that this was a rather generous attribution from Sean Bechhofer; I did do some work on the API, but not much, and I suspect none of my work remains). However, a brief look at the OWL API tutorial shows a problem. This code creates two classes and makes one a subclass of another.

OWLOntologyManager m = create();
OWLOntology o = m.createOntology(pizza_iri);
// class A and class B
OWLClass clsA = df.getOWLClass(IRI.create(pizza_iri + "#A"));
OWLClass clsB = df.getOWLClass(IRI.create(pizza_iri + "#B"));
// Now create the axiom
OWLAxiom axiom = df.getOWLSubClassOfAxiom(clsA, clsB);
// add the axiom to the ontology.
AddAxiom addAxiom = new AddAxiom(o, axiom);
// We now use the manager to apply the change
m.applyChange(addAxiom);
// remove the axiom from the ontology
RemoveAxiom removeAxiom = new RemoveAxiom(o, axiom);
m.applyChange(removeAxiom);

Aside from the intrinsic problems of Java — the compile, run, test cycle is rather clunky for this sort of work, this amount of code to achieve something straightforward makes this a little untenable.

Class: piz:A
    SubClassOf:
        piz:B

However, while Java and the OWL API do not seem a good choice for manipulating OWL directly, rewriting everything from first principles would also be a bad idea.

One solution to this problem came to my attention recently, in the shape of Clojure; essentially, this is a lisp implemented on the JVM. I will not describe the virtues or otherwise of Lisp in great detail; for some reason it is one of those languages that tends to generate fanaticism, and there are lots of descriptions of lisp elsewhere. For my purposes, there were three advantages. The first was personal, which is that I know Lisp reasonably well being an Emacs hacker. The other two are more general: Clojure has good integration with Java, and can manipulate Java objects, meaning I can make direct use of the OWL API; and, second, Lisp has a good degree of syntactic plasticity, which is important as, after all, I am looking for a convenient representation.

Initially, I have aimed at producing a representation which is fairly similar to Manchester syntax (n.d.c/) My initial attempts used the various features of Clojure directly. Consider, for instance, the following two statements:

(owl/owlclass
 "Arm" {:subclass "Limb"})

(owl/owlclass
 "HumanArm" {:subclass ["Limb" "HumanBodyPart"]})

Lisp, in general, uses a prefix notation. There is no obvious and easy way around this; in this case, it actually fits rather well with Manchester syntax which looks similar. The use of frame keywords such as :SubClassOf in Manchester syntax is also fortuitous as lisp uses a similar syntax. However, this syntax is rather too difficult. Even in this simple example we have a statement terminator which looks like ]}) (representing end of a vector, hash and sequence respectively). Lisp’s are often criticised for having too many parentheses; Clojure is unusual in using lots of different styles of parens. In Emacs-Lisp, I just keep hitting ) till I finished. In Clojure, you have to the brackets in the right order. All rather painful.

Fixing this turned out to be quite difficult, with a particularly nasty function I have called groupify. It is heavily recursive, which is apparently, a poor idea in Clojure, as it lacks some recursion optimisations present in many lisps; however, without mutable local variables, I could see no other option. The syntax now looks much simpler.

(owl/owlclass
 "Arm" :subclass "Limb")

(owl/owlclass
 "HumanArm" :subclass "Limb" "HumanBodyPart")

(owl/owlclass
 "Hand" :subclass (owl/some "isPartOf" "Arm"))

Both the :subclass and :equivalent frames support any number of class expressions; so far I have only implemented some or only, but the rest are not hard. Currently, it is only possible to save the ontology in Manchester syntax, but fixing this is trivial; the OWL API is doing all of the work.

Of course, this would not be much help if all I had managed to achieve was Manchester syntax with more parens. However, the big advantage of this becomes clearer with the next example:

(dorun
 (map
  (fn [x]
    (owl/owlclass
     (str "HumanChromosome" x)
     :subclass "HumanChromosome"))
  (concat '("X" "Y") (range 1 23))))

This creates a class for each human chromosome. In this case, I have hard coded the list of classes in, but I could be parsing a CSV or accessing a database. Or accessing an existing ontology; this could be very useful in avoiding maintenance of duplicate hierarchies.

Still, as it stands is just a (under-functional) version of OPPL. To make this worthwhile, I need to build off the language features that Clojure brings. I want to be able to interact with a reasoner, performing tasks in batch. In particular, the next step is to hook into Clojure’s test framework; something I have sorely missed when ontology building as opposed to programming. My experiences so far with combining Clojure and the OWL API suggest this should not be too hard.

These would not be minor advances; in the same way that test-driven programming has had a significant impact on the way we code, having a good test frame work for OWL would mean that we could define our use cases up-front, formally, programmatically and then fiddle with the logical representation till they work. As with test-driven programming, the test cases would themselves start to form part of the documentation for the code. When combined with a literate framework (n.d.d) to link between the ontology, the test cases and the experimental data that we are attempting to represent and model, this would provide a strong environment indeed. It would be a good step from moving from the craft-based approach we are taking at the moment, toward the pragmatic environment that I and others (n.d.e/) feel we need.

My code is available on Google code at http://code.google.com/p/clojure-owl/, and will be developed further there.

n.d.a. https://www.russet.org.uk/blog/2161.

———. n.d.c. https://www.w3.org/TR/owl2-manchester-syntax.

———. n.d.d. https://www.russet.org.uk/blog/1213.

———. n.d.e. https://robertdavidstevens.wordpress.com/2011/05/26/unicorns-in-my-ontology.

———. n.d.b. https://www.russet.org.uk/blog/2202.

Aranguren, Mikel Egaña, Robert Stevens, and Erick Antezana. 2009. “Transforming the Axiomisation of Ontologies: The Ontology Pre-Processor Language.” Nature Precedings, December. https://doi.org/10.1038/npre.2009.4006.1.

Bechhofer, Sean, Raphael Volz, and Phillip Lord. 2003. “Cooking the Semantic Web with the OWL API.” In Lecture Notes in Computer Science, 659–75. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-39718-2_42.