Archive for the ‘Clojure-owl’ Category

Remembering the World as it used to be

I have been working on a Clojure library for developing OWL ontologies. There have been two significant advances with this library recently. First, I have changed its name from clojure-owl to tawny-owl. I was never really happy with the original name; I think it is bad practice to name something after the language it uses (even partly, as the many jlibraries attest), and there were several other libraries around for manipulating OWL in Clojure, albeit in different ways. “Tawny” is simple, straightforward and memorable, I think. At the same time, I moved to Github because I can now just update readme.md, rather than having to update a separate website.

Perhaps more importantly, I have put in new code for handling changes to external ontologies, which is particularly important for external libraries.

Throughout the development of tawny-owl, I have focused on providing an environment that is easy to use for the developer; so, classes, properties and other entities are represented as lisp symbols. This works well and produces very attractive looking code in, for example, my version of the Pizza ontology. I have also written code so that ontologies only available as OWL files can be treated as first-class citizens: very easy in a highly dynamic language like Lisp.

However, it causes problems when combined with an ontology such as OBI. The difficulty here is that OBI uses semantics-free identifiers. While there are some good reasons for this, it would result in Clojure of the form:

 (defclass OBI:0034322 :subclass OBI:0034321)

Clearly, this is not good, and something that I want to avoid. So, instead, we apply a transform function to OBI when importing it; basically, this munges the rdfs:label annotation, turning it into something that is a legal Clojure symbol.

  :transform
  ;; fix the space problem
  (fn [e]
    (clojure.string/replace
     ;; with luck these will always be literals, so we can do this
     ;; although not true in general
     (.getLiteral
      ;; get the value of the annotation
      (.getValue
       (first
        ;; filter for annotations which are labels
        ;; is lazy, so doesn't eval all
        (filter
         #(.. % (getProperty) (isLabel))
         ;; get the annotations
         (.getAnnotations e
                          (owl.owl/get-current-jontology))))))
     #"[ /]" "_"))

All well and good. However, there is a problem. The label in OBI has two characteristics. First, it is human readable, which is good, and the reason why we are using it. Second, however, it does not carry formal semantics; the developers are free to change these labels whenever they like. Of course, any ontology that I build against my tawnyized version of OBI will break when a label changes. This is not a problem for a GUI like Protege, because, perhaps ironically, GUIs are not WYSIWYG: what you see is actually a view of the underlying data model. So, Protege shows you the label, but actually you are manipulating the URI. A dependency can change its labels, and when Protege reloads it, the new labels are what the developer will see.

With code, on the other hand, there is no separation at all. If the label changes, I will have to update anything that refers to it, which seems a substantial problem. However, I have now managed to work around this. My new library memorise saves all the mappings from symbol names to IRIs into a file, then restores them when OBI is loaded. Any old labels that no longer exist but which point to an IRI that still does exist are generated as duplicate symbols pointing to the same OWL object; however, I have done this in such a way that they emit warnings both when loading and during use, with a description of the new symbol name. This data would also make automatic upgrading possible, of course, using Clojure to perform a big search and replace on the source code. I think that this is a nicer solution than the denormalisation or “colour cube” solution that I previously suggested for Manchester syntax. It also shows off the advantage of using a programming language, rather than a static format; I, or any other user of the library, can just add this, as I choose, without having to wait for a standardisation process and tool support to catch up.
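The bookkeeping described above can be sketched with plain maps. This is only an illustration under stated assumptions, not the code in memorise: the function name `obsolete-aliases`, the example symbol names and the example.org IRIs are all hypothetical, and the real library works with OWL objects and a saved file rather than literal maps.

```clojure
(require '[clojure.set :as set])

;; Hypothetical data: symbol name -> IRI, as remembered from an earlier load.
(def remembered
  {"unchanged_term" "http://example.org/onto/EX_0000001"
   "old_name"       "http://example.org/onto/EX_0000002"
   "deleted_term"   "http://example.org/onto/EX_0000003"})

;; Hypothetical data: the same mapping computed from today's version.
(def current
  {"unchanged_term" "http://example.org/onto/EX_0000001"
   "new_name"       "http://example.org/onto/EX_0000002"})

(defn obsolete-aliases
  "Map each remembered name that has disappeared, but whose IRI still
  exists, to the name that IRI carries now."
  [remembered current]
  (let [iri->name (set/map-invert current)]
    (into {}
          (for [[old-name iri] remembered
                :when (and (not (contains? current old-name))
                           (contains? iri->name iri))]
            [old-name (iri->name iri)]))))

(defn warning-for
  "Build the warning text for one obsolete alias."
  [[old-name new-name]]
  (str "Symbol " old-name " is obsolete; use " new-name " instead."))

(obsolete-aliases remembered current)
;; => {"old_name" "new_name"}
```

Names whose IRI has vanished altogether (`deleted_term` here) are deliberately left out; an alias has nothing to point at in that case.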

This will still leave a secondary problem; it is dependent on the IRI which for pre-release versions of OBI is not fixed, as documented. Of course, this problem could go away, if OBI used a tool like URIGen, or alternatively if OBI released more regularly. Still, the data should also allow a reverse lookup — finding out what IRI a label now has.

I think these are the main tools that are needed to build against an external resource. The 0.5 version of Tawny is now available on Clojars and Github.

Clojure OWL 0.2

I have been developing a library written in Clojure that I can use for building OWL ontologies programmatically. The basic idea behind this library is to give me something that looks like Manchester syntax, but which is nonetheless fully programmatic; it can be extended arbitrarily, both for general use and for one-off custom code specific to a single ontology.

This has already shown its worth: for example, adding a syntax for “some and only” closure axioms was straightforward; likewise, I can now express disjoints and subclasses implicitly through bracket placement, rather than through named concepts. Although it is in its early days, I have added initial support for ontology design patterns (in this case a value partition), which I think will be extended significantly in future versions. I have used this in my version of the Pizza ontology, which seemed as good a demonstrator to start with as any; this also contains some custom “one-off” patterns, for building “named” pizzas. Taken together these were enough to constitute the first (unheralded) 0.1 of Clojure-OWL.

However, for 0.2, I wanted one more feature that I think makes this now a usable alternative for developing ontologies. I wanted to be able to address ontologies that were built using other technologies, which were accessible only as an OWL file. Of course, the library has always had the ability to build classes using URIs as strings; this facility means that it is possible to address another ontology. However, I wanted ontologies read from OWL files to be first-class citizens; classes and properties should be represented as lisp symbols, providing a degree of safety to the system: it is not possible to refer to a concept not previously defined, nor to use a concept where a property is needed.

This turned out to be reasonably straightforward; Clojure-OWL now maps an individual ontology to a Clojure namespace. Reading an ontology in from an OWL file is reasonably simple using the OWL API; finally, as a highly dynamic language, Clojure can create new symbols on the fly with ease. To test this out, I needed a reasonably large and complex ontology: I chose OBI for reasons of familiarity.

The process of integrating it into Clojure-OWL starts to show the power of this approach. A basic outline of the code to achieve this is simple enough. It requires a location, a prefix and an identifier. The location is generic, including a stream, so could be anything. I have included “obi.owl” as a class resource; I can use a URL, but accessing the network every time I wish to use things is a pain, although this would effectively provide a form of continuous integration.

 (defread obi
   ;; something that the OWL API can interpret. This includes a stream, so
   ;; it's totally generic.
   :location (IRI/create (clojure.java.io/resource "obi.owl"))
   ;; the prefix that you want to use in this case
   :prefix "obo"
   ;; normally only things from this IRI will be imported
   :iri "http://purl.obolibrary.org/obo/"
   )

On its own though, OBI contains a large number of concepts from many different ontologies. Normally, I filter for only entities whose identifier starts with the IRI above. This fails with OBO ontologies which use a sort of namespacing mechanism and a numeric identifier. So I need to apply a custom filter.

  :filter
  (fn [e]
    (and (instance? OWLNamedObject e)
         (.startsWith
          (.toString (.getIRI e))
          "http://purl.obolibrary.org/obo/OBI")))

I can think of many other uses for this sort of filtering; if I want to include a subset of entities then this would work also.
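To illustrate the idea of filtering on an IRI prefix without needing the OWL API on the classpath, here is a small sketch that works on IRI strings only. The helper name `iri-prefix-filter` and the example IRIs (apart from the OBI prefix and OBI_0000107, which appear in this post) are made up for the illustration.

```clojure
(require '[clojure.string :as str])

(defn iri-prefix-filter
  "Return a predicate that accepts only IRI strings starting with prefix."
  [prefix]
  (fn [iri] (str/starts-with? iri prefix)))

;; Hypothetical signature of an ontology, reduced to IRI strings.
(def example-iris
  ["http://purl.obolibrary.org/obo/OBI_0000107"
   "http://purl.obolibrary.org/obo/XYZ_0000001"
   "http://example.org/other#Thing"])

(def obi-only?
  (iri-prefix-filter "http://purl.obolibrary.org/obo/OBI"))

(filter obi-only? example-iris)
;; => ("http://purl.obolibrary.org/obo/OBI_0000107")
```

Because the predicate is an ordinary function, selecting a different subset is a matter of passing a different function.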

The next problem is OBI's use of semantics-free identifiers. Even if the reasons behind this decision are good, the resulting numeric atoms (OBI_0000107) are useless; I want to be able to say provides_service_consumer_with. So for this I use a custom transform function. This forms the name of the lisp symbol from the label instead, with a regexp fix to remove characters which are illegal: spaces for obvious reasons, and “/” which Clojure uses as a namespace qualifier.

  :transform
  ;; fix the space problem
  (fn [e]
    (clojure.string/replace
     ;; with luck these will always be literals, so we can do this
     ;; although not true in general
     (.getLiteral
      ;; get the value of the annotation
      (.getValue
       (first
        ;; filter for annotations which are labels
        ;; is lazy, so doesn't eval all
        (filter
         #(.. % (getProperty) (isLabel))
         ;; get the annotations
         (.getAnnotations e
                          (owl.owl/get-current-jontology))))))
     #"[ /]" "_"))
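Stripped of the OWL API calls, the heart of the transform is a single regexp replacement on the label text. A minimal sketch, assuming the label has already been extracted as a string; `label->symbol-name` is a hypothetical helper name and the second label is invented.

```clojure
(require '[clojure.string :as str])

(defn label->symbol-name
  "Replace spaces and slashes, which are awkward or illegal in Clojure
  symbol names, with underscores."
  [label]
  (str/replace label #"[ /]" "_"))

(label->symbol-name "provides service consumer with")
;; => "provides_service_consumer_with"

;; A slash would otherwise be read as a namespace separator.
(symbol (label->symbol-name "input/output device"))
;; => input_output_device
```

Other characters that the reader treats specially are not handled by this pattern; it covers only the two cases mentioned in the text.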

The final addition is the ability to import an ontology into the current one; without this, references to another ontology will share URIs, but will not pull the referenced ontology with all its axioms into the current namespace, so reasoning will not work as expected. This is achieved with a single form:

 (owlimport obi)

Unfortunately, I have had to disable HermiT functionality; our current mavenized version of HermiT is working, but failing a few tests because of incompatibilities with the current OWL API. This will be re-enabled in the new version.

Taken together, I think, clojure-owl now represents a reasonable programmatic environment for OWL. We now have the tools we need to replicate the essential functionality of a tool like Protege; not that I am trying to replace Protege, as I still use it as a viewer for my generated ontologies. But, moreover, I can now extend this functionality. As well as importing an ontology, I can filter the import so that only certain entities are available, which is an ad-hoc form of privacy. In later versions, I will probably add more explicit support for this. We can now package an OWL ontology in a Jar and publish it to any Maven repository. You may love or hate Maven (generally, the latter), but being able to resolve dependencies is a strong point, especially as it brings versioning with it.

Release 0.2 is now available on Clojars or Google Code.

Disjoints in Clojure-owl

When I started work on Clojure-owl the original intention was to provide myself with a more programmatic environment for writing ontologies, where I could work with a full programming language to define the classes I wanted. After some initial work with functions taking strings, I have moved to an approach where classes (and other ontological entities) are each assigned to a Lisp symbol. I’m using “symbol” rather than “atom” because it’s a bit more accurate, especially as Clojure uses “atom” with a different meaning.

This means that I now have something which allows me to write ontological terms looking something like this:

 (defclass a)
 (defclass b
   :subclass a)
 (defoproperty r)
 (defclass d
   :subclass (some r b))

While this is quite nice, and looks fairly close to Manchester syntax, so far all this really provides me with is a slightly complex mechanism for achieving what I could already do; which raises the question, why not just use Manchester syntax? Why bother with the Lisp if this is all I am to achieve?

I think I have now got to the point where the advantages are starting to show through, as I have started to create useful macros, which operate at a slightly higher level of abstraction than Manchester syntax. I will explain this using examples, perhaps inevitably, based around pizza, which I have started to develop using Clojure-owl.

First I wanted to be able to define several classes at once, rather than having to use a somewhat long-winded defclass form for each; for this I have written a macro called declare-classes — perhaps a slight misnomer, as it also adds the classes to the ontology. This example shows the purpose:

  (declare-classes GoatsCheeseTopping GorgonzolaTopping MozzarellaTopping ParmesanTopping)

In practice, this may not be that useful for an ontology builder, as it creates a bare class; no documentation, nothing else. It may be useful for forward-declaration (like Clojure declare).

One slightly unfortunate consequence of the decision to use lisp symbols is that I now find myself writing a lot of macros. For those who have not used lisp before, most work is done with functions. Macros are only necessary when you wish to extend the language itself. They tend to be more complex to write and to debug, although fortunately they are easy to use. Compare, for example, the definition of declare-classes to that of the functional equivalent which uses strings.

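 ;; the backticks (syntax quotes) are needed so that the macro returns
 ;; code rather than trying to evaluate it at expansion time
 (defmacro declare-classes
   [& names]
   `(do ~@(map
           (fn [x#]
             `(defclass ~x#))
           names)))

 ;; the functional equivalent, which takes strings
 (defn f-declare-classes
   [& names]
   (dorun (map #(owlclass %) names)))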

Even in this case, there are more hieroglyphics in the macro (two backticks, one unquote splice and some gensym symbols), although Clojure’s slightly irritating lazy sequences and the resultant dorun mean that the two are nearly as long as each other. I suspect that the macros are going to get more complex, however. In most cases, it should not be the user of the library that has to cope, though.
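To see what such a macro expands to without loading the library, the following sketch pairs it with a stand-in `defclass`. The stand-in is an assumption made for the example: it binds each symbol to a keyword and records it in an atom, where the real defclass would create an OWL class in the current ontology.

```clojure
;; Stand-in for the ontology: just a vector of created entities.
(def registry (atom []))

(defmacro defclass
  "Hypothetical stand-in: bind sym to a keyword and record it."
  [sym]
  `(let [entity# ~(keyword (str sym))]
     (swap! registry conj entity#)
     (def ~sym entity#)))

(defmacro declare-classes
  "Expand into one defclass form per name."
  [& names]
  `(do ~@(map (fn [n] `(defclass ~n)) names)))

(declare-classes GoatsCheeseTopping GorgonzolaTopping)

GoatsCheeseTopping
;; => :GoatsCheeseTopping
@registry
;; => [:GoatsCheeseTopping :GorgonzolaTopping]
```

Calling `(macroexpand-1 '(declare-classes A B))` at the REPL shows the generated `do` form, which is usually the quickest way to debug this kind of macro.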

While this provided a useful convenience, I also wanted a cleaner method for declaring disjoints. Consider this example:

 (defclass a)
 (defclass b)
 (defclass c)
 (disjointclasses a b c)

This is reasonably effective, but a pain if there are many classes, as they all need to be listed in the disjointclasses list. Worse, this is error prone; it is all too easy to miss a single class out, particularly if a new class is added. So, I have now implemented an as-disjoints macro which gives this code:

 (as-disjoints
  (defclass a)
  (defclass b)
  (defclass c))

This should avoid both the risk of dropping a disjoint and the duplication. An even more common form is to wish to declare a set of classes as disjoint children. Again, I have provided a macro for this, which looks like this:

  (defclass CheeseTopping)
  (as-disjoint-subclasses
   CheeseTopping
   (declare-classes
    GoatsCheeseTopping
    GorgonzolaTopping
    MozzarellaTopping
    ParmesanTopping))

Although this was not my original intention, these are actually nestable. This gives the interesting side effect that the ontology hierarchy is now represented in the structure of the lisp. The example below is an elided hierarchy from pizza. Lisp programmers will notice I have rather exaggerated the indentation to make the point.

 (as-disjoint-subclasses
  PizzaTopping

  (defclass CheeseTopping)
  (as-disjoint-subclasses
   CheeseTopping
   (declare-classes
    GoatsCheeseTopping))

  (defclass FishTopping)
  (as-disjoint-subclasses
   FishTopping
   (declare-classes
    AnchoviesTopping))

  (defclass FruitTopping)
  (as-disjoint-subclasses
   FruitTopping
   (declare-classes
    PineappleTopping)))

Of course, it is not essential to do this. The nested use of as-disjoint-subclasses confers no semantics; but it does allow juxtaposition of a class and its children.
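The mechanism behind as-disjoints can be sketched as a macro that evaluates the wrapped forms and hands their results to disjointclasses. This is a guess at the shape rather than the library's implementation; `defclass` and `disjointclasses` below are stand-ins that use keywords and an atom in place of OWL classes and axioms.

```clojure
;; Stand-in for the ontology's disjointness axioms.
(def disjoint-axioms (atom []))

(defn disjointclasses
  "Hypothetical stand-in: record one disjointness axiom over classes."
  [& classes]
  (swap! disjoint-axioms conj (set classes))
  classes)

(defmacro defclass
  "Hypothetical stand-in: bind sym to a keyword and return that keyword."
  [sym]
  `(do (def ~sym ~(keyword (str sym)))
       ~sym))

(defmacro as-disjoints
  "Evaluate each form, then declare all the results mutually disjoint."
  [& forms]
  `(disjointclasses ~@forms))

(as-disjoints
 (defclass a)
 (defclass b)
 (defclass c))

@disjoint-axioms
;; => [#{:a :b :c}]
```

Each class is named exactly once, so adding a fourth class inside the form extends the disjointness axiom automatically.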

Being able to build up macros in this way was the main reason I wanted a real programming language; those described here are, I think, fairly general purpose; so, this form of declaration could also be supported in any of the various syntaxes, although it would require updates to the tools. However, some ontologies will benefit from less general purpose extensions. These are never going to be supported in a syntax specification.

Still, it is not all advantage. Using a programming language means embedding within this language. And this means that some of the names I would like to use are gone; some (http://clojuredocs.org/clojure_core/clojure.core/some) is the obvious example. While Clojure has good namespace support, functions in clojure.core are available in all other namespaces; like all lisps, Clojure lacks types which would have avoided the problem. There are other ways around this, but ultimately clashing with these names is likely to bring pain; for example, I could always explicitly reference clojure-owl functions, but writing owl.owl/defclass rather than defclass seems a poor option. Hence, some has become owlsome, and comment has become owlcomment. I have decided to accept the lack of consistency and have kept only and label; the alternative, taken by the OWL API, of adding OWL to everything seems too unwieldy.
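For reference, one of the standard Clojure mechanisms for reclaiming a name from clojure.core is `:refer-clojure :exclude`. The sketch below only illustrates that mechanism; it is not the route taken here (the library renames to owlsome), and the namespace name and the map returned by `some` are invented for the example.

```clojure
(ns owl-names-sketch
  ;; Do not refer clojure.core/some into this namespace.
  (:refer-clojure :exclude [some]))

(defn some
  "Hypothetical stand-in for an existential restriction constructor."
  [property filler]
  {:restriction :some :property property :filler filler})

(some "isPartOf" "Human")
;; => {:restriction :some, :property "isPartOf", :filler "Human"}

;; The core function is still reachable by its qualified name.
(clojure.core/some even? [1 3 4])
;; => true
```

The cost is the pain mentioned above: every namespace that uses the library unqualified must repeat the exclusion, or the two meanings of some collide.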

OWL Concepts as Lisp Atoms

With my initial work on developing a Clojure environment for OWL, I was focused on producing something similar to Manchester syntax. Here, I describe my latest extensions, which make more extensive use of Lisp atoms. The practical upshot of this should be to reduce errors due to spelling mistakes, as well as enabling me to add simple checks for correctness.

The desire for a simple syntax is an important one. I would like my library to be usable by people not experienced with Lisp, although I am clearly aware that this sort of environment is likely to be aimed at those with some programming skills. I have managed to produce a syntax which, I think, is reasonably straightforward. It has more parentheses than Manchester syntax, but is easier in other ways, especially now that I have learnt a little more about how Clojure namespaces work. For example, this defines a class in OWL.

 (owlclass "HumanArm"
           :subclass "Arm" (some "isPartOf" "Human")
           :annotation (comment "The Human arm is an Arm which is part of a human"))

One of my initial desires for the Clojure mode was to enable the use of standard tools that we have come to expect from a modern programming language, which should enable us to build a more pragmatic ontology building methodology. The first of these is a unit testing environment. Clojure already has one of these integrated. So far, I have only used this for testing my own code; so, for example, this is the current unit test for the owlclass function used above.

 (deftest owlclass
   (is (= 1
          (do (o/owlclass "test")
              (.size (.getClassesInSignature
                      (#'o/get-current-jontology))))))
   (is (instance? org.semanticweb.owlapi.model.OWLClass
                  (o/owlclass "test"))))

There are, however, some limitations to the approach that I have taken so far. Consider this statement:

 (owlclass "HumanArm" :subclass (some "isPartOf" "Humn") "Arm" )

This is broken because I have referred to the class Humn, which I probably do not want to exist because I have spelt it wrongly. Unfortunately, as it stands my code does not know this and so will create the class “Humn”. Now, this form of error is not that likely to happen; tools such as Kudu enforce this correctness in the editor, while pabbrev.el provides “correctness-by-completion”. Nonetheless, these errors will happen and I do not want them to. There are a variety of ways that I could build this form of checking in; generally, this would involve introspecting over the ontology to see if classes already exist.

However, I have taken a different approach, so that I can use the Lisp itself to prevent the problem. To do this, for each class created, I generate a new Lisp symbol; likewise for each object property and for the ontology itself. The practical upshot of this is that I can write code like so:

 (defclass a)
 (defclass b
   :subclass a)
 (defoproperty r)
 (defclass d
   :subclass (some r b))

 ;; will fail as f does not exist
 (defclass e
   :subclass f)

 ;; will fail as r and b are the wrong way around
 (defclass e
   :subclass (some b r))

The advantages are three-fold. Firstly, it’s slightly shorter, and there is no need to use quotes all over the place. Secondly, it is no longer possible to refer to a class that has not yet been defined; Clojure will pick this up immediately; from the user perspective, you can test your statements as you go, as soon as you have written them, by evaluating them. Finally, because the atoms carry values which are typed, we can also detect errors such as using a property when a class is necessary.
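Both kinds of check can be imitated in a few lines of plain Clojure. The sketch below is not the library's code: it uses records as stand-ins for the OWL API objects, and `owlsome` is a hypothetical restriction constructor that simply inspects the types of its arguments.

```clojure
;; Stand-ins for the typed values that the symbols carry.
(defrecord OwlClass [name])
(defrecord ObjectProperty [name])

(defmacro defclass
  "Hypothetical stand-in: bind sym to a class value named after it."
  [sym]
  `(def ~sym (->OwlClass ~(str sym))))

(defmacro defoproperty
  "Hypothetical stand-in: bind sym to a property value named after it."
  [sym]
  `(def ~sym (->ObjectProperty ~(str sym))))

(defn owlsome
  "Build an existential restriction, rejecting arguments of the wrong type."
  [property cls]
  (when-not (instance? ObjectProperty property)
    (throw (IllegalArgumentException. "first argument must be a property")))
  (when-not (instance? OwlClass cls)
    (throw (IllegalArgumentException. "second argument must be a class")))
  {:some [property cls]})

(defclass b)
(defoproperty r)

(owlsome r b)      ; fine
;; (owlsome b r)   ; throws IllegalArgumentException: wrong way around
;; (owlsome r f)   ; does not compile: f cannot be resolved
```

The misspelling check costs nothing to implement, since the Clojure compiler refuses to resolve a symbol that was never defined; only the type check needs explicit code.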

Of course, the original functions are all still in place; there would be no point defining symbols if the intention was to use the API entirely programmatically. But my intention for Clojure-OWL is to have an environment for humans (well, programmers anyway) to develop ontologies with.

There is a final advantage to this that I have not yet exploited. Currently, I have generated the name of the OWL class directly from the symbol name. So, in the above example the class a will have the name “a”. There are some problems with this. Not all characters are legal in Clojure symbol names nor in OWL class names, and the set of characters is not the same. So, while this is a useful default, I will formally separate these. At the same time, I think that this will allow me to address a second problem, that of semantic vs semantics-free identifiers. I can call a class, ontology or object property anything at all, and refer to it with an easy-to-remember identifier. I might use something like this:

 (defoproperty has_part :name "BFO_0000051")

There is still a significant amount of work to do; I haven’t made a complete coverage of OWL yet, just the most important parts (i.e. the bits that I use most often). Next, I need to start building some predicates so I can test (asserted) subclass relationships. So far, however, this approach is showing significant promise.

Programming OWL

I have been struggling for a while with OWL development environments. While Protege provides a nice GUI based system, this has the limitations of many such systems; it allows you to do what the authors intended, but not all of the things that you might wish.

It is partly for this reason that I have been developing my own OWL Manchester syntax mode for Emacs; I lose a lot from Protege, but then I also gain the ability to manipulate large numbers of classes at once, as well as easy access to versioning. These things are useful.

Still, the environment is lacking in many ways; recently, while building an ontology for karyotypes, I wanted a more programmatic environment. A trivial example, for instance, comes from the human chromosomes; there are 22 autosomes in all. These can easily be expressed in OWL with 22 classes (plus X and Y). The problem is that all of these classes are likely to be very similar, which produces a code duplication problem. Of course, this is not a new problem; OPPL, the ontology pre-processor language, was created at least in part for this purpose.

The main problem with OPPL, however, is that it is a Domain Specific Language; while this makes it well adapted to its task, it also means that it lacks many basic features of a “real” programming language. Another possibility is to use the OWL API (I am actually on this paper, but I publicly acknowledge that this was a rather generous attribution from Sean Bechhofer; I did do some work on the API, but not much, and I suspect none of my work remains). However, a brief look at the OWL API tutorial shows a problem. This code creates two classes and makes one a subclass of another.

 OWLOntologyManager m = create();
 OWLOntology o = m.createOntology(pizza_iri);
 // class A and class B
 OWLClass clsA = df.getOWLClass(IRI.create(pizza_iri + "#A"));
 OWLClass clsB = df.getOWLClass(IRI.create(pizza_iri + "#B"));
 // Now create the axiom
 OWLAxiom axiom = df.getOWLSubClassOfAxiom(clsA, clsB);
 // add the axiom to the ontology.
 AddAxiom addAxiom = new AddAxiom(o, axiom);
 // We now use the manager to apply the change
 m.applyChange(addAxiom);
 // remove the axiom from the ontology
 RemoveAxiom removeAxiom = new RemoveAxiom(o, axiom);
 m.applyChange(removeAxiom);

Aside from the intrinsic problems of Java (the compile, run, test cycle is rather clunky for this sort of work), this amount of code to achieve something straightforward makes this a little untenable. The equivalent in Manchester syntax is far shorter:

 Class: piz:A
     SubClassOf: piz:B

However, while Java and the OWL API do not seem a good choice for manipulating OWL directly, rewriting everything from first principles would also be a bad idea.

One solution to this problem came to my attention recently, in the shape of Clojure; essentially, this is a lisp implemented on the JVM. I will not describe the virtues or otherwise of Lisp in great detail; for some reason it is one of those languages that tends to generate fanaticism, and there are lots of descriptions of lisp elsewhere. For my purposes, there were three advantages. The first was personal, which is that I know Lisp reasonably well being an Emacs hacker. The other two are more general: Clojure has good integration with Java, and can manipulate Java objects, meaning I can make direct use of the OWL API; and, second, Lisp has a good degree of syntactic plasticity, which is important as, after all, I am looking for a convenient representation.

Initially, I have aimed at producing a representation which is fairly similar to Manchester syntax. My initial attempts used the various features of Clojure directly. Consider, for instance, the following two statements:

 (owl/owlclass "Arm" {:subclass "Limb"})
 (owl/owlclass "HumanArm" {:subclass ["Limb" "HumanBodyPart"]})

Lisp, in general, uses a prefix notation. There is no obvious and easy way around this; in this case, it actually fits rather well with Manchester syntax which looks similar. The use of frame keywords such as SubClassOf: in Manchester syntax is also fortuitous as lisp uses a similar syntax. However, this syntax is rather too difficult. Even in this simple example we have a statement terminator which looks like ]}) (representing the end of a vector, hash and sequence respectively). Lisps are often criticised for having too many parentheses; Clojure is unusual in using lots of different styles of parens. In Emacs-Lisp, I just keep hitting ) till I have finished. In Clojure, you have to close the brackets in the right order. All rather painful.

Fixing this turned out to be quite difficult, with a particularly nasty function I have called groupify. It is heavily recursive, which is, apparently, a poor idea in Clojure, as it lacks some recursion optimisations present in many lisps; however, without mutable local variables, I could see no other option. The syntax now looks much simpler.

 (owl/owlclass "Arm" :subclass "Limb")
 (owl/owlclass "HumanArm" :subclass "Limb" "HumanBodyPart")
 (owl/owlclass "Hand" :subclass (owl/some "isPartOf" "Arm"))

Both the :subclass and :equivalent frames support any number of class expressions; so far I have only implemented some or only, but the rest are not hard. Currently, it is only possible to save the ontology in Manchester syntax, but fixing this is trivial; the OWL API is doing all of the work.
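The job of turning a flat argument list into frames can be sketched with a reduce that carries the current frame keyword along as state. This is not groupify itself, which is recursive and whose details are not shown here; `group-frames` is a hypothetical name and the sketch covers only the simple case of keywords followed by values.

```clojure
(defn group-frames
  "Turn a flat argument list such as
    [:subclass \"Limb\" \"HumanBodyPart\" :equivalent \"X\"]
  into a map from each frame keyword to a vector of its values."
  [args]
  (first
   (reduce (fn [[acc frame] x]
             (if (keyword? x)
               ;; a keyword starts a new frame (created empty if unseen)
               [(update acc x #(or % [])) x]
               ;; anything else belongs to the frame seen most recently
               [(update acc frame (fnil conj []) x) frame]))
           [{} nil]
           args)))

(group-frames [:subclass "Limb" "HumanBodyPart"])
;; => {:subclass ["Limb" "HumanBodyPart"]}

(group-frames [:subclass "Limb" :equivalent "X" :subclass "HumanBodyPart"])
;; => {:subclass ["Limb" "HumanBodyPart"], :equivalent ["X"]}
```

Threading the state through reduce avoids both explicit recursion and mutable locals. Values that appear before any keyword end up under a nil key in this sketch; a real implementation would presumably reject them.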

Of course, this would not be much help if all I had managed to achieve was Manchester syntax with more parens. However, the big advantage of this becomes clearer with the next example:

 (dorun
  (map
   (fn [x]
     (owl/owlclass (str "HumanChromosome" x)
                   :subclass "HumanChromosome"))
   (concat '("X" "Y") (range 1 23))))

This creates a class for each human chromosome. In this case, I have hard coded the list of classes in, but I could be parsing a CSV or accessing a database. Or accessing an existing ontology; this could be very useful in avoiding maintenance of duplicate hierarchies.
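The name generation in that example can be checked on its own, without the OWL library, by keeping only the string-building part. The var name `chromosome-class-names` is introduced here purely for illustration.

```clojure
;; 22 autosomes plus X and Y; (range 1 23) yields 1 to 22 inclusive.
(def chromosome-class-names
  (map #(str "HumanChromosome" %)
       (concat ["X" "Y"] (range 1 23))))

(count chromosome-class-names)
;; => 24
(first chromosome-class-names)
;; => "HumanChromosomeX"
(last chromosome-class-names)
;; => "HumanChromosome22"
```

Swapping the literal sequence for rows parsed from a CSV file would leave the rest of the code unchanged.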

Still, as it stands, this is just an (under-functional) version of OPPL. To make this worthwhile, I need to build off the language features that Clojure brings. I want to be able to interact with a reasoner, performing tasks in batch. In particular, the next step is to hook into Clojure’s test framework; something I have sorely missed when ontology building as opposed to programming. My experiences so far with combining Clojure and the OWL API suggest this should not be too hard.

These would not be minor advances; in the same way that test-driven programming has had a significant impact on the way we code, having a good test framework for OWL would mean that we could define our use cases up-front, formally, programmatically and then fiddle with the logical representation till they work. As with test-driven programming, the test cases would themselves start to form part of the documentation for the code. When combined with a literate framework, to link between the ontology, the test cases and the experimental data that we are attempting to represent and model, this would provide a strong environment indeed. It would be a good step in moving from the craft-based approach we are taking at the moment toward the pragmatic environment that I and others feel we need.
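As an impression of what a use case written as a test might look like, the sketch below uses clojure.test together with Clojure's built-in hierarchies as a stand-in for an ontology and reasoner. Everything in it is an assumption made for illustration: a real test would query the ontology through the OWL API, and the class names are borrowed from the examples above.

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

;; Stand-in for an asserted class hierarchy: HumanArm < Arm < Limb.
(def hierarchy
  (-> (make-hierarchy)
      (derive :HumanArm :Arm)
      (derive :Arm :Limb)))

(deftest human-arm-is-a-limb
  ;; isa? follows the hierarchy transitively
  (is (isa? hierarchy :HumanArm :Limb))
  (is (not (isa? hierarchy :Limb :HumanArm))))

(run-tests)
```

The test states the requirement (a human arm must be classified as a limb) independently of how the classes are axiomatised, so the representation can be reworked until the test passes.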

My code is available on Google code at http://code.google.com/p/clojure-owl/, and will be developed further there.