Archive for the ‘Tech’ Category

I first discovered the Visitors pattern from reading the original design patterns book, sometime in the last century when, having jumped into computer science, I was busy hoovering up everything I could about programming. After reading the chapter on visitors, it made no sense to me then and I had no real cause to go near them again until I started working on Tawny-OWL (http://www.russet.org.uk/blog/2989). This library includes a renderer which takes a set of OWL API OWLObjects and turns them into Clojure data structures. Originally, I added this simply to generate documentation, but since then it’s grown to have rather a life of its own. In Java, this would be be implemented with a visitor. In Clojure, I implemented this with multimethods which is often used as a replacement for the visitor pattern but I am now far from convinced that this was a good decision, after the performance of my code collapsed massively. In this article, I describe the problems with multimethods, and some alternatives. For each of these, I use a small example, and bench mark them; you can follow my code on https://github.com/phillord/visitor-benchmark. The results are, perhaps, a little surprising. This should be of relevance to anyone addressing the same problem in Clojure.


Visitors and the OWL API

The purpose of a Visitor is to crawl over a set of objects, identified by their type, typically representing an abstract syntax tree, although it can be used for other purposes. It enables the implementation of many different types of behaviour over the tree without having to extend each of the types. This works by having a Visitor which implements one method for each object type. It works well, unless you add a new type, in which case there are a lot of Visitors to re-write. This is one reason that they are found mainly around abstract syntax trees. They tend not to change very much which is a prerequesite for visitors.

The OWL API (1303.0213) is a library which represents the OWL language (http://www.w3.org/TR/owl2-overview/) — a mechanism for representing an area of knowledge using logical statements, or an ontology (http://ontogenesis.knowledgeblog.org/66). It’s a fairly complex API and implements several visitor patterns; it’s main visitor — the OWLObjectVisitor has so many methods I ended up using Clojure and reflection to count them all — 98 is the answer!


Tawny-OWL

Now, Tawny-OWL (http://www.russet.org.uk/blog/2989) wraps the OWL API to provide an evaluative, R-like environment for building ontologies, built using Clojure, which provides the ability to abstract away from the actual axioms; you can build functions which build axioms, examples of which can be seen in the karyotype ontology (https://github.com/jaydchan/tawny-karyotype) (1305.3758).

While abstractions are generally nice, it also hides the underlying axioms; so I built tawny.render which takes OWL API objects and turns them into Clojure forms, originally as a mechanism for documentation, although since I wrote it, it has taken on a life of its own.

In Java, the renderer would just have used the Visitor pattern. In Tawny-OWL, I initially implemented this using multimethods. These have quite commonly been put forward as a replacement for the visitor pattern (http://www.fatvat.co.uk/2009/01/multi-methods-in-clojure.html), so it seemed a reasonable suggestion and it worked well for me, until a new version of the OWL API was released. Suddenly the performance of my test suite crashed catastrophically. Consider, these results: from 40 seconds to 4 minutes.

git clone git@github.com:phillord/tawny-owl.git
git checkout 5870a9

time lein with-profile 3.4.10 test
real    0m45.186s

time lein with-profile 3.5.0 test
real    4m26.070s

The OWL API test suites speed up slightly between the two versions, so it seemed unlikely to be an implementation mistake. I also noticed that most of the time was in the rendering tests — which include one substantial rendering task.

Thanks to git bisect, I tracked the problem down to a single commit 2bb6a0. If you look through the entire log (it’s long, and I did!) the surprising thing is this; there are precisely no implementation changes. All that this commit does is add a (large number) of interfaces to the existing model. Why on earth would this cause a massive drop in performance?


Visitors Revisited

There is one subtlety about the visitor pattern that is not obvious at first sight. If we have a visitor interface like so:

class Visitor{
    public void visit(A a);
    public void visit(B b);
    public void visit(C c);
    public void visit(D d);
}

At some point, we have to decide which method to actually call. With the visitor pattern it is the entity that accepts the visitor that makes this decision, because it is this entity that calls visit. Consider this code:

class Dimpl implements D{
    public accept( Visitor v ){
        v.visit(this);
    }
}

In Java, which method to dispatch to is made on the basis of the compile time, static type of the object. Now, the Dimpl class knows it’s own type at compile time, hence we dispatch to the correct Visitor method, even though the class which makes calls the accept method does not know the underlying type of the Dimpl object. Effectively, we pass off part of the computation to the compile time type system.

Clojure cannot do this. Multimethod calls have to check the runtime type of the object to be visited at runtime against the range of possibilities. By introducing a large number of interfaces, it appears the OWL API had increased the search space over which the multimethod isa? call had to crawl. I’ll try and show here that between the two versions of the OWL API, benchmarking isa? call has got significantly slower. This is not enough to explain in performance drop of my tests, although refactoring the multimethod to reduce the number of isa? does actually address the issue, so it really does seem to be the problem.

In the rest of this article, I will consider several different alternatives, benchmarking them and comparing them for performance. Microbenchmarking of this form is hard to do. In this article, I use a combination of the excellent criterium, clojure.core/time supplemented with some macro-benchmarking — running the Tawny-OWL tests. I consider that none of these tests are definitive, and will discuss their advantages and disadvantages as I go. All results are reported to 2 significant figures. Ultimately, my aim with this article is to improve the Tawny-OWL performance, but I hope that the results are clean enough that they are relevant to other environments also.


Set up

The rest of the code for this post is available at https://github.com/phillord/visitor-benchmark. The code in this article is a point in time, and will not be updated if I change the visitor benchmark. This project can be cloned and tested using lein with-profile production run. I will not describe all of the plumbing that makes this work here.

First, however, we define a benchmark command. Using Criterium (http:///github.com/hugoduncan/criterium) directly is a little painful, so I define a small macro which allows me to switch between bench, quick-bench and time. My version of bench also prints out the form so I can see what is happening.

(def bench-form
  ;;'clojure.core/time
  ;;'criterium.core/quick-bench
  'criterium.core/bench
  )

(defmacro bench [form]
  `(do
     (println "Benching:" (quote ~form))
     (~bench-form ~form)))

Establishing a base line

First we establish a base line; how long does it take to invoke a function that does nothing at all. While it varies from machine to machine, on my desktop it takes around 10ns.

(defn f [])

(visitor.benchmark.bench/bench (f)

Some objects to work with

I have choosen to use the OWL API and Tawny-OWL to generate my objects, although my belief is that this analysis should be good for any API. I already had a lot of the code for this analysis, the OWL API uses visitors and it is big. Finally, the change between 3.4.10 and 3.5.0 that I describe earlier is a useful test. We generate three objects to use.

(defontology ont)

(defclass AClass)
(defoproperty AProperty)
(def ASome (first  (owl-some AProperty AClass)))

Multimethods

To test multimethods first, I used the following code, which creates a two method multimethod operating on class. After this, I also tried adding many more methods to the multimethod and calling it again.

(defmulti multi class)

(defmethod multi org.semanticweb.owlapi.model.OWLClass [e])
(defmethod multi org.semanticweb.owlapi.model.OWLObjectProperty [e])

(defn multi-on-class []
  (bench (multi AClass)))

(defn multi-on-property []
  (bench (multi AProperty)))

Now using criterium.core/bench gives the following results, both before and after adding extra methods. The basic conclusion here is that, it makes no difference at all: the multimethod is unaffected by the change in number of method and, critically, there is no difference at all between OWL-API 3.4.10 and 3.5.0. Poor evidence for the theory that the multimethod is responsible for the performance drop.

OWL API
Class
Property
Big Class
Big Property
3.4.10
61ns
61ns
66ns
65ns
3.5.0
62ns
57ns
60ns
63ns

Poking through the Clojure java code finds this possible issue — the method lookup appears to be cached. dispatchVal here is the value returned by the dispatch function.

private IFn findAndCacheBestMethod(Object dispatchVal)

Testing with clojure.core/time rather confirms this. Consider the results in this table. The first thing to notice is that change in unit — we have moved from nanoseconds to milliseconds. Just for comparision our base test of an empty function reports “Elapsed time: 0.026253 msecs” or around 26μs for a simple invocation — three orders slower than criterium suggests and 5 orders faster than our multimethod invocation. Now, none of these differences are entirely believable, but what is interesting is difference of between 5 and 20 fold between the two versions of the OWL API. Of course, clojure.core/time is potentially risky because of start up time and so forth but, because this should affect the test against 3.4.10 and 3.5.0 equally; it also will also remove the effect of caching in findAndCacheBestMethod single it only runs once.

OWL API
Class
Property
Big Class
Big Property
3.4.10
220ms
140ms
80ms
88ms
3.5.0
860ms
1400ms
450ms
1400ms

Trying to pull apart this appart a little further, let’s test the isa? method more directly, which we test with following benchmarks one of which succeeds and one of which fails.

(bench (isa? (class AClass)
              org.semanticweb.owlapi.model.OWLClass))
(bench (isa? (class AClass)
              org.semanticweb.owlapi.model.OWLObjectProperty)))

First, comparison with criterium.core/bench which shows a massive difference between success and failure (note the change in unit); the reason for this should be obvious — if isa? succeeds it short-circuits; if it fails, it must test all superclasses and the various hierarchies that Clojure adds.

OWL API
Succeeds
Fails
3.4.10
76ns
40μs
3.5.0
76ns
33μs

Using the simpler, clojure.core/time shows the same thing as well as a difference between 3.4.10 and 3.5.0.

OWL API
Succeeds
Fails
3.4.10
30μs
1.4ms
3.5.0
40μs
3.8ms

This distinction is important here. If it fails, then the isa? method is really very, very slow. 3.5.0 is slower, although not that much slower, at least according to clojure.core/time.

I am still not entirely satisified with these results. They do appear to show that multimethods can be pretty slow, and they do show a difference between 3.4.10 and 3.5.0. However, the various caching that multimethods perform should help — even though the OWL API has add many new interfaces, we are still only talking about tens of interfaces.


Implementing a Visitor with a Visitor

The most obvious way to implement a visitor pattern is to, er, implement the visitor. To achieve this in a vaguely friendly way in Clojure, we need to use reify — we cannot use proxy as it cannot overload on type which is the whole point of the visitor pattern.

We can achieve this simply with the following code:

(defn ontology-visitor []
  (reify org.semanticweb.owlapi.model.OWLObjectVisitor
    (^void visit [this
                  ^org.semanticweb.owlapi.model.OWLOntology o]
      (println "v ontology" o))))

(defontology ont)
(.visit (ontology-visitor) ont)

This works fine, but there is a problem. By default, the visitor pattern returns nothing — it works by side effect. In general, and specifically in the OWL API the visit method returns void. This is not at all useful as a replacement for multimethods in a recursive descent renderer. To work around this problem, I add some extra glue in terms of an atom which we change by side effect during visiting and then dereference to provide a return value.

(defn visit-entity [entity]
  (let [r (atom nil)]
    (.visit
     (reify org.semanticweb.owlapi.model.OWLObjectVisitor
       (^void visit [this
                     ^org.semanticweb.owlapi.model.OWLOntology o]
         (reset! r
                 (str "visit o" o)))

       (^void visit [this
                     ^org.semanticweb.owlapi.model.OWLClass c]
         (reset! r
                 (str "visit class" c))))
     entity)
    @r))

(visit-entity ont)

(defclass A)
(visit-entity A)

And how does it perform. Well, badly, is the answer: around 25μs for each method invocation (using bench) which is 1000x slower than our base line.

Of course, one obvious complaint with this implementation is that it involves creation of an atom, resetting that atom and so forth; actually, I have tested that as well, but for the purpose of this article, I test this simpler version.

(defn new-object-visitor []
  (reify org.semanticweb.owlapi.model.OWLObjectVisitor
    (^void visit [this
                  ^org.semanticweb.owlapi.model.OWLObjectProperty o])
    (^void visit [this
                  ^org.semanticweb.owlapi.model.OWLClass c])))

This gives a base line score of about 7ns. In short, the creation of a new object is not the problem. In the examples above, I use type hinting, but actually this makes no difference to performance. The fundamental problem here is that our reified object is dispatching on type using runtime checks and reflection.


A big Cond

Another possible option would be to replace the multimethod with a big cond form. I will not show the full code for this as it’s rather long, but an example of part of it looks like this:

(defn- as-form-int
  "Render one of the main OWLEntities as one of the main functions
in tawny."
  [c options]
  (condp
      (fn [entity-type ^OWLEntity entity]
        (= entity-type (.getEntityType entity)))
      c
    EntityType/CLASS
    (as-form-int-OWLClass c options)
    EntityType/OBJECT_PROPERTY
    (as-form-int-OWLObjectProperty c options)
    EntityType/DATA_PROPERTY
    (as-form-int-OWLDataProperty c options)
    EntityType/ANNOTATION_PROPERTY
    (as-form-int-OWLAnnotationProperty c options)
    EntityType/NAMED_INDIVIDUAL
    (as-form-int-OWLNamedIndividual c options)
    EntityType/DATATYPE
    (as-form-int-OWLDatatype c options)))

Here, we have use a cond to dispatch on the EntityType. The individual methods for the multimethod have been renamed using a rather clunky naming scheme. I do not have individual benchmarks for form of test, but looked at overall it does solve the problem. lein test runs in around 10s and there is no difference between the OWL API 3.4.10 and 3.5.0. In short, it solves the problem.

Unfortunately, the solution comes at a significant cost. Firstly and perhaps most trivially the code looks terrible. The solution also is very specific to the OWL API — without the getEntityType method this fails. In fact, getEntityType does not make all the distinctions we need; I also needed another couple of methods (getClassExpressionType and getDataRangeType) and still needed to fallback to instance? at points.

A different solution would be to just use instance? calls.

(defn small [e]
  (cond
   (instance? org.semanticweb.owlapi.model.OWLClass e)
   true
   (instance? org.semanticweb.owlapi.model.OWLObjectProperty e)
   true))

This behaves as expected under benchmarking. The code shown above with only two instance of calls bench marks at 5ns; however, when expanded to perform all the instance of calls I need, performance drops to 230ns.

I have also tried a few other tricks for optimising type based look up, including replacing the cond with a map lookup. Details are available in my test code. Suffice to say, I did not achieve a solution which was performant.


Protocols

A final solution would be to use protocols. The idea here is that OWL Objects would know how to render themselves. I have not yet tested this within the context of the OWL API, and have only benchmarks. Consider this code:

(extend-type org.semanticweb.owlapi.model.OWLClass
  Render
  (render[this]))

A bench check of this shows highly performant invocation at around 10ns. The use of time is less happy, however, with a 0.2ms invocation (compared to 0.005ms baseline). I suspect that there is some caching happening at the protocol layer also. I would like to find the time to check protocols further, with a full implementation in the context of tawny.render. Unfortunately, this is quite a bit of (very boring!) code to rewrite.


Adapting the multimethod

The actual solution that I used was different to all of these, simply because it was the first acceptable solution that I found which solved the problem of very long test run. My approach to solving this problem was to address the isa? calls through memoization, using the fact that I know which classes the multimethod dispatches on.

The code for this just affects the dispatch method; we call isa? once and only once for each concrete class of object that we have. The full diff is available here.

(defn lookup
  "Returns the class of the OWLEntity of which this is a subclass."
  [c]
  (first
   (filter
    (fn fn-lookup [parent]
      (isa? c parent))
    (list org.semanticweb.owlapi.model.OWLClass
          org.semanticweb.owlapi.model.OWLObjectProperty
          org.semanticweb.owlapi.model.OWLNamedIndividual
          org.semanticweb.owlapi.model.OWLDataProperty
          org.semanticweb.owlapi.model.OWLAnnotationProperty
          org.semanticweb.owlapi.model.OWLDatatype))))

(def m-lookup (memoize lookup))

(defmulti multi (fn [c]
                  (m-lookup
                   (class c))))

My results here are a little surprising. The speed of the dispatch seems to slow down using my benchmarking (data not shown). Fortunately, I tried running my test cases BEFORE I did this bench marking, because the results overall are excellent.

git  checkout 2fab56

time lein with-profile 3.4.10 test
real    0m7.875s
user    0m9.547s
sys     0m0.661s

time lein with-profile 3.5.0 test
real    0m7.950s
user    0m9.641s
sys     0m0.696s

As you can see, this memoized version achieves two things. Firstly, it is faster than the original by five fold. But critically, the difference between 3.4.10 and 3.5.0 has gone. For the 3.5.0 version of the OWL API this memoization is 30 fold faster.

Of course, this solution is not ideal. The multimethod is not really doing anything now. We have closed it rather than left it open. It is doing a completely pointless isa? check when all we need is an = check, and it is still doing linear time dispatch when it should be constant. Moreover, I wonder what why this form of memoization is working when the multimethod implementation itself is cached. None the less it works well enough that I am happy with it for the moment.


Conclusions

In this article, I have investigated the visitor pattern and its implementation in Clojure. Multimethods do work, and do produce a nice code base, but they do not perform that well. As my example shows, they are potentially also sensitive to changes in the class structure of dependencies. In short, the runtime decisions of multimethods are not really a good replacement for the compile decisions of the Visitor.

Benchmarking is very difficult under the best of circumstances. Criteriums use of multiple runs brings some statistics to bear on the task, but also produces very poor results for memoized or cached functions if the real-world task defeats this caching. In this article, I have chosen a combination of simple test cases, bench and time but also my test cases from tawny.owl, against two different versions of the OWL API. Clearly the latter is not simple and there are many other variables which could be producing these results. However, what these tests lack in cleanliness, they gain in that they are a real world problem. Although I have tested them in this environment as well as I am able, this lack of cleanliness leaves the possibility that I have just misunderstood the cause.

I would like to test protocols in this environment also. They may be a good solution and appear to run quickly, but I have only tested against benchmark; as we have seen benchmarks are not necessarily clean.

Ultimately, I feel that we need proper “dispatch on type” support in Clojure. Multimethods are much more generic even if they are generally used in this way. I would like to be able to dispatch on more than one argument, have type hints taken care of for me, and most importantly, I would like them to be highly performant. I have managed to achieve some of this by modifying the dispatch method, but I think that this is a more generic problem and should be solved properly. As it stands, I would argue that multimethods are not a good replacement for the visitor pattern.


Acknowledgements

Thanks to Keith Flanagan for a proof read. Mistakes are my own!

Bibliography

I’ve noted in the past some of the strange beliefs about DOIs (http://www.russet.org.uk/blog/1849). One of these is that DOIs provide some magic archiving capability (http://www.russet.org.uk/blog/2360). The other is the strange one that “DOIs make things citable”. This was one of the selling points for Figshare, for instance.

I’m interested to see that now GitHub have now joined the party (http://github.com/blog/1840-improving-github-for-science), and again using the justification that “DOIs make things citable”. I am lost in attempting to understand this.

First, GitHub have stable URIs for repositories. It’s in their business interests to keep these and if they change them they will break every single repository that has checked things out using the URI.

Second, if I have a github URI I actually know that I have a link to a repository, and it is fairly clear that I can clone from this repository. With a DOIs I do not. Paper, datacite item, git repo, it is not possible to tell.

Third, with a github URI I have a URI that I compare against other URIs and work out whether it is the same or different. If I have a DOI, I now have two identifiers, the DOI and the URI both of which identify the same thing. Surely, this makes the situation worse, and not better.

Am I being a little cynical in wondering why some publishers require them? Do they, perhaps have a vested interest in making things more invouluted and not just using standard web technology (http://www.russet.org.uk/blog/2248)?

It seems to me like a clear case of DOIs are magical fairy dust. We sprinkle them on a github repository and now it is better, when actually we have made the situation worse.

The only justification that we have is “DOIs make it citable”. Is there a better one? Answers on a post-card please.


Update

I totally missed a post by Carl Boettiger which makes some of the same points (http://www.carlboettiger.info/2013/06/03/DOI-citable.html).

On the general issue of metadata, a DOI will give some harvestable metadata from the DOI, although Greycite can give much of the same metadata direct from GitHub (see for instance here). Having GitHub fix their metadata would seem to me to have been an easier win. And, of course, github URIs can be used to clone from and extract all the repository metadata using, well, git.

Bibliography

I’ve been programming in Clojure for well over a year now; originally, I heard about it care of Sam Aaron, an old PhD student of ours who gave a fun lunch time talk; I rarely go to these (although I probably should). Indirectly, Tawny-OWL came out of this one, so it is good that I went.

During the time that I have used Clojure, I have come to know it fairly well, and appreciate many of it’s finer points; these are not the same as many people, I realise; for me, the Java integration is simple, effective and very important, as Tawny-OWL is essentially a wrapper over a Java library. Meanwhile, a lot of the nice concurrency features are a matter of indifference to me, again for the same reason.

But like any language there are some problems, or at least thing that don’t work for me. On the off-chance it is useful to anyone else, here is my list of Clojure gotchas.


Lazy Lists

This is quite a common one, of course, which hits most Clojure beginners. We write something like:

(def x (map (fn [x] (println "hello") x) (range 1 100)))

and then wonder why nothing prints out. Or, the alternative problem, write something like (range) and find the REPL hangs. The latter is, I think, a poorly performing REPL; infinity might be more principled a point at which to stop than an arbitrarily choosen value but it’s not useful.

Of course, once you have got past this point, it’s not so bad, but laziness can still take you unawares, especially when I was using Clojure just to drive a JVM library. This subtle bug from tawny.render which is, essentially, one big recursive descent renderer, demonstrates the problem. Consider, this code:

(concat
  [:fact]
  (form [:fact fact])
  (form [:factnot factnot]))

Looks fine, but I need to pass options and a lookup cache around and had done this with a number of dynamic vars. The cache, it turns out, would not have been working for this form (although it was for others), but I never noticed this. However, the options broke the code more cleanly. concat is, of course, lazy, and was happening outside the binding form which defines the dynamic vars.

Now, I know dynamic vars and laziness don’t mix. In the end, I have added an additional parameter to all the functions in my renderer using the awesome power of lisp (i.e. I wrote a dodgy search and replace function in Emacs). And the cache now invalidates itself using a better technique than before. But I didn’t want laziness, I just got it by chance. In Clojure, it’s always there, wanted or not. Or, rather, it’s always sometimes there, because Clojure is only partly lazy.


Lisp-1 vs Lisp-2

Well, this argument is as old as the hills. Clojure is a lisp-1, so it has a single namespace for variables and functions, while Common Lisp and Emacs-Lisp are a lisp-2, so have one namespace for each.

I’ve had fun with single namespaces before — I used to teach Javascript to new programmers and it produces wierd and wonderful bugs that can be hard to track down. Still, I am too old and wize for that. If only!

During Tawny-OWL, I found accidental capture of functions produced some strange artifacts. Consider, for example, this code.

(defn my-get[x map]
   (get x map))

Everything works fine here, of course, right up till the point that you get bored of typing map and change it to m:

(defn my-get[x m]
   (get x map))

Now things break in strange ways. map is now the (global) function and not the parameter. There are many ways around this, of course. I could not have done (use 'clojure.core) earlier and just imported the functions I use; except that I did use map elsewhere. I could namespace everything (try and find some examples of Clojure code with namespace qualfied or aliased clojure.core functions).

In my case, exactly this problem hit me when I renamed parameters called ontology to o. I thought the compiler would pick up my errors but no, because I had an ontology function. This situation is made worse by my next gotcha which is:


Everything is a function

Consider this entirely pointless piece of code which makes lisp post-fix.

(defmacro do-do [x afn]
    `(do ~(afn x)))

We can use this macro like so:

(do-do 1 inc)

Now, if you know only a little about lisp, you might expect this to return 2. If you are more experienced, then you might think that this is a strange thing to do, because the call to (inc 1) happens at macro-expansion time, and why would you want to do that? If you are more experienced still, you will think, well actually inc is not evaluated so it is actually a symbol, and the whole thing is going to crash.

Actually, it returns nil. The reason for this is that lots of things in Clojure are functions that you wouldn’t expect, and symbol is one of these. So, actually, ('inc 1) returns nil. Because symbols are functions which lookup the occurance of the symbol in the collection that follows.

Now this has advantages, of course, namely that you can use a symbol to look up a key in a collection. So, for example:

('bob {'bob 1})

Returns 1. Of course, this is nice, but how many times do you actually want to do this? And when you do, would (get {'bob 1} 'bob) really be so hard? I can see the justification for (:bob {:bob 1}) but for symbols I am really not convinced, unless I am missing some other critical advantage.


Future, what’s a Future

So, your working along happily in your single threaded application, and then you write this:

(def x 1)
(def y (ref 2))

(+ @x y)

Now, in this small example, the error is easy to find; we should have derefed y and not x. And what is the error that we get from this?

ClassCastException java.lang.Long cannot be cast to
    java.util.concurrent.Future
    clojure.core/deref-future (core.clj:2108).

But I have not used a future. I have never used a future. I do not even know what a future is (although, I may, of course do so in the future). The reason for this strange error message can be seen from the code for deref (which the @ reader macro uses. Since, integers do not implement IDeref we treat them as a Future, which then causes the cast exception.

(defn deref
  {:added "1.0"
   :static true}
  ([ref] (if (instance? clojure.lang.IDeref ref)
           (.deref ^clojure.lang.IDeref ref)
           (deref-future ref)))
  ([ref timeout-ms timeout-val]
     (if (instance? clojure.lang.IBlockingDeref ref)
       (.deref ^clojure.lang.IBlockingDeref ref timeout-ms timeout-val)
       (deref-future ref timeout-ms timeout-val))))

This one is easy to solve. Deref should check instance? Future on the value if IDeref fails, and crash with a better error message. One instance? check is well worth the effort.


Backtick really is for macros only

The backtick notation is found in many lisps, and this includes Clojure. It is primary use is in macros because it lets you build up forms programmatically, but have them look like normal typed in forms. Compare these two:

(defmacro inc2 [x]
  `(+ ~x 2))

(defmacro inc2 [x]
   (list + x 2))

In many lisps, though, the backtick is just a list creation macro, that happens to be mostly used for macros. In clojure, it’s been hard coded for macros. Consider:

(let [x 'john]
  `(~x paul george ringo))

You might expect this to just return a list of four symbols (which it does), but the symbols are not what you might expect.

(john user/paul user/george user/ringo)

The symbols paul, george and ringo get namespace qualified in the return value even though they are not in the original form. Now, of course, there is a good reason for this; it helps to prevent us from accidental capture of symbols. All symbols should be qualified or gensym’d.

But consider this:

(deftype bob []
     java.lang.Runnable
   (run [this]
      (println "Hello")))

Now, I know this is a silly example, because bob is just implementing Runnable, and any function would do this, but Runnable is nice and simple. This is still quite a lot of typing, so, perhaps we should macro this.

(defmacro defrunnable[name body]
  `(deftype ~name []
      java.lang.Runnable
     (run [this]
       ~body)))

Unfortunately, this is actually wrong because the symbols run and this get namespace qualified, so we end up with user/run and user/this. The correct way to achieve this is this:

(defmacro defrunnable[name body]
  `(deftype ~name []
      java.lang.Runnable
     (~'run [~'this]
       ~body)))

Now, this version is anaphoric and introduces this, so perhaps it is not ideal, but run although it looks like a funtion is not one — it’s a lexical symbol that Clojure translates to the method name.


Whitespace

In Clojure , is whitespace. Effectively, it is used to make code pretty but has no meaning other than that. Those coming from other Lisps will sooner or later do something like this:

(defmacro defrunnable[name body]
  `(deftype ,name []
      java.lang.Runnable
     (,'run [,'this]
       ,body)))

This nearly always results in a strange error message somewhere down the line which is not easy to debug. The point is that other lisps use , to mean “unquote” for which Clojure uses ~. Not really Clojure’s fault this one, I guess. But irritating none the less.


Running in Java

One of the most unfortunate things about Clojure is that it’s hosted on the JVM. Of course, this is also the reason that I am using it, so I guess it makes no sense to complain, except when writing a article of “gotchas”. But being hosted on the JVM means Clojure inherits some of the strangeness of the JVM.

While writing Protege-NREPL, I had to struggle with the an OSGi and Clojure’s dynamic ClassLoader both of which do sort of the same thing, but sort of differently. It’s while getting this to work that I found that Clojure uses the context class loader.

In the end, I found that I needed this code to get anything working:

private final ClassLoader cl = new DynamicClassLoader(this.getClass().getClassLoader());
Thread.currentThread().setContextClassLoader(cl);

No one understand what the context class loader is, nor what it is for. There again, no one understands class loaders, so this is perhaps not a surprise.


Two times

Clojure uses what is effectively a two-pass compilation step. I say effectively, because apparently it doesn’t but the practical upshot is that you have to declare things before you use them. This is just a pain.

A related problem is that Clojure dislikes circular namespace dependencies. With Tawny-OWL, this means that the main namespace is not really in the order that I want it. And it was a big problem for the reasoner namespace. The problem is that the reasoner namespace has to know about the owl.clj namespace; but, also, the reasoner namespace has to know when an ontology is removed (so that any reasoners can be dropped). The obvious solution which is to have the owl.clj call reasoner.clj doesn’t work because we now have a circular dependency.

In the end, I solved this by implementing a hook system like Emacs. Now owl.clj just runs a hook. Probably, I should reimplement this directly with watches, but they were alpha at the time.


Goodbye Cons

One of the big wins for Clojure is built over abstractions, so that cons cell which is the core of most lisps is gone. Instead of this, we have ISeq which is an interface and looks like this:

Object first();

ISeq next();

ISeq more();

ISeq cons(Object o);

The problem is that it really does look like this; I mean, this is a cut-and-paste from the code. Aside these method declarations, that’s is. Nothing at all in the way of documentation.

Worse the entire API for Clojure consists of two classes, with the rest being considered “implementation detail”.

Strictly, therefore, Clojure is built over abstractions, but users of Clojure have no access to extend these abstractions themselves, unless they use implementation detail. Which, of course, they do; to access the heart of the language you have to. Given this reality, some documentation would be nice!


Conclusions

Clojure is a nice language, but in some parts it is still a little immature; some of these gotchas will disappear in time. The error message about Future’s is trivial to fix, for instance. Some of them already can be avoided with libraries: for example, the backtick issue can be avoided using an alternative implementation. Others, will I think, stick. Symbols will remain functions I suspect. The last issue, that of a public API, must be fixed if Clojure is to mature.

One gotcha I don’t mention is the lack of a type-system. There are many times when programming Clojure when I have created a bug that a type-system would have picked up instantly. This must, however, be set against those times when you stare at the screen in depression trying to work out why a perfectly innocuous piece of code will not compile. In the end, it’s often easier to debug running code, than it is to fix a broken type error. Both forms of problem are something you learn to live with, depending on your choice of language.

I am please to announce the second full release of Tawny-OWL, my library for fully programmatic development of OWL ontologies. Tawny-OWL now has a fairly large feature set and is becoming a rich development environment.

Perhaps the biggest single change in this release in terms of code base is the least immediately obvious from the user perspective. Previously a large part of the code base was using Java reflection and therefore quite slow. I have now type-hinted all the namespaces meaning that tawny should never reflect. The practical upshot of this is that Tawny runs faster; in the most extreme case, tawny.render is about 5x faster.

The most difficult change for me has be the regularisation of :subclass and :subproperty keywords. The reasons behind this have been described in great detail previously (http://www.russet.org.uk/blog/2985). This was not an easy change to make as it breaks the syntax significantly; I should have made the change before Tawny 1.0, but I didn’t. I am hopefully that there will not be similar changes in future.

The roadmap for Tawny 1.2 is relatively simple; currently there is no good way to search over an ontology and to extract classes fulfilling certain requirements (short of direct invocation of the OWL API). I now have a simple implementation of search facilities operating over OWL using a combination of core.logic and tawny.query; hardening and extending this will be the next logical (ahem!) step.

Bibliography

Tawny-OWL (http://www.russet.org.uk/blog/2366) enables a rich programmatic interface to OWL and ontology building. To an extent, I wrote Tawny because I wanted to get away from the use of Protege (http://protege.stanford.edu/) as an ontology editor. I compare the experience of Protege to Tawny as similar to a comparison between Excel and R; if the former does what you need, then it’s fine, but it’s hard to extend. So, it is with Tawny — it is simple to add patterns, new syntaxes, new capabilities. And I have access to all the standard tools that I expect with any programmatic environment; I can use versioning, build tools and test harnesses.

Having said all of this, Tawny-OWL comes with some cost. Although most IDEs have good capabilities for jumping to definitions and the like, they are limited compared to the display capabilities of Protege (http://protege.stanford.edu/); the ability to navigate quickly and rapidly through an ontology, to use tools like OWLViz to get a broad overview of the ontology structure.

Even if I feel that Protege is limited as an editor, I would still like to use its visualisation capabilities; it is unfortunate if, in choosing Tawny-OWL, I have to abandon Protege. This is not, however, necessary. It is possible to use Protege to visualise an ontology created by Tawny with synchronisation; changes are displayed by Protege immediately, as they are displaying the live data models that Tawny is manipulating. This is achieved by Protege-Nrepl; in this post, I describe the implementation behind it.


Background

Tawny is implemented in Clojure which is a lisp that compiles down to Java bytecodes; the OWL functionality comes from the OWL API which is the same API that Protege uses. In an abstract sense, then it should be possible to plug the two together; to have Tawny operate over the same data structures that Protege is displaying.

There are a number of ways to connect a Clojure process to an IDE, but the most common way is with a relatively recent tool called nrepl. This is a protocol and an tool implementing this protocol which allows communication with a Clojure process. There are now quite a few tools which have implemented clients to this protocol.


Protege-Nrepl

I was fortunate that Clojure provided most of the tools that I needed. Protege-Nrepl is a protege plugin which places a single menu item into the Protege frame. This then launches an internal Clojure process, which in turn launches a Nrepl socket. As it stands, Protege-Nrepl is not specific to Tawny — it simple provides a Clojure process. On the top of this, there is a small bridge package called Tawny-Protege which links together the data structures of Tawny, and Protege.

From a practical point-of-view, this means that I can launch protege, then connect to it from Emacs (or any other Clojure IDE). The IDE then operates in the same way as if Clojure were launched internally.

In theory, the process is very simple: I chose to implement the plugin itself in Java because this seemed easiest, not least because Protege provides a standard maven file to build plugins (initially, I used the older ant build, but the dependencies were a pain). Protege is an OSGI application; I have little knowledge of OSGI, so not having to work this part out was a relief. Java side the relevant code, looks like this:

RT.loadResourceScript("protege/dialog.clj");
RT.loadResourceScript("protege/nrepl.clj");

Var init = RT.var("protege.nrepl","init");
init.invoke();

// and later
Var newDialog = RT.var("protege.dialog", "new-dialog-panel");

Additionally there is some glue to implement the plugin interface, and some threading (loading Clojure in the paint thread is not a good idea). The protege.nrepl/init function loads a user config file, while protege.dialog/new-dialog-panel creates a GUI which starts the nrepl server.

That should be the process complete, but in my hands this failed; the problem is that OSGI requires me to pre-declare all the packages that I want to import within a bundle, so they get into the classpath. In this case, I included all the dependencies transitively anyway; the whole point of the plugin was to package Clojure up for Protege, so there was little point adding it independently. Protege classes (for the plugin) need to come from the protege environment, as do the OWL API classes, or I will not be able to manipulate objects created by protege with Tawny, as they would be different classes (of the same name, but different classloader).

For reasons that I could not determine, the OSGI manifest plugin also inserted a large number of dependency packages, including javax.servlet, junit, and some sun.misc classes; these are not available meaning that, even though they are not actually used, unless they are excluded specifically they make the plugin crash. All of this was achieved with the following modifications to the maven-bundle plugin.

<instructions>
  <Bundle-ClassPath>.</Bundle-ClassPath>
  <Bundle-SymbolicName>${project.artifactId};singleton:=true</Bundle-SymbolicName>
  <Bundle-Vendor>Phil Lord</Bundle-Vendor>

  <!-- We exclude a bunch of things here which otherwise get
   into the import list and are not provided from anywhere. How
   do they get there? No idea! -->
   <Import-Package>
     !javax.servlet*,!junit.*,!org.junit*,!org.apache.*,
     !org.testng.*,!sun.misc.*,*
   </Import-Package>
   <Include-Resource>plugin.xml,{maven-resources}</Include-Resource>
   <Embed-Transitive>true</Embed-Transitive>
   <Embed-Dependency>*;scope=compile</Embed-Dependency>
   <Require-Bundle>
     org.protege.editor.core.application,
     org.protege.editor.owl,
     org.semanticweb.owl.owlapi
   </Require-Bundle>
</instructions>

On the clojure side, the final addition was Pomegranate; enabling Clojure in Protege is fairly useless without being able to add new dependencies (such as Tawny!), but I did not want to add these to the maven build. Pomegranate allows me to add new dependencies on the fly.

As I always use Tawny, I add the following to ~/.protege-nrepl/init.clj so that it is alongside Protege. I may change this so it happens automatically; if anyone wanted to use protege-nrepl without Tawny they could still do so.

(ns init
  (:require
   [cemerick.pomegranate]
   [protege model nrepl]))

;; force loading of tawny
(cemerick.pomegranate/add-dependencies
 :coordinates '[[uk.org.russet/tawny-protege "1.1.0-SNAPSHOT"]]
 :repositories (merge cemerick.pomegranate.aether/maven-central
                                          {"clojars" "http://clojars.org/repo"}))
;; and monkey patch the thing
(require 'tawny.protege-nrepl)

;; initing the dialog takes ages -- so auto connect
(dosync (ref-set protege.model/auto-connect-on-default true))

Lein-Sync

When launched from within Protege, the Clojure process will be running independently of a Maven or leiningen project. If, for example, I try and load the tawny.pizza/pizza, clojure will fail as it cannot find the local resources, nor any dependencies.

To handle this situation, I have created lein-sync — this is a leiningen plugin which is run in the project directory, which creates a .sync.clj file which contains all the Pomegranate code needed to extend the local classpath. For instance, this file generated for the tawny.pizza looks like this:

;; This file is auto-generated by lein sync
(require 'cemerick.pomegranate)
(cemerick.pomegranate/add-dependencies
 :coordinates
 '[[uk.org.russet/tawny-owl "1.0-SNAPSHOT"]
   [org.clojure/tools.nrepl
    "0.2.3"
    :exclusions
    ([org.clojure/clojure])]
   [clojure-complete/clojure-complete
    "0.2.3"
    :exclusions
    ([org.clojure/clojure])]
   [ritz/ritz-nrepl-middleware "0.7.0"]
   [org.clojure/tools.trace "0.7.5"]
   [compliment/compliment "0.0.1"]]
 :repositories
 '[["central"
    {:snapshots false, :url "http://repo1.maven.org/maven2/"}]
   ["clojars" {:url "https://clojars.org/repo/"}]])
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/src")
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/dev-resources")
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/resources")
(.println System/out "Loaded .sync in pizza")

Some of these dependencies (compliment, tools.trace) come from my local leiningen configuration. Loading this file, ensures an nrepl launched from within Protege behaves in the same way as a locally launched nrepl. Currently, classpath extension uses fully qualified paths which obviously requires the same (or a shared) file system between the leiningen instance generating .sync.clj and Protege; I may address this latter as it would enable me to run Protege on a different machine from the IDE.

Finally, I have written some Emacs to connect to the nrepl server and automatically run .sync.clj on connection; adding something similar for other IDEs would be straight-forward, although manual use of the repl is also possible.


Conclusions

Given all the availability of the tools, conceptually building protege-nrepl was straight-forward. In practice, it was made somewhat more complex through a combination of ClassLoaders, OSGI and the need to dynamically extend the classpath in a running JVM. In particular, my experience of running OSGI has not been positive; I spent a substantial amount of time chasing down a very strange bug caused by an inconsistency between the OWL API and Protege. Combined with the strange behaviour of the maven plugin which I only solved by multiple trial and error restarts, it all added a lot of complexity. Currently, I am using a pre-release version of Protege as this has been ported to maven; this requires a local build which I realize is not an end-user experience.

The end product, however, was worth the effort. Despite my criticisms of Protege, it remains an excellent tool; having a running Protege, updating live is a considerable advance over the old “save and reload” workflow that I used previously. I look forward to the next release of Protege, as this use of Tawny-OWL, protege-nrepl and Protege will increase the attractiveness of Tawny considerably.

Bibliography

Literate programming comes in many forms and disguises but is essentially the notion that the documentation and programmatic code should be written together, so that the documentation supports the code and vice versa. In this post, I discuss some of the problems with literate programming, my early attempts to circumvent these with respect to ontology development. Finally, I finish up with a description of some new technology which, I think, offers a solution.


Literate Programming for Ontologies

The reality is, I think, that literate programming has never really take off; there are a large number of reasons for this of course. Code does not naturally have an linear narrative and is not necessarily read in this way: rather, when read by an experienced programmer, they often track the flow of execution through the code (http://synesthesiam.com/posts/modeling-how-programmers-read-code.html). A secondary problem is apparently quite trivial but the editing environment for literate programmes tends to be poor. I cannot find any good research on this, but this is both my experience and that of others (http://unspecified.wordpress.com/2010/06/04/literate-programming-is-a-terrible-idea).

For ontology development, I think a literate approach seems to make more sense. Again, in my experience, ontologies do have a somewhat more narrative approach than code — at least in the sense that the lack loops and the like.


Initial Approaches

I have now been experimenting with literate techniques since 2009. The first version used a single latex file, and pulled these out into a Manchester syntax file (http://www.russet.org.uk/blog/1213). This worked quite nicely but suffered from the poor editor problem: I was building ontologies embedded in LaTeX, so lacked even the basic features (such as syntax highlighting) that I got when editing Manchester syntax files directly. This was a problem even with the very limited feature set from tools like omn-mode.el. The disadvantage would have been worse if I had been used to a richer environment for Manchester syntax.

My second attempt was took the opposite approach; now I used two files — a Manchester syntax file and a LaTeX one with a method for referring between the two (http://www.russet.org.uk/blog/1258). This worked okay but had a poor implementation which I later refined (http://www.russet.org.uk/blog/1269).

These approaches have their advantages but do both suffer from a poor editing environment; either in having two files to switch and link between, or favouring documentation over ontology or vice versa. They also suffered from a secondary issue, which is that they are based around Manchester syntax. While this is nice enough, since writing Tawny-OWL (1303.0213), this style of ontology development just feels not rich enough.


Marginalia

One of the declared advantages of using a real programming language as the basic for Tawny-OWL was the ability to use the tools from that language; I have used a number of these both within Tawny-OWL and with ontologies written with Tawny: mostly obviously, the test environment, but also serialisation, properties support and, of course, the entire editing environment.

This raises the question as to whether I could use literate programming tools from Clojure as well. To my knowledge, the only real option in town here is Marginalia. Marginalia uses markdown as the documentation format and builds a nice presentation with code on one side, and comments on the other.

However, it has problems. Firstly, it presents all comments as text — you cannot comment the comments as it were which is irritating for boilerplate such as licence text. Secondly, the side-by-side presentation breaks the flow of reading as you have to move your eyes around the screen all the time. And, finally, it’s Markdown. While Markdown is nice at what it does, it’s very limited, and I missed the extra power of something like LaTeX.

The main difficulty, though, remains the editing environment. Without special support, while editing the comments show up as just comments. I can never remember the order of brackets in Markdown links — I rely on syntax highlighting to tell me that I have it correct.

Is there a way that Clojure and LaTeX can be made to work together?


LaTeX experiments: line comments

My first thought, in experimenting with LaTeX was a remarkably cheap and cheerful one. Consider a document such as this:

;; \documentclass{article}
;; \begin{document}
;; \begin{code}
(println "hello world")
;; \end{code}
;; \end{document}

This is a valid Clojure file, and is nearly valid latex as well. The only illegal part is that ;; occurs before the documentclass macro, although, in practice having ;; appear randomly throughout the document would not be ideal either.

Now, LaTeX as an embedded markup language has a very plastic syntax, and I have used that in this case. It is actually very easy to just ignore the ; character entirely, through the use of Catcodes; we can put this into a driver file which then inputs our Clojure file like so:

\catcode`;=9
\input{file.clj}

This way we maintain the validity of our Clojure file (otherwise the first line would be illegal). This is a remarkably cheap and cheerful way of achieving our aims; albeit at the cost of losing the ability to use semi-colons in our writing.


Indirect-buffers

What, however, about the editing environment. My own preferred environment — Emacs — has nice modes which edit both LaTeX and Clojure code, and it is possible to switch between the two, when I want to move between editing code and editing documentation. This is quite clunky, but there is a second option which is “indirect-buffers”. This is a piece of Emacs arcana where two buffers share some of the same data structures but not all, which means that they can have different modes. Unfortunately, my experience is that the buffers share too much — as well as the text, they also share “text-properties” which unfortunately both LaTeX and Clojure mode use. In practice, this means syntax highlighting fails (or rather than two representations fight with each other). As a second problem, although the file is valid LaTeX it is not normal LaTeX; simply things like wrapping text in paragraphs fails because of the ;; comments at the beginning of each line.

So, this experiment fails the editing environment test.


LaTeX experiments: block comments

My next attempt was to use block comments. Consider this file which is valid lisp using #| and |# block comments.

#|
\documentclass{article}
\begin{document}
\begin{code}
|#
(println "hello world")
#|
\end{code}
\end{document}
|#

We can use a similar (but not identical) trick with catcodes to make this valid latex also:

\catcode`#=\active
\def#{\catcode`#=6}
\catcode`|=\active
\def|{\catcode`|=12}
\input{hello_world.lisp}

The first call makes the # character active — that is, definable as a macro. We then define # as a macro which will set the catcode of # to 6 (which is it’s default). Then, we do the same with |. The practical upshot of this is that the opening #| does nothing other than reset everything in the driver file; effectively it’s ignored.

This actually works quite nicely in the editing environment; the opening #| effectively makes no difference to Emacs, and the mode works well. The only real disadvantage is that every code block needs two delimiters — one to open the code block in latex, and end the comment in Lisp.

Now there are various multi-mode tools around for Emacs which should help solve the otherwise clunky editing environment, although even here I am not convinced that this is the right route. Multi-mode tools are complex and to some extent are not what I want — when editing code I want to suppress the documentation, give it a low visual immediacy, and when editing documentation, I want the reverse.

There is, however, a bigger problem — while the last example is valid Common Lisp, Clojure does not have block comments, nor does the programmer have the ability to extend the reader in this way. So, while this seems a nice solution, it depends on a specific language feature which Clojure lacks.


Emacs Experiments: formats

My next idea was to use formats. Emacs allows transformations to happen between the text that is visualised on screen, and how it is saved to file. The main reason for this is to support the many non-ascii text formats that exist. But it is (perhaps unsurprisingly) fully extensible within Emacs and could be used for any purpose. So, why not convert line-commented Clojure on file into block-comments on screen; this will give editable latex on screen and valid Clojure on file and a driver file to give valid Latex on file also.

Unfortunately, it fails. While Emacs latex support is file based, Clojure (and specifically cider) has a tighter integration; it can communicate the contents of a buffer without saving to file. This circumvents the formatting — the block comments are sent to Clojure process which complains.


Emacs Experiments: linked-buffer

I am now experimenting with another option. indirect-buffers place exactly the same text (and text-properties) into two buffers. Instead of sharing all the text, why not have two buffers with a function that can transform the text bi-directionally between the two. The practical result is two views over the same content. Surprisingly, this works pretty well, as you can see here even though my current implementation is very simple — the whole buffer is copied every keypress. We could achieve the same thing with indirect-buffers, but as well as simple copying, however, we can also transform the text on the fly so that both buffers are valid for their respective modes.

The broad idea is not that new — it’s similar to web/weave or SWeave for instance, except that it is embedded into the editor; this means we can take advantage of existing support for both languages from the editor and the author gets immediate feedback about the transformation — so messing up the syntax is pretty obvious.

It also provides a superset of functionality provided by other techniques: indirect-buffers as mentioned previously, shadowfile.el (which creates a second copy of a file somewhere else on every save), and it could also mimic shadow.el which generates a secondary file by a command invocation on every save (although an invocation of an external command every keypress would probably not be performant).

The first release of linked-buffer was a month ago. I am currently unhappy with the configuration and will change this so code is in flux at the moment, but I am using it in anger which is a good sign. Currently, it does a latex <-> clojure transformation, but I will add a few more as time goes on.


Discussion

It has taken me quite a while to get to this stage, and a number of experiments along the way, but my feeling is that I now have a workable literate environment. It also validates my decision to build Tawny (http://www.russet.org.uk/blog/2962). Having a rich textual language for building ontologies is a bit of a game changer; providing programmatic extensions to the language has been helpful, but the access to other tools, git, travis, tests and a repl has really made the difference. Now adding a literate environment to this as well changes the way that I can use ontologies and is a paradigm shift in their development.

Bibliography