Archive for the ‘All’ Category

I’ve noted in the past some of the strange beliefs about DOIs (http://www.russet.org.uk/blog/1849). One of these is that DOIs provide some magic archiving capability (http://www.russet.org.uk/blog/2360). The other is the strange one that “DOIs make things citable”. This was one of the selling points for Figshare, for instance.

I’m interested to see that now GitHub have now joined the party (http://github.com/blog/1840-improving-github-for-science), and again using the justification that “DOIs make things citable”. I am lost in attempting to understand this.

First, GitHub have stable URIs for repositories. It’s in their business interests to keep these and if they change them they will break every single repository that has checked things out using the URI.

Second, if I have a github URI I actually know that I have a link to a repository, and it is fairly clear that I can clone from this repository. With a DOIs I do not. Paper, datacite item, git repo, it is not possible to tell.

Third, with a github URI I have a URI that I compare against other URIs and work out whether it is the same or different. If I have a DOI, I now have two identifiers, the DOI and the URI both of which identify the same thing. Surely, this makes the situation worse, and not better.

Am I being a little cynical in wondering why some publishers require them? Do they, perhaps have a vested interest in making things more invouluted and not just using standard web technology (http://www.russet.org.uk/blog/2248)?

It seems to me like a clear case of DOIs are magical fairy dust. We sprinkle them on a github repository and now it is better, when actually we have made the situation worse.

The only justification that we have is “DOIs make it citable”. Is there a better one? Answers on a post-card please.


Update

I totally missed a post by Carl Boettiger which makes some of the same points (http://www.carlboettiger.info/2013/06/03/DOI-citable.html).

On the general issue of metadata, a DOI will give some harvestable metadata from the DOI, although Greycite can give much of the same metadata direct from GitHub (see for instance here). Having GitHub fix their metadata would seem to me to have been an easier win. And, of course, github URIs can be used to clone from and extract all the repository metadata using, well, git.

Bibliography

I’ve been programming in Clojure for well over a year now; originally, I heard about it care of Sam Aaron, an old PhD student of ours who gave a fun lunch time talk; I rarely go to these (although I probably should). Indirectly, Tawny-OWL came out of this one, so it is good that I went.

During the time that I have used Clojure, I have come to know it fairly well, and appreciate many of it’s finer points; these are not the same as many people, I realise; for me, the Java integration is simple, effective and very important, as Tawny-OWL is essentially a wrapper over a Java library. Meanwhile, a lot of the nice concurrency features are a matter of indifference to me, again for the same reason.

But like any language there are some problems, or at least thing that don’t work for me. On the off-chance it is useful to anyone else, here is my list of Clojure gotchas.


Lazy Lists

This is quite a common one, of course, which hits most Clojure beginners. We write something like:

(def x (map (fn [x] (println "hello") x) (range 1 100)))

and then wonder why nothing prints out. Or, the alternative problem, write something like (range) and find the REPL hangs. The latter is, I think, a poorly performing REPL; infinity might be more principled a point at which to stop than an arbitrarily choosen value but it’s not useful.

Of course, once you have got past this point, it’s not so bad, but laziness can still take you unawares, especially when I was using Clojure just to drive a JVM library. This subtle bug from tawny.render which is, essentially, one big recursive descent renderer, demonstrates the problem. Consider, this code:

(concat
  [:fact]
  (form [:fact fact])
  (form [:factnot factnot]))

Looks fine, but I need to pass options and a lookup cache around and had done this with a number of dynamic vars. The cache, it turns out, would not have been working for this form (although it was for others), but I never noticed this. However, the options broke the code more cleanly. concat is, of course, lazy, and was happening outside the binding form which defines the dynamic vars.

Now, I know dynamic vars and laziness don’t mix. In the end, I have added an additional parameter to all the functions in my renderer using the awesome power of lisp (i.e. I wrote a dodgy search and replace function in Emacs). And the cache now invalidates itself using a better technique than before. But I didn’t want laziness, I just got it by chance. In Clojure, it’s always there, wanted or not. Or, rather, it’s always sometimes there, because Clojure is only partly lazy.


Lisp-1 vs Lisp-2

Well, this argument is as old as the hills. Clojure is a lisp-1, so it has a single namespace for variables and functions, while Common Lisp and Emacs-Lisp are a lisp-2, so have one namespace for each.

I’ve had fun with single namespaces before — I used to teach Javascript to new programmers and it produces wierd and wonderful bugs that can be hard to track down. Still, I am too old and wize for that. If only!

During Tawny-OWL, I found accidental capture of functions produced some strange artifacts. Consider, for example, this code.

(defn my-get[x map]
   (get x map))

Everything works fine here, of course, right up till the point that you get bored of typing map and change it to m:

(defn my-get[x m]
   (get x map))

Now things break in strange ways. map is now the (global) function and not the parameter. There are many ways around this, of course. I could not have done (use 'clojure.core) earlier and just imported the functions I use; except that I did use map elsewhere. I could namespace everything (try and find some examples of Clojure code with namespace qualfied or aliased clojure.core functions).

In my case, exactly this problem hit me when I renamed parameters called ontology to o. I thought the compiler would pick up my errors but no, because I had an ontology function. This situation is made worse by my next gotcha which is:


Everything is a function

Consider this entirely pointless piece of code which makes lisp post-fix.

(defmacro do-do [x afn]
    `(do ~(afn x)))

We can use this macro like so:

(do-do 1 inc)

Now, if you know only a little about lisp, you might expect this to return 2. If you are more experienced, then you might think that this is a strange thing to do, because the call to (inc 1) happens at macro-expansion time, and why would you want to do that? If you are more experienced still, you will think, well actually inc is not evaluated so it is actually a symbol, and the whole thing is going to crash.

Actually, it returns nil. The reason for this is that lots of things in Clojure are functions that you wouldn’t expect, and symbol is one of these. So, actually, ('inc 1) returns nil. Because symbols are functions which lookup the occurance of the symbol in the collection that follows.

Now this has advantages, of course, namely that you can use a symbol to look up a key in a collection. So, for example:

('bob {'bob 1})

Returns 1. Of course, this is nice, but how many times do you actually want to do this? And when you do, would (get {'bob 1} 'bob) really be so hard? I can see the justification for (:bob {:bob 1}) but for symbols I am really not convinced, unless I am missing some other critical advantage.


Future, what’s a Future

So, your working along happily in your single threaded application, and then you write this:

(def x 1)
(def y (ref 2))

(+ @x y)

Now, in this small example, the error is easy to find; we should have derefed y and not x. And what is the error that we get from this?

ClassCastException java.lang.Long cannot be cast to
    java.util.concurrent.Future
    clojure.core/deref-future (core.clj:2108).

But I have not used a future. I have never used a future. I do not even know what a future is (although, I may, of course do so in the future). The reason for this strange error message can be seen from the code for deref (which the @ reader macro uses. Since, integers do not implement IDeref we treat them as a Future, which then causes the cast exception.

(defn deref
  {:added "1.0"
   :static true}
  ([ref] (if (instance? clojure.lang.IDeref ref)
           (.deref ^clojure.lang.IDeref ref)
           (deref-future ref)))
  ([ref timeout-ms timeout-val]
     (if (instance? clojure.lang.IBlockingDeref ref)
       (.deref ^clojure.lang.IBlockingDeref ref timeout-ms timeout-val)
       (deref-future ref timeout-ms timeout-val))))

This one is easy to solve. Deref should check instance? Future on the value if IDeref fails, and crash with a better error message. One instance? check is well worth the effort.


Backtick really is for macros only

The backtick notation is found in many lisps, and this includes Clojure. It is primary use is in macros because it lets you build up forms programmatically, but have them look like normal typed in forms. Compare these two:

(defmacro inc2 [x]
  `(+ ~x 2))

(defmacro inc2 [x]
   (list + x 2))

In many lisps, though, the backtick is just a list creation macro, that happens to be mostly used for macros. In clojure, it’s been hard coded for macros. Consider:

(let [x 'john]
  `(~x paul george ringo))

You might expect this to just return a list of four symbols (which it does), but the symbols are not what you might expect.

(john user/paul user/george user/ringo)

The symbols paul, george and ringo get namespace qualified in the return value even though they are not in the original form. Now, of course, there is a good reason for this; it helps to prevent us from accidental capture of symbols. All symbols should be qualified or gensym’d.

But consider this:

(deftype bob []
     java.lang.Runnable
   (run [this]
      (println "Hello")))

Now, I know this is a silly example, because bob is just implementing Runnable, and any function would do this, but Runnable is nice and simple. This is still quite a lot of typing, so, perhaps we should macro this.

(defmacro defrunnable[name body]
  `(deftype ~name []
      java.lang.Runnable
     (run [this]
       ~body)))

Unfortunately, this is actually wrong because the symbols run and this get namespace qualified, so we end up with user/run and user/this. The correct way to achieve this is this:

(defmacro defrunnable[name body]
  `(deftype ~name []
      java.lang.Runnable
     (~'run [~'this]
       ~body)))

Now, this version is anaphoric and introduces this, so perhaps it is not ideal, but run although it looks like a funtion is not one — it’s a lexical symbol that Clojure translates to the method name.


Whitespace

In Clojure , is whitespace. Effectively, it is used to make code pretty but has no meaning other than that. Those coming from other Lisps will sooner or later do something like this:

(defmacro defrunnable[name body]
  `(deftype ,name []
      java.lang.Runnable
     (,'run [,'this]
       ,body)))

This nearly always results in a strange error message somewhere down the line which is not easy to debug. The point is that other lisps use , to mean “unquote” for which Clojure uses ~. Not really Clojure’s fault this one, I guess. But irritating none the less.


Running in Java

One of the most unfortunate things about Clojure is that it’s hosted on the JVM. Of course, this is also the reason that I am using it, so I guess it makes no sense to complain, except when writing a article of “gotchas”. But being hosted on the JVM means Clojure inherits some of the strangeness of the JVM.

While writing Protege-NREPL, I had to struggle with the an OSGi and Clojure’s dynamic ClassLoader both of which do sort of the same thing, but sort of differently. It’s while getting this to work that I found that Clojure uses the context class loader.

In the end, I found that I needed this code to get anything working:

private final ClassLoader cl = new DynamicClassLoader(this.getClass().getClassLoader());
Thread.currentThread().setContextClassLoader(cl);

No one understand what the context class loader is, nor what it is for. There again, no one understands class loaders, so this is perhaps not a surprise.


Two times

Clojure uses what is effectively a two-pass compilation step. I say effectively, because apparently it doesn’t but the practical upshot is that you have to declare things before you use them. This is just a pain.

A related problem is that Clojure dislikes circular namespace dependencies. With Tawny-OWL, this means that the main namespace is not really in the order that I want it. And it was a big problem for the reasoner namespace. The problem is that the reasoner namespace has to know about the owl.clj namespace; but, also, the reasoner namespace has to know when an ontology is removed (so that any reasoners can be dropped). The obvious solution which is to have the owl.clj call reasoner.clj doesn’t work because we now have a circular dependency.

In the end, I solved this by implementing a hook system like Emacs. Now owl.clj just runs a hook. Probably, I should reimplement this directly with watches, but they were alpha at the time.


Goodbye Cons

One of the big wins for Clojure is built over abstractions, so that cons cell which is the core of most lisps is gone. Instead of this, we have ISeq which is an interface and looks like this:

Object first();

ISeq next();

ISeq more();

ISeq cons(Object o);

The problem is that it really does look like this; I mean, this is a cut-and-paste from the code. Aside these method declarations, that’s is. Nothing at all in the way of documentation.

Worse the entire API for Clojure consists of two classes, with the rest being considered “implementation detail”.

Strictly, therefore, Clojure is built over abstractions, but users of Clojure have no access to extend these abstractions themselves, unless they use implementation detail. Which, of course, they do; to access the heart of the language you have to. Given this reality, some documentation would be nice!


Conclusions

Clojure is a nice language, but in some parts it is still a little immature; some of these gotchas will disappear in time. The error message about Future’s is trivial to fix, for instance. Some of them already can be avoided with libraries: for example, the backtick issue can be avoided using an alternative implementation. Others, will I think, stick. Symbols will remain functions I suspect. The last issue, that of a public API, must be fixed if Clojure is to mature.

One gotcha I don’t mention is the lack of a type-system. There are many times when programming Clojure when I have created a bug that a type-system would have picked up instantly. This must, however, be set against those times when you stare at the screen in depression trying to work out why a perfectly innocuous piece of code will not compile. In the end, it’s often easier to debug running code, than it is to fix a broken type error. Both forms of problem are something you learn to live with, depending on your choice of language.

I am please to announce the second full release of Tawny-OWL, my library for fully programmatic development of OWL ontologies. Tawny-OWL now has a fairly large feature set and is becoming a rich development environment.

Perhaps the biggest single change in this release in terms of code base is the least immediately obvious from the user perspective. Previously a large part of the code base was using Java reflection and therefore quite slow. I have now type-hinted all the namespaces meaning that tawny should never reflect. The practical upshot of this is that Tawny runs faster; in the most extreme case, tawny.render is about 5x faster.

The most difficult change for me has be the regularisation of :subclass and :subproperty keywords. The reasons behind this have been described in great detail previously (http://www.russet.org.uk/blog/2985). This was not an easy change to make as it breaks the syntax significantly; I should have made the change before Tawny 1.0, but I didn’t. I am hopefully that there will not be similar changes in future.

The roadmap for Tawny 1.2 is relatively simple; currently there is no good way to search over an ontology and to extract classes fulfilling certain requirements (short of direct invocation of the OWL API). I now have a simple implementation of search facilities operating over OWL using a combination of core.logic and tawny.query; hardening and extending this will be the next logical (ahem!) step.

Bibliography

Before commit eb2f0e04, I used to have this function in tawny.owl.

(defbdontfn
  add-subclass
  {:doc "Adds one or more subclass to name in ontology."
   :arglists '([name & subclass] [ontology name & subclass])}
  [o name subclass]
  (add-axiom o
             (.getOWLSubClassOfAxiom
              (owl-data-factory)
              (ensure-class o name)
              (ensure-class o subclass))))

The idea is, as the name suggests to add a subclass relationship to the ontology; on the face of it, everything looks fine. However, a closer look at the OWL API raises a question:

getOWLSubClassOfAxiom(OWLClassExpression subClass, OWLClassExpression superClass)

The subclass parameter in Clojure maps to the superClass parameter in Java. The subclass in Clojure is actually the superclass.

If we compare the property equivalent in Tawny, things seem more regular:

(defbdontfn add-superproperty
  "Adds all items in superpropertylist to property as
a superproperty."
  [o property superproperty]
  (add-axiom o
             (.getOWLSubObjectPropertyOfAxiom
              (owl-data-factory)
              (ensure-object-property o property)
              (ensure-object-property o superproperty))))

and the equivalent Java:

getOWLSubObjectPropertyOfAxiom(OWLObjectPropertyExpression subProperty,
                               OWLObjectPropertyExpression superProperty)

The names of the parameters are now the same way around in Clojure and Java. So, have I made a mistake in Tawny with subclass handling? Actually, no, because we get strangeness at a different point with properties; consider the object-property-handlers which map between frames and the functions which implement them:

(def ^{:private true} object-property-handlers
  {
   :domain add-domain
   :range add-range
   :inverse add-inverse
   :subproperty add-superproperty
   :characteristic add-characteristics
   :subpropertychain add-subpropertychain
   :disjoint add-disjoint-property
   :equivalent add-equivalent-property
   :annotation add-annotation
   :label add-label
   :comment add-comment})

So, the :subproperty: frame is implemented with the add-superproperty function. As might be expected, :subclass is implemented with add-subclass

Even without this oddness, the problem can be seen when considering just the add-* functions. Consider, add-label:

(defbmontfn add-label
  "Add labels to the named entities."
  [o named-entity label]
  (add-annotation
   o
   named-entity
   [(tawny.owl/label label)]))

The semantics of this are that the third argument, label, is added to the second, named-entity as a label. It is slightly more complex than this; the b in defbmontfn means broadcast — add-label is actually variadic and flattens meaning that any number of labels can be added.

With add-subclass the semantics are reversed; the second argument becomes a subclass of the third (or, again, because of broadcasting, the third or subsequent arguments). And add-subclass is inconsistent here — all of the other add-* have the same semantics as add-label.

So, clearly, both add-subclass and the :subproperty frame have problems, and are not consistent with the rest of the API. Two important parts of Tawny-OWL have been implemented backward. How did this happen?


Investigating Manchester Syntax

We can investigate this further, by considering another inconsistency with Tawny. Considering the object-property-handlers above, we can see that while :subproperty is implemented with add-superproperty, :subpropertychain is implemented with add-subpropertychain.

The slot names in Tawny come (nearly) directly from Manchester syntax; so, let us compare Manchester syntax with the functional syntax for sub-properties and sub-property chains, using the OWL Primer. In Manchester syntax:

ObjectProperty: hasFather
   SubPropertyOf: hasParent

In functional syntax:

SubObjectPropertyOf(
   :hasFather
   :hasParent
)

Compare this to the equivalent declaration for subproperty chain.

ObjectProperty: hasGrandparent
   SubPropertyChain: hasParent o hasParent

Or in functional syntax:

SubObjectPropertyOf(
   ObjectPropertyChain( :hasParent :hasParent )
   :hasGrandparent
 )

The filler for SubPropertyChain: comes first, while for SubProperty: is comes second.

This suggests that the SubPropertyOf: and SubPropertyChain: frames are back-to-front from each other (this is the values of the slots appear in different orders in the two syntaxes). So, with the former, SubPropertyOf: I am stating that the entity (hasFather) is related to the filler (hasParent) and that the filler (hasParent) is the super property. With the latter, SubPropertyChain: I am stating that the entity (hasGrandparent) is related to the filler (hasParent o hasParent) and that the filler (hasParent o hasParent) is the sub property.

So, the two appear to be inconsistent with each other. So, let’s consider a further analysis of the other slots. Consider, for example:

A
  Annotations: rdfs:label B

which means B is an annotation of A.

A
 EquivalentTo: B

means B is equivalent to A (or, in this case, that A is equivalent to B as equivalance is symmetrical).

A
  Domain: B

means B is a domain of A

A
  Type: B

means B is a type of A.

All of these are consistent with each other: the filler (B) has a relationship to the entity (A) which is defined by the slot (type), with the caveat that the EquivalentTo relationship is symmetric.

Now

A
  SubClassOf: B
A
  SubPropertyOf: B

are backward: the entity (A) has a relationship to the filler (B) defined by the slot (SubClassOf:, SubPropertyOf:) – it’s why the Of preposition has been added. It is not possible to add the same preposition to the other slots; although it is possible to add has to the beginning. So, for example, the natural language semantics of these statements preserves their OMN meaning:

A HasAnnotation: B
A HasType: B
A HasKey: B

Of these, only the latter is actually OMN. The only other slots with prepositions are EquivalentTo and SameAs — you could change these to has as well.

A HasEquivalent: B
A HasSame: B

This probably reduces the readability over all, but it does at least maintain the semantics. It is for this reason that I say SubClassOf: is backward; to be consistent, it should be Super:

So

A Super: B

means B is a superclass of A. Now, we could add the has preposition to the start, while preserving the natural language semantics.

A HasSuper: B

Everything that I have said here is also true of SubPropertyOf: which behaves in the same way as SubClassOf: (i.e. backwards wrt to most slots).

Going back to the very early question, SubPropertyChain: (note, not SubPropertyChainOf:) is the same way around as most slots and the opposite way around from SubPropertyOf:

A SubPropertyChain: B o B

could be replaced with

A HasSubPropertyChain: B o B

In summary, for Manchester syntax SubClassOf: and SubPropertyOf: frames are backward with respect to all the other frames.


The Implications for Tawny

Unfortunately, the situation in Tawny-OWL was slightly worse than for Manchester syntax. While writing an early version of the karyotype ontology (1305.3758) by hand, I found typing too hard so removed the prepositions (:subclass and not :subclassof). Combined with the lack of CamelCase, this seemed a cleaner syntax. But it has exacerbated the issues described here.

Although, I have become aware of this problem before the release of the first full version of Tawny, I decided that consistency with Manchester syntax was worth the hassle. My recent experiments with literate ontologies (http://www.russet.org.uk/blog/2979), however have made me realise that I could not leave the situation as it is. One key feature of Tawny is that it (normally) forces declaration of entities before use which avoids simple spelling mistakes common when writing Manchester syntax by hand. However, only having access to a :subclass slot means that ontologies must be declared from the top of the inheritance hierarchy downward. For a literate ontology, this restriction seems unnecessary, and places an unfortunate emphasis on the upper ontology. I would like also to be able to build from the bottom up.

Neither having the semantics of add-subclass backward, nor the :subproperty add-superclass solution work well as it stands, and extending this to a :superclass slot would make the situation worse. In short, the only sensible fix was to diverge from OWL Manchester syntax, and deprecate the use of :subclass and :subproperty. At the same time, I decided to remove some extra typing. Therefore, :subclass has become :super (shortening and reversing the natural language semantics, retaining the logical semantics), and the new slot :sub has been added. Likewise, :subproperty has become :super and a new slot :sub introduced for properties also. As well as avoiding extra typing, removing the suffix has meant that I can leave :subclass and :subproperty in place but deprecated; the alternative of just reversing their semantics seemed unfortunate. Only the semantics of add-subclass has been broken, being reversed.

The inconsistency with Manchester syntax is currently a little painful, especially as the :subclass slot has been around since the early days of Tawny (http://www.russet.org.uk/blog/2214). The advantage, however, is that I have a simple rule to remember: A :s B means “A has :s B” or equivalently, “B is :s of A“. For this reason, and because it paves the way for richer literate ontologies, I feel that this is a good change.

Bibliography

I was lucky enough to catch (er) the premier of Catch-22 at the Northern Stage on Saturday. I’ve been to quite a few shows there now, and they are generally good; an adaption of what I consider to be best book that I’ve ever read, I was optimistic. We had good seats as well, middle, fifth row (far enough away not to get a stiff neck, close enough to hear clearly).

Looking through the programme, I was confused as to who had made the adaption as it wasn’t not mentioned anywhere; fortunately, BBC News put me right; the play was my Joseph Heller himself.

The stage set was fantastic, a cut-away bomber with the back-end merging into a beach hut. Like the rest of the set, it was used for many purposes — as a plane, an office, an entrance way. The cast was the same; nine actors jumping backward and forward between roles, except for the actor playing Yossarian, who, as in the book, was a solitary figure in the middle of the madness.

On the whole, I think it worked rather well. It’s a mistake, I think, to compare it to the book directly; nothing ever could. Many fantastic parts were missed — including my own favorite great loyality campaign, and the shortening meant that only a few characters really came out of their own: Colonel Cathcart, Major Major, the Chaplain, Natley (and his whore). Not all of it, I think, made entire sense: the naked man in the tree was funny, but it wasn’t clear why Yossarian refused to wear his clothes; nor the ending, with Orr’s escape missing, it’s not clear why Yossarian got all optimistic. But how could there not be parts missing? The main thing is that the feel of the theatre show and the book are the same; it’s confused, dissonate, unsettling, challenging all at the same time as being very, very funny.

The BBC news article raises the question, in the 40 years since the play was written why has it not been performed more. Good question, indeed.

Tawny-OWL (http://www.russet.org.uk/blog/2366) enables a rich programmatic interface to OWL and ontology building. To an extent, I wrote Tawny because I wanted to get away from the use of Protege (http://protege.stanford.edu/) as an ontology editor. I compare the experience of Protege to Tawny as similar to a comparison between Excel and R; if the former does what you need, then it’s fine, but it’s hard to extend. So, it is with Tawny — it is simple to add patterns, new syntaxes, new capabilities. And I have access to all the standard tools that I expect with any programmatic environment; I can use versioning, build tools and test harnesses.

Having said all of this, Tawny-OWL comes with some cost. Although most IDEs have good capabilities for jumping to definitions and the like, they are limited compared to the display capabilities of Protege (http://protege.stanford.edu/); the ability to navigate quickly and rapidly through an ontology, to use tools like OWLViz to get a broad overview of the ontology structure.

Even if I feel that Protege is limited as an editor, I would still like to use its visualisation capabilities; it is unfortunate if, in choosing Tawny-OWL, I have to abandon Protege. This is not, however, necessary. It is possible to use Protege to visualise an ontology created by Tawny with synchronisation; changes are displayed by Protege immediately, as they are displaying the live data models that Tawny is manipulating. This is achieved by Protege-Nrepl; in this post, I describe the implementation behind it.


Background

Tawny is implemented in Clojure which is a lisp that compiles down to Java bytecodes; the OWL functionality comes from the OWL API which is the same API that Protege uses. In an abstract sense, then it should be possible to plug the two together; to have Tawny operate over the same data structures that Protege is displaying.

There are a number of ways to connect a Clojure process to an IDE, but the most common way is with a relatively recent tool called nrepl. This is a protocol and an tool implementing this protocol which allows communication with a Clojure process. There are now quite a few tools which have implemented clients to this protocol.


Protege-Nrepl

I was fortunate that Clojure provided most of the tools that I needed. Protege-Nrepl is a protege plugin which places a single menu item into the Protege frame. This then launches an internal Clojure process, which in turn launches a Nrepl socket. As it stands, Protege-Nrepl is not specific to Tawny — it simple provides a Clojure process. On the top of this, there is a small bridge package called Tawny-Protege which links together the data structures of Tawny, and Protege.

From a practical point-of-view, this means that I can launch protege, then connect to it from Emacs (or any other Clojure IDE). The IDE then operates in the same way as if Clojure were launched internally.

In theory, the process is very simple: I chose to implement the plugin itself in Java because this seemed easiest, not least because Protege provides a standard maven file to build plugins (initially, I used the older ant build, but the dependencies were a pain). Protege is an OSGI application; I have little knowledge of OSGI, so not having to work this part out was a relief. Java side the relevant code, looks like this:

RT.loadResourceScript("protege/dialog.clj");
RT.loadResourceScript("protege/nrepl.clj");

Var init = RT.var("protege.nrepl","init");
init.invoke();

// and later
Var newDialog = RT.var("protege.dialog", "new-dialog-panel");

Additionally there is some glue to implement the plugin interface, and some threading (loading Clojure in the paint thread is not a good idea). The protege.nrepl/init function loads a user config file, while protege.dialog/new-dialog-panel creates a GUI which starts the nrepl server.

That should be the process complete, but in my hands this failed; the problem is that OSGI requires me to pre-declare all the packages that I want to import within a bundle, so they get into the classpath. In this case, I included all the dependencies transitively anyway; the whole point of the plugin was to package Clojure up for Protege, so there was little point adding it independently. Protege classes (for the plugin) need to come from the protege environment, as do the OWL API classes, or I will not be able to manipulate objects created by protege with Tawny, as they would be different classes (of the same name, but different classloader).

For reasons that I could not determine, the OSGI manifest plugin also inserted a large number of dependency packages, including javax.servlet, junit, and some sun.misc classes; these are not available meaning that, even though they are not actually used, unless they are excluded specifically they make the plugin crash. All of this was achieved with the following modifications to the maven-bundle plugin.

<instructions>
  <Bundle-ClassPath>.</Bundle-ClassPath>
  <Bundle-SymbolicName>${project.artifactId};singleton:=true</Bundle-SymbolicName>
  <Bundle-Vendor>Phil Lord</Bundle-Vendor>

  <!-- We exclude a bunch of things here which otherwise get
   into the import list and are not provided from anywhere. How
   do they get there? No idea! -->
   <Import-Package>
     !javax.servlet*,!junit.*,!org.junit*,!org.apache.*,
     !org.testng.*,!sun.misc.*,*
   </Import-Package>
   <Include-Resource>plugin.xml,{maven-resources}</Include-Resource>
   <Embed-Transitive>true</Embed-Transitive>
   <Embed-Dependency>*;scope=compile</Embed-Dependency>
   <Require-Bundle>
     org.protege.editor.core.application,
     org.protege.editor.owl,
     org.semanticweb.owl.owlapi
   </Require-Bundle>
</instructions>

On the clojure side, the final addition was Pomegranate; enabling Clojure in Protege is fairly useless without being able to add new dependencies (such as Tawny!), but I did not want to add these to the maven build. Pomegranate allows me to add new dependencies on the fly.

As I always use Tawny, I add the following to ~/.protege-nrepl/init.clj so that it is alongside Protege. I may change this so it happens automatically; if anyone wanted to use protege-nrepl without Tawny they could still do so.

(ns init
  (:require
   [cemerick.pomegranate]
   [protege model nrepl]))

;; force loading of tawny
(cemerick.pomegranate/add-dependencies
 :coordinates '[[uk.org.russet/tawny-protege "1.1.0-SNAPSHOT"]]
 :repositories (merge cemerick.pomegranate.aether/maven-central
                                          {"clojars" "http://clojars.org/repo"}))
;; and monkey patch the thing
(require 'tawny.protege-nrepl)

;; initing the dialog takes ages -- so auto connect
(dosync (ref-set protege.model/auto-connect-on-default true))

Lein-Sync

When launched from within Protege, the Clojure process will be running independently of a Maven or leiningen project. If, for example, I try and load the tawny.pizza/pizza, clojure will fail as it cannot find the local resources, nor any dependencies.

To handle this situation, I have created lein-sync — this is a leiningen plugin which is run in the project directory, which creates a .sync.clj file which contains all the Pomegranate code needed to extend the local classpath. For instance, this file generated for the tawny.pizza looks like this:

;; This file is auto-generated by lein sync
(require 'cemerick.pomegranate)
(cemerick.pomegranate/add-dependencies
 :coordinates
 '[[uk.org.russet/tawny-owl "1.0-SNAPSHOT"]
   [org.clojure/tools.nrepl
    "0.2.3"
    :exclusions
    ([org.clojure/clojure])]
   [clojure-complete/clojure-complete
    "0.2.3"
    :exclusions
    ([org.clojure/clojure])]
   [ritz/ritz-nrepl-middleware "0.7.0"]
   [org.clojure/tools.trace "0.7.5"]
   [compliment/compliment "0.0.1"]]
 :repositories
 '[["central"
    {:snapshots false, :url "http://repo1.maven.org/maven2/"}]
   ["clojars" {:url "https://clojars.org/repo/"}]])
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/src")
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/dev-resources")
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/resources")
(.println System/out "Loaded .sync in pizza")

Some of these dependencies (compliment, tools.trace) come from my local leiningen configuration. Loading this file, ensures an nrepl launched from within Protege behaves in the same way as a locally launched nrepl. Currently, classpath extension uses fully qualified paths which obviously requires the same (or a shared) file system between the leiningen instance generating .sync.clj and Protege; I may address this latter as it would enable me to run Protege on a different machine from the IDE.

Finally, I have written some Emacs to connect to the nrepl server and automatically run .sync.clj on connection; adding something similar for other IDEs would be straight-forward, although manual use of the repl is also possible.


Conclusions

Given all the availability of the tools, conceptually building protege-nrepl was straight-forward. In practice, it was made somewhat more complex through a combination of ClassLoaders, OSGI and the need to dynamically extend the classpath in a running JVM. In particular, my experience of running OSGI has not been positive; I spent a substantial amount of time chasing down a very strange bug caused by an inconsistency between the OWL API and Protege. Combined with the strange behaviour of the maven plugin which I only solved by multiple trial and error restarts, it all added a lot of complexity. Currently, I am using a pre-release version of Protege as this has been ported to maven; this requires a local build which I realize is not an end-user experience.

The end product, however, was worth the effort. Despite my criticisms of Protege, it remains an excellent tool; having a running Protege, updating live is a considerable advance over the old “save and reload” workflow that I used previously. I look forward to the next release of Protege, as this use of Tawny-OWL, protege-nrepl and Protege will increase the attractiveness of Tawny considerably.

Bibliography