Tawny-OWL (n.d.a) provides a rich programmatic interface to OWL and ontology building. To an extent, I wrote Tawny because I wanted to get away from the use of Protege (n.d.b) as an ontology editor. I compare the experience of Protege to Tawny as similar to a comparison between Excel and R; if the former does what you need, then it’s fine, but it’s hard to extend. So it is with Tawny: it is simple to add patterns, new syntaxes and new capabilities. And I have access to all the standard tools that I expect with any programmatic environment; I can use versioning, build tools and test harnesses.

Having said all of this, Tawny-OWL comes with some cost. Although most IDEs have good capabilities for jumping to definitions and the like, they are limited compared to the display capabilities of Protege (n.d.b): the ability to navigate rapidly through an ontology, or to use tools like OWLViz to get a broad overview of the ontology structure.

Even if I feel that Protege is limited as an editor, I would still like to use its visualisation capabilities; it would be unfortunate if, in choosing Tawny-OWL, I had to abandon Protege. This is not, however, necessary. It is possible to use Protege to visualise an ontology created by Tawny, with the two kept in sync; changes are displayed by Protege immediately, as it is displaying the live data structures that Tawny is manipulating. This is achieved by Protege-Nrepl; in this post, I describe the implementation behind it.

Background

Tawny is implemented in Clojure, a Lisp that compiles down to Java bytecode; the OWL functionality comes from the OWL API, which is the same API that Protege uses. In an abstract sense, then, it should be possible to plug the two together: to have Tawny operate over the same data structures that Protege is displaying.

There are a number of ways to connect a Clojure process to an IDE, but the most common way is with a relatively recent tool called nrepl. This is both a protocol and a tool implementing that protocol, which together allow communication with a running Clojure process. There are now quite a few tools which have implemented clients for this protocol.
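To give a flavour of what travels over that socket: nrepl messages are maps, and the default transport encodes them with bencode. The sketch below hand-encodes a single eval request in Java. The class and method names are my own invention for illustration, it only handles flat maps of ASCII strings, and a real client would use an existing nrepl library rather than anything like this.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: encodes a flat string-to-string map as a bencode dictionary. */
public class BencodeSketch {

    /** A bencode string is its length in bytes, a colon, then the contents. */
    static String encodeString(String s) {
        int byteLength = s.getBytes(StandardCharsets.UTF_8).length;
        return byteLength + ":" + s;
    }

    /** A bencode dictionary is 'd', the key/value pairs in sorted key order, then 'e'. */
    public static String encodeMessage(Map<String, String> message) {
        StringBuilder out = new StringBuilder("d");
        // TreeMap gives sorted keys; for ASCII keys this matches bencode's byte ordering.
        for (Map.Entry<String, String> e : new TreeMap<>(message).entrySet()) {
            out.append(encodeString(e.getKey())).append(encodeString(e.getValue()));
        }
        return out.append('e').toString();
    }

    public static void main(String[] args) {
        // An eval request asking the server to evaluate (+ 1 2).
        System.out.println(encodeMessage(Map.of("op", "eval", "code", "(+ 1 2)")));
        // prints d4:code7:(+ 1 2)2:op4:evale
    }
}
```

The server replies with dictionaries in the same encoding, which is why writing a client in another language or editor is relatively little work.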

Protege-Nrepl

I was fortunate that Clojure provided most of the tools that I needed. Protege-Nrepl is a Protege plugin which places a single menu item into the Protege frame. This then launches an internal Clojure process, which in turn opens an nrepl socket. As it stands, Protege-Nrepl is not specific to Tawny; it simply provides a Clojure process. On top of this, there is a small bridge package called Tawny-Protege which links together the data structures of Tawny and Protege.

From a practical point of view, this means that I can launch Protege, then connect to it from Emacs (or any other Clojure IDE). The IDE then operates in the same way as if it had launched the Clojure process itself.

In theory, the process is very simple. I chose to implement the plugin itself in Java because this seemed easiest, not least because Protege provides a standard Maven file to build plugins (initially, I used the older Ant build, but the dependencies were a pain). Protege is an OSGI application; I have little knowledge of OSGI, so not having to work this part out was a relief. On the Java side, the relevant code looks like this:

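import clojure.lang.RT;
import clojure.lang.Var;

// Load the Clojure sources bundled with the plugin (may throw IOException)
RT.loadResourceScript("protege/dialog.clj");
RT.loadResourceScript("protege/nrepl.clj");

// Look up protege.nrepl/init and call it
Var init = RT.var("protege.nrepl", "init");
init.invoke();

// and later
Var newDialog = RT.var("protege.dialog", "new-dialog-panel");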
Additionally there is some glue to implement the plugin interface, and some threading (loading Clojure in the paint thread is not a good idea). The protege.nrepl/init function loads a user config file, while protege.dialog/new-dialog-panel creates a GUI which starts the nrepl server.
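The threading point can be illustrated with a small sketch. This is not the plugin's code: the class and method names here are hypothetical, the slow Clojure load is replaced by a stand-in Supplier, and the "UI thread" is passed in as an Executor so that the example runs without a display. In a Swing application that executor would typically be javax.swing.SwingUtilities::invokeLater.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Illustrative only: run a slow initialisation away from the UI thread. */
public class BackgroundInit {

    /**
     * Runs slowInit on a dedicated daemon thread, then passes its result
     * to onReady using uiExecutor.
     */
    public static <T> CompletableFuture<Void> initOffUiThread(
            Supplier<T> slowInit, Executor uiExecutor, Consumer<T> onReady) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "clojure-init");
            t.setDaemon(true); // do not keep the JVM alive just for this thread
            return t;
        });
        CompletableFuture<Void> done = CompletableFuture
                .supplyAsync(slowInit, worker)         // slow part, off the UI thread
                .thenAcceptAsync(onReady, uiExecutor); // hand the result back
        worker.shutdown(); // the task already submitted still runs to completion
        return done;
    }

    public static void main(String[] args) {
        // Stand-in for the real work, such as loading Clojure scripts.
        Supplier<String> fakeLoad = () -> "loaded on " + Thread.currentThread().getName();
        // Runnable::run executes the callback directly, keeping the demo headless.
        initOffUiThread(fakeLoad, Runnable::run, System.out::println).join();
    }
}
```

The design choice is simply that the frame stays responsive while the slow load happens elsewhere, and only the final, cheap step touches the GUI.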

That should be the process complete, but in my hands this failed. The problem is that OSGI requires me to pre-declare all the packages that I want to import within a bundle, so that they get onto the classpath. In this case, I embedded all the dependencies transitively anyway; the whole point of the plugin was to package Clojure up for Protege, so there was little point adding it independently. The exceptions are the Protege classes (for the plugin) and the OWL API classes, which need to come from the Protege environment; otherwise I would not be able to manipulate objects created by Protege with Tawny, as they would be different classes (of the same name, but from a different classloader).
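The classloader point is worth seeing in isolation, since it is the reason the OWL API cannot simply be embedded in the bundle. The sketch below is a self-contained illustration with made-up names (Payload stands in for an OWL API class); it assumes the compiled class file can be read back as a resource, which holds for an ordinary javac build. It loads a second copy of the same class through a separate loader and shows that the JVM treats the two as unrelated types.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Illustrative only: the same class name from two loaders gives two distinct classes. */
public class ClassLoaderIdentity {

    /** Stand-in for a class that both a plugin and its host might load. */
    public static class Payload {
        public Payload() {}
    }

    /** Defines the target class itself; every other class is delegated to the parent. */
    static final class IsolatingLoader extends ClassLoader {
        private final String target;
        private final byte[] bytes;

        IsolatingLoader(String target, byte[] bytes, ClassLoader parent) {
            super(parent);
            this.target = target;
            this.bytes = bytes;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Loads a second copy of Payload through a fresh class loader. */
    public static Class<?> loadIsolatedCopy() {
        String name = Payload.class.getName();
        String resource = "/" + name.replace('.', '/') + ".class";
        try (InputStream in = Payload.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            ClassLoader parent = Payload.class.getClassLoader();
            return new IsolatingLoader(name, in.readAllBytes(), parent).loadClass(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> copy = loadIsolatedCopy();
        Object obj = copy.getDeclaredConstructor().newInstance();
        System.out.println(copy.getName().equals(Payload.class.getName())); // names match
        System.out.println(copy == Payload.class);                          // classes do not
        System.out.println(obj instanceof Payload);                         // nor do instances
    }
}
```

An object built from the copy fails an instanceof check against the original, which is the failure Tawny would hit if the plugin carried its own OWL API classes.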

For reasons that I could not determine, the OSGI manifest plugin also inserted a large number of dependency packages into the import list, including javax.servlet, junit, and some sun.misc classes. These are not available at runtime, so even though they are never actually used, the plugin crashes unless they are specifically excluded. All of this was achieved with the following modifications to the maven-bundle plugin configuration.

<instructions>
  <Bundle-ClassPath>.</Bundle-ClassPath>
  <Bundle-SymbolicName>${project.artifactId};singleton:=true</Bundle-SymbolicName>
  <Bundle-Vendor>Phil Lord</Bundle-Vendor>

  <!-- We exclude a bunch of things here which otherwise get
       into the import list and are not provided from anywhere. How
       do they get there? No idea! -->
  <Import-Package>
    !javax.servlet*,!junit.*,!org.junit*,!org.apache.*,
    !org.testng.*,!sun.misc.*,*
  </Import-Package>
  <Include-Resource>plugin.xml,{maven-resources}</Include-Resource>
  <Embed-Transitive>true</Embed-Transitive>
  <Embed-Dependency>*;scope=compile</Embed-Dependency>
  <Require-Bundle>
    org.protege.editor.core.application,
    org.protege.editor.owl,
    org.semanticweb.owl.owlapi
  </Require-Bundle>
</instructions>

On the Clojure side, the final addition was Pomegranate; enabling Clojure in Protege is fairly useless without being able to add new dependencies (such as Tawny!), but I did not want to add these to the Maven build. Pomegranate allows me to add new dependencies on the fly.

As I always use Tawny, I add the following to ~/.protege-nrepl/init.clj so that Tawny is loaded alongside Protege. I may change this so that it happens automatically; anyone who wanted to use protege-nrepl without Tawny could still do so.

(ns init
  (:require
   [cemerick.pomegranate]
   [protege model nrepl]))

;; force loading of tawny
(cemerick.pomegranate/add-dependencies
 :coordinates '[[uk.org.russet/tawny-protege "1.1.0-SNAPSHOT"]]
 :repositories (merge cemerick.pomegranate.aether/maven-central
                      {"clojars" "http://clojars.org/repo"}))
;; and monkey patch the thing
(require 'tawny.protege-nrepl)

;; initing the dialog takes ages -- so auto connect
(dosync (ref-set protege.model/auto-connect-on-default true))

Lein-Sync

When launched from within Protege, the Clojure process will be running independently of any Maven or Leiningen project. If, for example, I try to load tawny.pizza/pizza, Clojure will fail as it cannot find the local resources, nor any dependencies.

To handle this situation, I have created lein-sync. This is a Leiningen plugin which is run in the project directory and creates a .sync.clj file containing all the Pomegranate code needed to extend the local classpath. For instance, the file generated for tawny.pizza looks like this:

;; This file is auto-generated by lein sync
(require 'cemerick.pomegranate)
(cemerick.pomegranate/add-dependencies
 :coordinates
 '[[uk.org.russet/tawny-owl "1.0-SNAPSHOT"]
   [org.clojure/tools.nrepl
    "0.2.3"
    :exclusions
    ([org.clojure/clojure])]
   [clojure-complete/clojure-complete
    "0.2.3"
    :exclusions
    ([org.clojure/clojure])]
   [ritz/ritz-nrepl-middleware "0.7.0"]
   [org.clojure/tools.trace "0.7.5"]
   [compliment/compliment "0.0.1"]]
 :repositories
 '[["central"
    {:snapshots false, :url "http://repo1.maven.org/maven2/"}]
   ["clojars" {:url "https://clojars.org/repo/"}]])
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/src")
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/dev-resources")
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/resources")
(.println System/out "Loaded .sync in pizza")

Some of these dependencies (compliment, tools.trace) come from my local Leiningen configuration. Loading this file ensures that an nrepl launched from within Protege behaves in the same way as a locally launched nrepl. Currently, classpath extension uses fully qualified paths, which obviously requires the same (or a shared) file system between the Leiningen instance generating .sync.clj and Protege; I may address this later, as it would enable me to run Protege on a different machine from the IDE.

Finally, I have written some Emacs code to connect to the nrepl server and automatically run .sync.clj on connection; adding something similar for other IDEs would be straightforward, although manual use of the repl is also possible.

Conclusions

Given the availability of all the tools, building protege-nrepl was conceptually straightforward. In practice, it was made somewhat more complex by a combination of ClassLoaders, OSGI and the need to dynamically extend the classpath in a running JVM. In particular, my experience of running OSGI has not been positive; I spent a substantial amount of time chasing down a very strange bug caused by an inconsistency between the OWL API and Protege. Combined with the strange behaviour of the Maven plugin, which I only solved through repeated trial-and-error restarts, it all added a lot of complexity. Currently, I am using a pre-release version of Protege as this has been ported to Maven; this requires a local build, which I realise is not an end-user experience.

The end product, however, was worth the effort. Despite my criticisms of Protege, it remains an excellent tool; having a running Protege that updates live is a considerable advance over the old “save and reload” workflow that I used previously. I look forward to the next release of Protege, as this combination of Tawny-OWL, protege-nrepl and Protege will increase the attractiveness of Tawny considerably.