I was lucky enough to catch (er) the premiere of Catch-22 at the Northern Stage on Saturday. I’ve been to quite a few shows there now, and they are generally good; as this was an adaptation of what I consider to be the best book that I’ve ever read, I was optimistic. We had good seats as well, middle, fifth row (far enough away not to get a stiff neck, close enough to hear clearly).

Looking through the programme, I was confused as to who had written the adaptation, as it wasn’t mentioned anywhere; fortunately, BBC News put me right: the play was by Joseph Heller himself.

The stage set was fantastic: a cut-away bomber with the back end merging into a beach hut. Like the rest of the set, it was used for many purposes: as a plane, an office, an entranceway. The cast was used in the same way, with nine actors jumping back and forth between roles, except for the actor playing Yossarian, who, as in the book, was a solitary figure in the middle of the madness.

On the whole, I think it worked rather well. It’s a mistake, I think, to compare it directly to the book; nothing ever could match it. Many fantastic parts were missed, including my own favourite, the great loyalty campaign, and the shortening meant that only a few characters really came into their own: Colonel Cathcart, Major Major, the Chaplain, Nately (and his whore). Not all of it, I think, made complete sense. The naked man in the tree was funny, but it wasn’t clear why Yossarian refused to wear his clothes; nor was the ending clear, since with Orr’s escape missing, it’s not obvious why Yossarian became so optimistic. But how could there not be parts missing? The main thing is that the feel of the theatre show and the book is the same: both are confused, dissonant, unsettling and challenging, while at the same time being very, very funny.

The BBC News article raises the question of why, in the 40 years since the play was written, it has not been performed more. Good question, indeed.

Tawny-OWL (http://www.russet.org.uk/blog/2366) enables a rich programmatic interface to OWL and ontology building. To an extent, I wrote Tawny because I wanted to get away from the use of Protege (http://protege.stanford.edu/) as an ontology editor. Comparing Protege with Tawny is rather like comparing Excel with R: if the former does what you need, then it’s fine, but it’s hard to extend. Tawny, like R, is the opposite: it is simple to add patterns, new syntaxes and new capabilities. And I have access to all the standard tools that I expect in any programmatic environment; I can use versioning, build tools and test harnesses.

Having said all of this, Tawny-OWL comes at some cost. Although most IDEs have good capabilities for jumping to definitions and the like, they are limited compared to the display capabilities of Protege (http://protege.stanford.edu/), such as the ability to navigate rapidly through an ontology, or to use tools like OWLViz to get a broad overview of its structure.

Even if I feel that Protege is limited as an editor, I would still like to use its visualisation capabilities; it would be unfortunate if, in choosing Tawny-OWL, I had to abandon Protege. This is not, however, necessary. It is possible to use Protege to visualise an ontology created by Tawny, with the two kept synchronised; changes appear in Protege immediately, as it is displaying the same live data model that Tawny is manipulating. This is achieved by Protege-Nrepl; in this post, I describe the implementation behind it.


Background

Tawny is implemented in Clojure, a Lisp that compiles down to Java bytecode; its OWL functionality comes from the OWL API, the same API that Protege uses. In principle, then, it should be possible to plug the two together, so that Tawny operates over the same data structures that Protege is displaying.

There are a number of ways to connect a Clojure process to an IDE, but the most common is a relatively recent tool called nrepl. This is both a protocol and a tool implementing that protocol, which allows an editor to communicate with a running Clojure process. Quite a few tools now implement clients for this protocol.
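To make the protocol a little more concrete, here is a sketch of what a client puts on the wire. nREPL's default transport sends messages as bencoded maps over a socket, and an evaluation request carries at least an "op" of "eval" and the "code" to run. The encoder below is a hypothetical minimal one written for illustration. It is not taken from nrepl or any client library, and it only handles strings, integers, lists and string-keyed maps.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal bencode encoder (illustrative sketch), enough to build an nREPL
// request by hand. Supports strings, integers, lists and string-keyed maps only.
final class Bencode {
    static String encode(Object value) {
        StringBuilder out = new StringBuilder();
        write(value, out);
        return out.toString();
    }

    private static void write(Object value, StringBuilder out) {
        if (value instanceof String) {
            String s = (String) value;
            // strings are length-prefixed; the length counts UTF-8 bytes
            out.append(s.getBytes(StandardCharsets.UTF_8).length).append(':').append(s);
        } else if (value instanceof Integer || value instanceof Long) {
            out.append('i').append(value).append('e');
        } else if (value instanceof List) {
            out.append('l');
            for (Object item : (List<?>) value) {
                write(item, out);
            }
            out.append('e');
        } else if (value instanceof Map) {
            @SuppressWarnings("unchecked")
            Map<String, Object> map = (Map<String, Object>) value;
            out.append('d');
            // keys must be sorted; String order matches byte order for ASCII keys
            for (Map.Entry<String, Object> e : new TreeMap<>(map).entrySet()) {
                write(e.getKey(), out);
                write(e.getValue(), out);
            }
            out.append('e');
        } else {
            throw new IllegalArgumentException("cannot bencode: " + value);
        }
    }

    public static void main(String[] args) {
        // an nREPL "eval" request; "id" lets the client match up the replies
        Map<String, Object> request = Map.of("op", "eval", "code", "(+ 1 2)", "id", "1");
        System.out.println(encode(request));
    }
}
```

Running `main` prints `d4:code7:(+ 1 2)2:id1:12:op4:evale`. The server answers with further bencoded maps of its own, carrying keys such as the value and the status of the evaluation.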


Protege-Nrepl

I was fortunate that Clojure provided most of the tools that I needed. Protege-Nrepl is a Protege plugin which places a single menu item into the Protege frame. This launches an internal Clojure process, which in turn launches an nrepl server socket. As it stands, Protege-Nrepl is not specific to Tawny; it simply provides a Clojure process. On top of this, there is a small bridge package called Tawny-Protege which links together the data structures of Tawny and Protege.

From a practical point of view, this means that I can launch Protege, then connect to it from Emacs (or any other Clojure IDE). The IDE then behaves exactly as it would if it had launched Clojure itself.

In theory, the process is very simple. I chose to implement the plugin itself in Java because this seemed easiest, not least because Protege provides a standard Maven file for building plugins (initially, I used the older ant build, but the dependencies were a pain). Protege is an OSGI application; I have little knowledge of OSGI, so not having to work this part out was a relief. On the Java side, the relevant code looks like this:

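import clojure.lang.RT;
import clojure.lang.Var;

// load the Clojure sources packaged inside the plugin jar
// (loadResourceScript throws IOException if a script cannot be read)
RT.loadResourceScript("protege/dialog.clj");
RT.loadResourceScript("protege/nrepl.clj");

// protege.nrepl/init loads the user's config file
Var init = RT.var("protege.nrepl", "init");
init.invoke();

// and later, to build the GUI which starts the nrepl server
Var newDialog = RT.var("protege.dialog", "new-dialog-panel");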

Additionally, there is some glue code to implement the plugin interface, and some threading (loading Clojure on the paint thread is not a good idea). The protege.nrepl/init function loads a user config file, while protege.dialog/new-dialog-panel creates a GUI which starts the nrepl server.
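The threading point can be sketched in a few lines. The idea is to push the slow start-up work onto its own thread, so that Swing's event dispatch ("paint") thread stays free to redraw Protege. This is not the plugin's actual code. The class and method names are hypothetical, and the `Supplier` stands in for the `RT.loadResourceScript` and `init.invoke()` calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper: runs slow initialisation (such as loading the Clojure
// runtime) on a dedicated daemon thread instead of the Swing paint thread.
final class BackgroundInit {
    private static final ExecutorService LOADER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "clojure-loader");
        t.setDaemon(true); // do not keep the JVM alive just for this thread
        return t;
    });

    static <T> CompletableFuture<T> initInBackground(Supplier<T> init) {
        return CompletableFuture.supplyAsync(init, LOADER);
    }

    public static void main(String[] args) {
        // stand-in for loading the Clojure scripts and calling protege.nrepl/init
        initInBackground(() -> {
            System.out.println("loading on " + Thread.currentThread().getName());
            return "ready";
        }).thenAccept(status -> System.out.println("init finished: " + status)).join();
    }
}
```

In a real plugin, anything that touches Swing components once loading finishes would need to be handed back to the event dispatch thread, for example with `SwingUtilities.invokeLater`.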

That should have been the process complete, but in my hands it failed. The problem is that OSGI requires a bundle to pre-declare all the packages that it imports, so that they get onto its classpath. For most packages this was not an issue, as I embedded all the dependencies transitively; the whole point of the plugin was to package Clojure up for Protege, so there was little point adding it independently. The Protege classes (used by the plugin) must come from the Protege environment, however, as must the OWL API classes. Otherwise, Tawny would not be able to manipulate objects created by Protege, as they would be different classes (with the same name, but loaded by a different classloader).

For reasons that I could not determine, the OSGI manifest plugin also inserted imports for a large number of packages, including javax.servlet, junit and sun.misc. These are not available in Protege, so even though they are never actually used, the plugin crashes unless they are specifically excluded. All of this was achieved with the following configuration of the maven-bundle-plugin:

<instructions>
  <Bundle-ClassPath>.</Bundle-ClassPath>
  <Bundle-SymbolicName>${project.artifactId};singleton:=true</Bundle-SymbolicName>
  <Bundle-Vendor>Phil Lord</Bundle-Vendor>

  <!-- We exclude a bunch of things here which otherwise get
       into the import list and are not provided from anywhere. How
       do they get there? No idea! -->
  <Import-Package>
    !javax.servlet*,!junit.*,!org.junit*,!org.apache.*,
    !org.testng.*,!sun.misc.*,*
  </Import-Package>
  <Include-Resource>plugin.xml,{maven-resources}</Include-Resource>
  <Embed-Transitive>true</Embed-Transitive>
  <Embed-Dependency>*;scope=compile</Embed-Dependency>
  <Require-Bundle>
    org.protege.editor.core.application,
    org.protege.editor.owl,
    org.semanticweb.owl.owlapi
  </Require-Bundle>
</instructions>
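The Import-Package value works as an ordered list of patterns. Roughly speaking, the first pattern that matches a package decides its fate, a leading `!` excludes the package, and the final `*` imports everything that is left. The sketch below is a deliberately simplified model of that behaviour, written only to show why the exclusions must come before the `*`. It is not bnd's matching code, which handles more cases than this, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of an ordered Import-Package list: first match wins,
// "!" excludes, "pkg.*" matches pkg and its subpackages, "prefix*" is a
// plain prefix match, and "*" matches everything. Illustration only.
final class ImportFilter {
    static boolean matches(String pattern, String pkg) {
        if (pattern.equals("*")) return true;
        if (pattern.endsWith(".*")) {
            String base = pattern.substring(0, pattern.length() - 2);
            return pkg.equals(base) || pkg.startsWith(base + ".");
        }
        if (pattern.endsWith("*")) {
            return pkg.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pkg.equals(pattern);
    }

    static List<String> imports(List<String> instructions, List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String pkg : candidates) {
            for (String instr : instructions) {
                boolean negated = instr.startsWith("!");
                String pattern = negated ? instr.substring(1) : instr;
                if (matches(pattern, pkg)) {
                    if (!negated) kept.add(pkg);
                    break; // the first matching pattern decides
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> instructions = List.of("!javax.servlet*", "!junit.*", "!org.junit*",
            "!org.apache.*", "!org.testng.*", "!sun.misc.*", "*");
        List<String> candidates = List.of("javax.servlet.http", "junit.framework",
            "sun.misc", "org.semanticweb.owlapi.model", "clojure.lang");
        System.out.println(imports(instructions, candidates));
    }
}
```

With the patterns from the configuration above, only `org.semanticweb.owlapi.model` and `clojure.lang` survive. If the `*` were moved to the front, it would match first and the exclusions would never apply.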

On the Clojure side, the final addition was Pomegranate. Enabling Clojure in Protege is fairly useless without being able to add new dependencies (such as Tawny!), but I did not want to add these to the Maven build; Pomegranate allows me to add new dependencies on the fly.

As I always use Tawny, I add the following to ~/.protege-nrepl/init.clj so that it is loaded alongside Protege. I may change this so that it happens automatically, while still allowing anyone who wants to use protege-nrepl without Tawny to do so.

(ns init
  (:require
   [cemerick.pomegranate]
   [cemerick.pomegranate.aether]
   [protege model nrepl]))

;; force loading of tawny
(cemerick.pomegranate/add-dependencies
 :coordinates '[[uk.org.russet/tawny-protege "1.1.0-SNAPSHOT"]]
 :repositories (merge cemerick.pomegranate.aether/maven-central
                      {"clojars" "http://clojars.org/repo"}))
;; and monkey patch the thing
(require 'tawny.protege-nrepl)

;; initialising the dialog takes ages -- so auto connect
(dosync (ref-set protege.model/auto-connect-on-default true))

Lein-Sync

When launched from within Protege, the Clojure process runs independently of any Maven or Leiningen project. If, for example, I try to load tawny.pizza/pizza, Clojure will fail, as it can find neither the local resources nor any dependencies.

To handle this situation, I have created lein-sync, a Leiningen plugin which, when run in the project directory, creates a .sync.clj file containing all the Pomegranate code needed to extend the local classpath. For instance, the file generated for tawny.pizza looks like this:

;; This file is auto-generated by lein sync
(require 'cemerick.pomegranate)
(cemerick.pomegranate/add-dependencies
 :coordinates
 '[[uk.org.russet/tawny-owl "1.0-SNAPSHOT"]
   [org.clojure/tools.nrepl
    "0.2.3"
    :exclusions
    ([org.clojure/clojure])]
   [clojure-complete/clojure-complete
    "0.2.3"
    :exclusions
    ([org.clojure/clojure])]
   [ritz/ritz-nrepl-middleware "0.7.0"]
   [org.clojure/tools.trace "0.7.5"]
   [compliment/compliment "0.0.1"]]
 :repositories
 '[["central"
    {:snapshots false, :url "http://repo1.maven.org/maven2/"}]
   ["clojars" {:url "https://clojars.org/repo/"}]])
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/src")
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/dev-resources")
(cemerick.pomegranate/add-classpath
 "/home/phillord/src/knowledge/ontology-clj/tawny-pizza/resources")
(.println System/out "Loaded .sync in pizza")

Some of these dependencies (compliment, tools.trace) come from my local Leiningen configuration. Loading this file ensures that an nrepl launched from within Protege behaves in the same way as a locally launched one. Currently, classpath extension uses absolute paths, which requires the same (or a shared) file system for the Leiningen instance generating .sync.clj and for Protege. I may address this later, as it would enable me to run Protege on a different machine from the IDE.
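Underneath Pomegranate's add-classpath sits a basic JVM capability: a class loader whose search path can grow after it has been created. `URLClassLoader.addURL` is protected, and Clojure's own `clojure.lang.DynamicClassLoader` makes it public. The sketch below shows that mechanism in isolation. It is not Pomegranate's implementation, and the class name and temporary directory are hypothetical stand-ins for a project's `resources` folder.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// A URLClassLoader whose search path can be extended while the JVM is running.
final class ExtendableClassLoader extends URLClassLoader {
    ExtendableClassLoader(ClassLoader parent) {
        super(new URL[0], parent);
    }

    // URLClassLoader.addURL is protected; widening it to public lets running
    // code add directories or jars after the loader has been created
    @Override
    public void addURL(URL url) {
        super.addURL(url);
    }

    public static void main(String[] args) throws IOException {
        // hypothetical stand-in for a project's resources/ directory
        Path dir = Files.createTempDirectory("resources");
        Files.write(dir.resolve("pizza.txt"), "margherita".getBytes(StandardCharsets.UTF_8));

        try (ExtendableClassLoader loader =
                 new ExtendableClassLoader(ExtendableClassLoader.class.getClassLoader())) {
            System.out.println("before: " + loader.getResource("pizza.txt"));
            // File.toURI() adds the trailing "/" that URLClassLoader needs for directories
            loader.addURL(dir.toFile().toURI().toURL());
            System.out.println("after:  " + loader.getResource("pizza.txt"));
        }
    }
}
```

The argument to `addURL` is an absolute location, which is why the generated .sync.clj ties Protege to the file system of the machine that produced it.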

Finally, I have written some Emacs Lisp to connect to the nrepl server and automatically run .sync.clj on connection. Adding something similar for other IDEs would be straightforward, although manual use of the repl is also possible.


Conclusions

Given the availability of the tools, building protege-nrepl was conceptually straightforward. In practice, it was made somewhat more complex by a combination of ClassLoaders, OSGI and the need to dynamically extend the classpath in a running JVM. In particular, my experience of OSGI has not been positive; I spent a substantial amount of time chasing down a very strange bug caused by an inconsistency between the OWL API and Protege. Combined with the strange behaviour of the Maven plugin, which I only solved through repeated trial and error, it all added a lot of complexity. Currently, I am using a pre-release version of Protege, as this has been ported to Maven; this requires a local build, which I realise is not suitable for end users.

The end product, however, was worth the effort. Despite my criticisms of Protege, it remains an excellent tool; having a running Protege that updates live is a considerable advance over the old “save and reload” workflow that I used previously. I look forward to the next release of Protege, as the combination of Tawny-OWL, protege-nrepl and Protege will increase the attractiveness of Tawny considerably.

Bibliography

Literate programming comes in many forms and disguises, but is essentially the notion that the documentation and the code should be written together, so that the documentation supports the code and vice versa. In this post, I discuss some of the problems with literate programming, and my early attempts to circumvent these for ontology development. Finally, I describe some new technology which, I think, offers a solution.


Literate Programming for Ontologies

The reality is, I think, that literate programming has never really taken off; there are a large number of reasons for this, of course. Code does not naturally have a linear narrative and is not necessarily read in this way; rather, experienced programmers often track the flow of execution through the code (http://synesthesiam.com/posts/modeling-how-programmers-read-code.html). A secondary problem, apparently quite trivial, is that the editing environment for literate programmes tends to be poor. I cannot find any good research on this, but it is both my experience and that of others (http://unspecified.wordpress.com/2010/06/04/literate-programming-is-a-terrible-idea).

For ontology development, I think a literate approach makes more sense. In my experience, ontologies have a somewhat more narrative structure than code, at least in the sense that they lack loops and the like.


Initial Approaches

I have been experimenting with literate techniques since 2009. The first version used a single LaTeX file, from which the ontology was extracted into a Manchester syntax file (http://www.russet.org.uk/blog/1213). This worked quite nicely but suffered from the poor editor problem. I was building ontologies embedded in LaTeX, so I lacked even the basic features (such as syntax highlighting) that I got when editing Manchester syntax files directly. This was a problem even given the very limited feature set of tools like omn-mode.el, and it would have been worse had I been used to a richer environment for Manchester syntax.
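The extraction step (the "tangle" half of classic literate programming) amounts to keeping only the lines inside the code environments. The sketch below illustrates this under an assumption: it takes the environment to be called `code`, as in the LaTeX examples later in this post, since the name used in the 2009 version is not given here. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal "tangle": keep only the lines between \begin{code} and \end{code}
// (environment name assumed), discarding the surrounding prose.
final class Tangle {
    static List<String> tangle(List<String> latexLines) {
        List<String> code = new ArrayList<>();
        boolean inCode = false;
        for (String line : latexLines) {
            String trimmed = line.trim();
            if (trimmed.equals("\\begin{code}")) {
                inCode = true;
            } else if (trimmed.equals("\\end{code}")) {
                inCode = false;
            } else if (inCode) {
                code.add(line);
            }
        }
        return code;
    }

    public static void main(String[] args) {
        List<String> doc = List.of(
            "A pizza has a base.",
            "\\begin{code}",
            "Class: Pizza",
            "    SubClassOf: hasBase some PizzaBase",
            "\\end{code}",
            "More prose.");
        // prints the two Manchester syntax lines only
        tangle(doc).forEach(System.out::println);
    }
}
```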

My second attempt took the opposite approach. I used two files, a Manchester syntax file and a LaTeX one, with a method for referring between the two (http://www.russet.org.uk/blog/1258). This worked, but had a poor implementation, which I later refined (http://www.russet.org.uk/blog/1269).

These approaches have their advantages, but both suffer from a poor editing environment: either there are two files to switch and link between, or the documentation is favoured over the ontology, or vice versa. They also suffer from a secondary issue, which is that they are based around Manchester syntax. While this is nice enough, since writing Tawny-OWL (1303.0213) this style of ontology development just does not feel rich enough.


Marginalia

One of the declared advantages of using a real programming language as the basis for Tawny-OWL was the ability to use the tools of that language. I have used a number of these, both within Tawny-OWL and with ontologies written with Tawny: most obviously the test environment, but also serialisation, properties support and, of course, the entire editing environment.

This raises the question as to whether I could use literate programming tools from Clojure as well. To my knowledge, the only real option in town here is Marginalia. Marginalia uses markdown as the documentation format and builds a nice presentation with code on one side, and comments on the other.

However, it has problems. Firstly, it presents all comments as text; you cannot, as it were, comment out the comments, which is irritating for boilerplate such as licence text. Secondly, the side-by-side presentation breaks the flow of reading, as you have to move your eyes around the screen all the time. And, finally, it’s Markdown. While Markdown is nice at what it does, it’s very limited, and I missed the extra power of something like LaTeX.

The main difficulty, though, remains the editing environment. Without special support, the documentation shows up in the editor as plain comments. I can never remember the order of brackets in Markdown links, and I rely on syntax highlighting to tell me that I have it correct.

Is there a way that Clojure and LaTeX can be made to work together?


LaTeX experiments: line comments

My first experiment with LaTeX was a remarkably cheap and cheerful one. Consider a document such as this:

;; \documentclass{article}
;; \begin{document}
;; \begin{code}
(println "hello world")
;; \end{code}
;; \end{document}

This is a valid Clojure file, and nearly valid LaTeX as well. The only illegal part is the ;; before the \documentclass macro, although in practice having ;; appear throughout the typeset document would not be ideal either.

LaTeX, as an embedded markup language, has a very plastic syntax, which I have exploited here. It is very easy to make TeX ignore the ; character entirely by changing its category code (catcode 9 means "ignored character"). We can put this into a driver file, which then inputs our Clojure file like so:

\catcode`;=9
\input{file.clj}

Putting the catcode change in a separate driver file keeps our Clojure file valid (placed in the Clojure file itself, that first line would be illegal Clojure). This is a remarkably cheap and cheerful way of achieving our aims, albeit at the cost of losing the ability to use semicolons in our writing.
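A crude way to picture the effect of `\catcode`;=9` is that every `;` disappears before LaTeX interprets the line. The Java below models only that effect, as plain string deletion; TeX's tokeniser works nothing like this internally, and the class name is hypothetical. The second example shows the cost mentioned above, since semicolons in the prose vanish too.

```java
// Rough model of \catcode`;=9: catcode 9 characters are ignored by TeX,
// so each ';' behaves as if it were deleted from the input line.
final class IgnoreSemicolons {
    static String asTeXSeesIt(String line) {
        return line.replace(";", "");
    }

    public static void main(String[] args) {
        System.out.println(asTeXSeesIt(";; \\documentclass{article}"));
        System.out.println(asTeXSeesIt("semicolons; sadly, vanish"));
    }
}
```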


Indirect-buffers

What, however, about the editing environment? My own preferred environment, Emacs, has good modes for editing both LaTeX and Clojure, and it is possible to switch between the two when I want to move between editing code and editing documentation. This is quite clunky, but there is a second option, which is “indirect buffers”. This is a piece of Emacs arcana where two buffers share some of the same data structures but not all, which means that they can have different modes. Unfortunately, my experience is that the buffers share too much: as well as the text, they also share “text properties”, which both LaTeX and Clojure mode use. In practice, this means that syntax highlighting fails (or rather, the two representations fight with each other). A second problem is that, although the file is valid LaTeX, it is not normal LaTeX; simple things like wrapping paragraphs fail because of the ;; comments at the beginning of each line.

So, this experiment fails the editing environment test.


LaTeX experiments: block comments

My next attempt was to use block comments. Consider this file, which is valid Common Lisp, using #| and |# block comments.

#|
\documentclass{article}
\begin{document}
\begin{code}
|#
(println "hello world")
#|
\end{code}
\end{document}
|#

We can use a similar (but not identical) trick with catcodes to make this valid LaTeX as well:

\catcode`#=\active
\def#{\catcode`#=6}
\catcode`|=\active
\def|{\catcode`|=12}
\input{hello_world.lisp}

The first line makes the # character active, that is, definable as a macro. We then define # as a macro which sets the catcode of # back to 6 (its default). Then, we do the same with |, whose default catcode is 12. The practical upshot is that the opening #| does nothing other than reset the catcodes changed in the driver file; effectively, it is ignored.

This actually works quite nicely in the editing environment; the opening #| effectively makes no difference to Emacs, and the mode works well. The only real disadvantage is that every code block needs two delimiters: one to open the code block in LaTeX, and one to end the comment in Lisp.

There are various multi-mode tools for Emacs which should help with the otherwise clunky editing environment, although even here I am not convinced that this is the right route. Multi-mode tools are complex and, to some extent, are not what I want. When editing code, I want to suppress the documentation and give it low visual immediacy, and when editing documentation, I want the reverse.

There is, however, a bigger problem. While the last example is valid Common Lisp, Clojure does not have block comments, nor can the programmer extend the reader to add them. So, while this seems a nice solution, it depends on a specific language feature which Clojure lacks.


Emacs Experiments: formats

My next idea was to use formats. Emacs allows text to be transformed between how it is displayed on screen and how it is saved to file. The main reason for this is to support the many non-ASCII text formats that exist, but (perhaps unsurprisingly) it is fully extensible and could be used for any purpose. So why not convert line-commented Clojure on file into block comments on screen? This would give editable LaTeX on screen, valid Clojure on file and, with a driver file, valid LaTeX on file as well.
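The conversion itself is straightforward. Each run of `;; ` lines on file becomes one `#| ... |#` block on screen, and the reverse undoes it. Applied to the line-comment example earlier, it produces exactly the block-comment example. The sketch is in Java for illustration only (an Emacs format would be written in Emacs Lisp), the class name is hypothetical, and it assumes that every documentation line starts with exactly `;; `.

```java
import java.util.ArrayList;
import java.util.List;

// File <-> screen conversion for the "formats" idea: runs of ";; " lines on
// file become a single #| ... |# block on screen, and back again.
final class CommentFormat {
    private static final String PREFIX = ";; ";

    static List<String> toScreen(List<String> fileLines) {
        List<String> out = new ArrayList<>();
        boolean inBlock = false;
        for (String line : fileLines) {
            boolean comment = line.startsWith(PREFIX);
            if (comment && !inBlock) { out.add("#|"); inBlock = true; }   // open a block
            if (!comment && inBlock) { out.add("|#"); inBlock = false; }  // close it
            out.add(comment ? line.substring(PREFIX.length()) : line);
        }
        if (inBlock) out.add("|#");
        return out;
    }

    static List<String> toFile(List<String> screenLines) {
        List<String> out = new ArrayList<>();
        boolean inBlock = false;
        for (String line : screenLines) {
            if (line.equals("#|")) { inBlock = true; continue; }
            if (line.equals("|#")) { inBlock = false; continue; }
            out.add(inBlock ? PREFIX + line : line);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file = List.of(";; \\begin{code}", "(println \"hello world\")", ";; \\end{code}");
        toScreen(file).forEach(System.out::println);
    }
}
```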

Unfortunately, this fails. While Emacs’ LaTeX support is file-based, Clojure (and specifically CIDER) has tighter integration: it can send the contents of a buffer to the Clojure process without saving to file. This bypasses the format conversion, so the block comments are sent to the Clojure process, which complains.


Emacs Experiments: linked-buffer

I am now experimenting with another option. Indirect buffers place exactly the same text (and text properties) into two buffers. Instead of sharing all the text, why not have two buffers with a function that can transform the text bi-directionally between them? The practical result is two views over the same content. Surprisingly, this works pretty well, even though my current implementation is very simple: the whole buffer is copied on every keypress. Plain copying would achieve no more than indirect buffers; however, we can also transform the text on the fly, so that both buffers are valid for their respective modes.
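For the LaTeX/Clojure pair, a transformation of this kind can be very small. Going from the Clojure view to the LaTeX view strips the `;; ` prefix. Going back, every line outside a code environment is commented, including the `\begin{code}` and `\end{code}` markers themselves. This is a Java model written for illustration, not linked-buffer's actual code, which is Emacs Lisp. The class name is hypothetical, and the sketch assumes that code lines never themselves start with `;; `.

```java
import java.util.ArrayList;
import java.util.List;

// Bidirectional transform between a Clojure view and a LaTeX view of the same
// content (illustrative model of the linked-buffer idea).
final class LinkedTransform {
    private static final String PREFIX = ";; ";

    // Clojure -> LaTeX: uncomment documentation lines; code lines already sit
    // between the (commented) \begin{code} and \end{code} markers
    static List<String> toLatex(List<String> clojureLines) {
        List<String> out = new ArrayList<>();
        for (String line : clojureLines) {
            out.add(line.startsWith(PREFIX) ? line.substring(PREFIX.length()) : line);
        }
        return out;
    }

    // LaTeX -> Clojure: everything outside a code environment becomes a comment
    static List<String> toClojure(List<String> latexLines) {
        List<String> out = new ArrayList<>();
        boolean inCode = false;
        for (String line : latexLines) {
            String trimmed = line.trim();
            if (trimmed.equals("\\end{code}")) inCode = false;
            out.add(inCode ? line : PREFIX + line);
            if (trimmed.equals("\\begin{code}")) inCode = true;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> latex = List.of("\\begin{code}", "(println \"hello world\")", "\\end{code}");
        toClojure(latex).forEach(System.out::println);
    }
}
```

Because each direction is a pure function of the whole buffer, copying everything on every keypress keeps the implementation simple. It would need to become incremental for large files.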

The broad idea is not that new; it’s similar to WEB/weave or Sweave, for instance, except that it is embedded into the editor. This means that we can take advantage of the editor’s existing support for both languages, and the author gets immediate feedback about the transformation, so messing up the syntax is pretty obvious.

It also provides a superset of the functionality of other techniques: indirect buffers, as mentioned previously; shadowfile.el (which creates a second copy of a file somewhere else on every save); and it could also mimic shadow.el, which generates a secondary file by invoking a command on every save (although invoking an external command on every keypress would probably not perform well).

The first release of linked-buffer was a month ago. I am currently unhappy with the configuration and will change it, so the code is in flux at the moment, but I am using it in anger, which is a good sign. Currently, it supports a LaTeX <-> Clojure transformation, but I will add a few more as time goes on.


Discussion

It has taken me quite a while to get to this stage, with a number of experiments along the way, but my feeling is that I now have a workable literate environment. It also validates my decision to build Tawny (http://www.russet.org.uk/blog/2962). Having a rich textual language for building ontologies is a bit of a game changer. Providing programmatic extensions to the language has been helpful, but access to other tools (git, Travis, tests and a REPL) has really made the difference. Adding a literate environment to this changes the way that I can use ontologies, and is a paradigm shift in their development.


My inaugural (http://www.russet.org.uk/blog/2968) book review is of Clojure High Performance Programming, by Shantanu Kumar.

Unfortunately, for my first review, I cannot be that positive about the book. I found it rather disorganised and chaotic. Concepts are introduced and briefly discussed, and occasionally cross-referenced later. It is often not clear what their relevance is to programming in Clojure. For instance, we are introduced to branch prediction in modern processors. It is not something I know about, so perhaps it is useful to understand, but it’s not explained why it would be useful to know about. Are there any code examples that show how branch prediction can affect the performance of my code? Likewise, different forms of CPU interconnect, or L1, L2 and L3 caches. I have the general impression that, as a Clojure programmer, I am a long way from the CPU; there is no sample code showing how the size of the caches can actually affect my performance.

Worse, those issues which do affect the Clojure programmer are scantily covered. For example, the following (decompiled) Java code is shown:

public Object invoke(Object x, Object y){
    x = null;
    y = null;
    return Numbers.multiply(x,y);
}

Partly, this demonstrates auto-boxing but, to a Java programmer, the code makes no sense, as it appears to call Numbers.multiply(null, null). It’s never explained how or why this makes sense. Clojure is clearing locals: each argument is loaded onto the operand stack before its local variable is set to null, an ordering which works in bytecode but cannot be expressed in Java source. Type hinting (which I have just had the joy of adding to tawny-owl) is similarly dealt with in a little under 2 pages, despite having a potentially large impact on the performance of (some) Clojure libraries.
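The reason type hints matter can be shown from the Java side. When Clojure cannot determine an object's class at compile time, a call such as `(.length x)` is compiled into a reflective lookup at run time, and setting `*warn-on-reflection*` to true makes the compiler report these cases. With a `^String` hint, it can emit an ordinary method call. The sketch below puts the two kinds of call side by side in plain Java. It is an approximation of what the compiled code does, not Clojure's actual `Reflector` implementation, and the class name is hypothetical.

```java
import java.lang.reflect.Method;

// Side-by-side sketch of an unhinted (reflective) call and a hinted (direct) call.
final class HintDemo {
    // roughly what an unhinted (.length x) must do: find the method by name
    // on x's run-time class, then invoke it reflectively
    static Object reflectiveLength(Object x) throws ReflectiveOperationException {
        Method m = x.getClass().getMethod("length");
        return m.invoke(x); // the int result comes back boxed as an Integer
    }

    // roughly what (.length ^String x) can compile to: a direct virtual call
    static int directLength(String x) {
        return x.length();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(reflectiveLength("tawny") + " " + directLength("tawny"));
    }
}
```

Both calls return the same answer. The difference lies in the per-call lookup and the boxing of the result, which is exactly the kind of cost that the book could have illustrated with a measured example.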

In short, as a series of vignettes about different aspects of performance, it’s interesting enough; but the whole is no greater than the sum of its parts, and it left me with little increased knowledge of Clojure, or of how to make it perform well.


This year I have been on a bit of a mission. I decided that, having been here for 8 years, I would actually use the library, so I started requesting books and reading them. It’s been a while since I regularly read books, and it’s been quite an interesting experience. I’ve remembered that reading tech books is quite a reflective process, away from the computer. It’s a less stressful, although perhaps more time-consuming, experience than hunting through the web, reading documentation or code until you understand whatever it is you are reading about.

I don’t know how long this trend will continue, but while it does, I thought I would write some short reviews of the books that I have read. As normal, this is mostly for my own purposes; like many lecturers, I get asked for book recommendations, so recording my impressions seems sensible.

I am pleased to announce the first full release of Tawny-OWL, my library for fully programmatic development of OWL ontologies. The library now has a fairly large feature set:

  • Complete support for OWL2

  • Integrated support for reasoning with HermiT or ELK

  • Profile checking

  • Fixtures and support macros for unit testing

  • Use of external ontologies available only as OWL files

  • Rendering of OWL API objects to Tawny code.

  • Support for generating and using ontologies with numeric IDs.

  • Support for multilingual labels.

Additionally, I now have initial integration with Protege, described later.

The library is now available from Clojars or on GitHub.

Feedback is welcome at tawny-owl@googlegroups.com.

  • Background

A little over a year ago, I first described my experiments with building a programmatic environment for ontology construction (http://www.russet.org.uk/blog/2214). The need arose out of frustration with existing ontology tools. Protege, for example, provides a nice graphical environment, but it has many limitations: it does not easily allow automated generation of ontology entities, and it does not provide access to tools which are commonplace in an IDE, such as versioning tools, diffs, test cases and so forth. While ontology-specific variations of these tools do exist, they were not as good as the ones I was used to using when programming.

Tawny seeks to bridge this gap, by using a full programmatic environment to generate OWL ontologies. I chose Clojure because of its syntactic plasticity; at its simplest, when using tawny, it does not feel like a programming language, just a syntax and evaluation engine for writing ontologies. However, the full power of the programming language is there and can be used when necessary (http://www.russet.org.uk/blog/2366).

Since the first blog post, there have now been a further 8, as well as three papers, describing tawny itself (http://www.russet.org.uk/blog/2366), the karyotype ontology (1305.3758) and our use of patterns, higher-level abstractions within the karyotype ontology and applied to SIO. From an initial experiment, tawny has become a useful tool which we are using on a daily basis.

  • In Early Release

Included in this release is our initial integration with Protege. Tawny builds on the OWL API, which is also the basis for Protege. I always assumed that Protege would be used to view OWL files generated by Tawny, but it is actually possible to integrate them much more comprehensively than this. It is now possible to manipulate the data structures of Protege directly using Tawny; in short, Protege can display whatever Tawny has generated immediately, without a file in between.

We have achieved this in two ways. Firstly, protege-tawny provides a command-line environment directly inside Protege; this is useful, but does not provide the rich programmatic IDE that I want. Secondly, protege-nrepl provides exactly this: Protege launches Clojure and an nREPL server, to which you connect with Emacs, Eclipse or any of the other Clojure IDEs. Additionally, lein-sync syncs classpaths and dependencies with an existing Clojure project. The practical upshot can be seen in a screencast: Tawny can be used as normal, with Protege following along.

Currently, Protege and Tawny use different versions of the OWL API, so while protege-nrepl can be used with the current release of Protege, it crashes periodically. In the meantime, a hand-built distribution of Protege including nrepl is available.

  • For the Future

I have three main aims for the next few releases of Tawny. First, we need to provide access to explanation code. Currently, explanations have to be accessed within Protege, which is less than ideal for a process that can take many minutes to run. I wish to integrate this with the Clojure unit test environment, so that explanations are generated by failing test cases.

Second, Tawny currently allows the development of ontologies, but does not allow easy querying over them. I have several possibilities here, including integration of a SPARQL engine, use of the current rendering engine combined perhaps with core.match, or, finally, full-fledged direct support for core.match.

Finally, I wish to experiment with and add support for connection points (http://www.russet.org.uk/blog/2955), to better enable modular ontology development.
