Ontology Building with Emacs

I have just started to build an ontology, and I have to admit that it has been a while since I last did this; I think the last time was when writing a paper about function [1], so I was interested to see how it would work. I have been engaged in discussions recently about syntactic aspects of OWL [2]. The main reason for this is my long-held belief in the need for editing tools that work at the syntactic level, which would allow us to plug in to the enormous body of programming tools that support building, collaborative development, versioning and so on. So, I decided to build the entire thing using Emacs; the nature of the ontology also meant that I wanted to reboot my long-neglected attempts to bring literate development to ontologies [3]. While it is not a large ontology, I did manage 60 classes in an afternoon, so I am quite pleased with the results.

My basic working environment is as follows:

  Emacs: for editing
  omn-mode.el: providing basic OWL Manchester Syntax support
  pabbrev.el: dynamic and automatic abbreviation support
  Protege: for viewing the ontology, and running reasoners

As an environment, this works quite well now. Although I have tried it before, Protege seems to work much better now when (mis)used as a display environment. First, when loading a file, Protege gives a report of errors and has a nice “reload” button, so that I can fix the errors (or at least try to) and load again. Second, after an ontology has been loaded, Protege will now detect that the file has changed and offer to reload it. There are still some issues: at times Protege breaks, particularly when I change the file and break the syntax, although I have not had time to work this out reproducibly enough for a bug report. I can live with this; while restarting Protege is slow in computer time because of the Java loading, it doesn’t require lots of clicking to get back to where I started (“Open Recent”), so it’s quick for the user. Finally, and most importantly of all, Protege’s Manchester syntax parser now seems to support comment characters correctly; at least in my hands it treats “#” as a comment character.

Using Emacs as an editing environment over Manchester syntax has some considerable advantages over using Protege raw. I am very keyboard-centric while Protege is very mouse-centric; just moving backward and forward between class definitions with incremental search is much better than in Protege. Simple things like search and replace, especially with regexps, just happen naturally in Emacs, and there is no equivalent in Protege.

I wrote omn-mode.el a long time ago now; I don’t remember when, although the last update to its original Subversion repository is from 2005, according to the $Date$ keyword inside the file. omn-mode is based on generic.el, and it is starting to get a little stretched for this now; I should move it to the normal define-derived-mode functionality. However, the generic.el base does reflect what the mode is best at, which is syntax highlighting. I had to fiddle with this a bit, to add support for some extra keywords and to make it more consistent. I am working on the basis that everything should be syntax-highlighted; although this makes Manchester syntax files a little garish, it helps get the syntax correct. I also improved some of the regexps: so " some " has been changed to "\\<some\\>", which matches the keyword at word boundaries rather than requiring a space on either side.
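
To illustrate why the word-boundary form is the better pattern, here is a minimal sketch in Python rather than Emacs Lisp. It is not code from omn-mode.el: Python's \b stands in for the Emacs \< and \> constructs (which consult the buffer's syntax table, whereas \b uses a fixed definition of a word), and the sample lines and function name are invented for the example.

```python
import re

# Old style: the keyword only matches with a literal space on each side.
SPACE_DELIMITED = re.compile(r" some ")
# New style: the keyword matches wherever it stands as a whole word.
WORD_BOUNDED = re.compile(r"\bsome\b")


def is_highlighted(pattern, line):
    """Return True if the pattern finds the keyword anywhere in the line."""
    return pattern.search(line) is not None


# Invented sample lines, not taken from a real ontology.
samples = [
    "hasTopping some Topping",  # keyword in the middle of a line
    "hasTopping some",          # keyword at the end of a line
    "some Topping",             # keyword at the start of a line
    "awesome Topping",          # not a keyword, only a substring
]

for line in samples:
    print(repr(line),
          is_highlighted(SPACE_DELIMITED, line),
          is_highlighted(WORD_BOUNDED, line))
```

The space-delimited pattern misses the keyword at the start or end of a line, while the word-bounded pattern catches both and still ignores "awesome".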

As Protege is now doing comments, I have added proper support for these also, although as a comment character “#” is a bit irritating, given that it is also a valid part of a URL. So I fudged a bit here and used “# ” as a two-character comment start. This means that multiple comment characters such as “##”, in the way I use them in Lisp, are not going to work. But fixing the situation would be much, much harder. It seems to me that “#” is not an ideal choice; a Lisp-like “;” would be better. I think, technically, you can find a “;” in a URL, but I have never seen one.
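
The effect of the two-character rule can be sketched outside Emacs. The following Python function is only an illustration of the idea, not how omn-mode.el or Protege implement comments; the function name and the example IRI are assumptions made for the sketch.

```python
def strip_comment(line):
    """Drop a trailing comment, where a comment starts with '# ' (hash, then space).

    A '#' that is not followed by a space is left alone, because it may be
    the fragment separator inside an IRI.
    """
    index = line.find("# ")
    if index == -1:
        return line
    return line[:index].rstrip()


# Invented example lines; the IRI is hypothetical.
print(strip_comment("Class: pizza:Margherita # a simple pizza"))
print(strip_comment("Prefix: pizza: <http://example.org/pizza#>"))
```

The prefix declaration survives intact because its "#" is followed by ">" rather than a space. The obvious cost is that a comment written without the space, such as "#note", is not recognised at all.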

While the previous indentation engine (based purely on the previous line) worked surprisingly well, I have also improved this now. Manchester syntax is actually quite easy to indent reasonably well: an ontology is essentially a bag of axioms, so it has relatively little structure at the syntactic level, which means that my engine only uses three indentation levels.
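
As a rough picture of what a three-level scheme could look like, here is a small Python sketch. The assignment of levels is my assumption for illustration (frame keywords at level 0, section keywords at level 1, everything else at level 2); the post does not say which three levels omn-mode.el uses, and the keyword lists below are a partial, hand-picked selection.

```python
# Assumed keyword sets for the sketch; not taken from omn-mode.el.
FRAME_KEYWORDS = ("Prefix:", "Ontology:", "Class:", "ObjectProperty:",
                  "DataProperty:", "Individual:")
SECTION_KEYWORDS = ("Annotations:", "SubClassOf:", "EquivalentTo:",
                    "DisjointWith:", "Domain:", "Range:")
INDENT_WIDTH = 4  # assumed width of one indentation level


def indent_level(line):
    """Classify a line into one of three indentation levels."""
    stripped = line.strip()
    if stripped.startswith(FRAME_KEYWORDS):
        return 0
    if stripped.startswith(SECTION_KEYWORDS):
        return 1
    return 2


def reindent(text):
    """Re-indent every non-blank line according to its level."""
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            result.append("")
        else:
            result.append(" " * (INDENT_WIDTH * indent_level(stripped)) + stripped)
    return "\n".join(result)


print(reindent("Class: Pizza\nSubClassOf:\nhasTopping some Topping"))
```

Because each line is classified on its own content, no parse of the surrounding structure is needed, which is the sense in which a bag of axioms is easy to indent.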

Finally, I’ve also updated the mode to recognise both ":" and "_" as word constituents, for reasons that should become clear.

pabbrev.el is my own dynamic “as you type” abbreviation expansion package, and it is still the nicest thing that I have written for Emacs. I use it every day (and am using it now). I did notice one minor bug which was making it misbehave, but the main change comes from the update to omn-mode’s syntax table: pabbrev.el now expands a prefix and its term as a single word, and underscore_separated_terms should expand similarly. For me, at least, this seems to work well. The change to the syntax table will also affect other dynamic abbreviation packages, including dabbrev and hippie-expand.
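
To show why the word-constituent change matters for expansion, here is a minimal Python sketch. It is a simple frequency-based stand-in, not pabbrev.el's actual algorithm, and the character class in the regular expression is my approximation of a syntax table in which ":" and "_" count as part of a word. All names and the sample ontology text are invented for the example.

```python
import re
from collections import Counter

# Letters, digits, ':' and '_' all count as word constituents, so
# "pizza:Named_Pizza" is a single word rather than three.
WORD = re.compile(r"[A-Za-z0-9_:]+")


def build_usage(text):
    """Count how often each word occurs in the text."""
    return Counter(WORD.findall(text))


def suggest(prefix, usage):
    """Return the most frequent known word that extends the prefix, or None."""
    candidates = [(count, word) for word, count in usage.items()
                  if word.startswith(prefix) and word != prefix]
    if not candidates:
        return None
    # Most frequent first; ties are broken alphabetically.
    _, best = min(candidates, key=lambda item: (-item[0], item[1]))
    return best


# Invented sample text.
text = """Class: pizza:Margherita
    SubClassOf: pizza:Named_Pizza
Class: pizza:Napoletana
    SubClassOf: pizza:Named_Pizza
"""
usage = build_usage(text)
print(suggest("pizza:N", usage))
```

With ":" treated as punctuation instead, typing "pizza:N" would leave only "N" as the word being completed, and the prefix would play no part in the match.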

The combination of these changes means that, for me anyway, Emacs is now quite a capable Manchester syntax editor. I have only really touched here on the things that I have written, but editing OWL ontologies at the textual level really does open up many other possibilities. Having an environment that I control is also useful. I would like to extend Manchester syntax to support semantics-free identifiers [2]. I can now make an initial implementation of this, using a preprocessor to unwind the Alias: definitions to produce a “real” Manchester syntax file.
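
A preprocessor of this kind might be sketched as follows. This is purely illustrative: the form "Alias: readable_name identifier" on a line of its own is a hypothetical syntax that I have made up for the example, and the identifiers are invented. The sketch collects the alias lines, removes them, and substitutes each alias wherever it appears as a whole word.

```python
import re

# Hypothetical alias declaration: "Alias: <readable name> <real identifier>".
ALIAS_LINE = re.compile(r"^\s*Alias:\s+(\S+)\s+(\S+)\s*$")
# Words, with ':' and '_' counted as word constituents.
TOKEN = re.compile(r"[A-Za-z0-9_:]+")


def unwind_aliases(source):
    """Replace each alias with its identifier and drop the Alias: lines."""
    aliases = {}
    body = []
    # First pass: collect alias definitions and keep every other line.
    for line in source.splitlines():
        match = ALIAS_LINE.match(line)
        if match:
            aliases[match.group(1)] = match.group(2)
        else:
            body.append(line)

    def substitute(match):
        word = match.group(0)
        return aliases.get(word, word)

    # Second pass: rewrite whole words only, leaving whitespace untouched.
    return "\n".join(TOKEN.sub(substitute, line) for line in body)


# Invented input using the hypothetical Alias: syntax.
source = """Alias: Margherita ex:C_0000001
Alias: Named_Pizza ex:C_0000002
Class: Margherita
    SubClassOf: Named_Pizza"""
print(unwind_aliases(source))
```

The sketch makes no attempt to skip comments or quoted annotation text, so a real implementation would need to be more careful about where substitution is allowed.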

References

  1. P. Lord, "An evolutionary approach to Function", Journal of Biomedical Semantics, vol. 1, pp. S4, 2010. http://dx.doi.org/10.1186/2041-1480-1-S1-S4
  2. P. Lord, "Semantics-Free Ontologies", An Exercise in Irrelevance, 2012. http://www.russet.org.uk/blog/2040
  3. P. Lord, "Literate OMN", An Exercise in Irrelevance, 2009. http://www.russet.org.uk/blog/1213