Archive for the ‘Ontology’ Category

So, this year of Bio-Ontologies is upon me; I’m sitting in the airport waiting to fly in the wrong direction; although I’ve noticed that the airport signs no longer call this “waiting time” but “shopping time”.

It’s 12 years on now; I can’t remember whether this makes it the oldest SIG at ISMB, but it must be close. Perhaps it is surprising that a small meeting like this has lasted so long, but during its time the use of ontologies within biology has blossomed; to some extent, this is true of the outside world also. This year has carried on with the trend. Gone are the days when we got just enough papers to fill the day; we’ve stretched the day out, we’ve added a poster session, but still we get more. The number of attendees has gone up somewhat also. It’s good to see.

For me, bio-ontologies has also been the centre of my entry into the field; Edmonton was the first ontology paper that I ever presented — perhaps depressingly, still some of my best work. This year has special significance for me. I’m giving a paper myself for the first time since Edmonton; perhaps fitting to end off as I began, because this will also be my last year as conference chair. I’ve been involved now for 6 of the 12 years; while I’ve enjoyed it and felt privileged to do the work, it’s enough. Organising is hard work, even now when I understand the process well. In the last few years, I’ve tried to push the workshop to be a bit broader than just ontologies, to take in all new forms and technologies for representing and distributing knowledge; I’ve met with some, but limited, success. A workshop with a 12-year pedigree takes some time to move. I was heartened to see that it was the first SIG to get a subject on the official conference friendfeed. Web 2.0 is upon us. With luck, this will become a bigger part of the meeting. If so, this will be other people’s achievement, not mine. Did I mention that this is my last year?

I’m looking forward to giving my paper on functions and roles in ontologies. One of the more minor reasons for retiring is that it’s easier to publish in a workshop which you are not organising. I’m surprisingly nervous about the talk; probably as much so as in Edmonton. I’ve been practicing the talk incessantly, to the point that my back is complaining from too much sitting. It’s my first ever single-author paper. I’m hoping that people will like the paper; its message is simple and straightforward. Of course, this doesn’t mean that it’s correct. Last year’s paper on a similar topic caused quite a fuss (which, let’s be honest, was partly my fault) and I know that some in the audience will be quite vehement in their opposition to mine. Even though I’ve been over it so many times, I have the back-of-my-mind fear that there is a big hole that I’ve missed.

I guess this is good; it means that I’m excited about my own paper in a way that I haven’t been for years. A bit of fuss will mean that other people are too, for good or for ill. In the end, I’ll probably be most disappointed if the paper goes with a whimper not a bang.

Ah, it goes on and on. After my last attempt at literate OWL programming, called omnsplit, I decided that there was a problem; that version splits the OWL file into individual statements, and puts them into files with the same name as the OWL class (or property, or whatever).

The problem is that, for an ontology like OBI, you get 1400 individual files; this is just inconvenient, as many applications don’t like this many files in a directory. Also, there is a naming constraint; you can only use characters legal in the file system; this doesn’t include “:” if you want to be Windows (NTFS) compliant.

So, for my new system, I decided to generate an index file, which just points at locations in the ontology file. Initially, I was just going to index the main ontology file; in the end, I decided a partial copy was the way forward; generating both the index and the indexed file at the same time ensures that they will stay in sync.

It required a bit of nasty latex hacking; the basic problem was avoiding the limitation of being only able to use legal LaTeX macro characters (that is letters). The system now works like this:



%% This is generated by python which also generates the
%% function_ont.spt file which is a copy of the ontology (with a
%% few new lines gone).

%% This just defines a new macro in what appears to be an
%% unnecessarily complex way.
\expandafter\def\csname OmnEntityHeaderheader\endcsname%
{\lstinputlisting[language=omn,firstline=1,lastline=8]{function_ont.spt}}

%% But the use of \expandafter and \csname means that you can
%% use any character you like, including underscores and numbers
%% in the macro name.
\expandafter\def\csname OmnEntityObjectPropertyhas_role\endcsname%
{\lstinputlisting[language=omn,firstline=206,lastline=219]{function_ont.spt}}

%% We can now define two commands in the style file. Again
%% we use \csname so that we are not bound to characters legal
%% in latex macros.
\newcommand{\omnclass}[2]{\csname OmnEntityClass#1#2\endcsname}
\newcommand{\omnobjprop}[2]{\csname OmnEntityObjectProperty#1#2\endcsname}

%% now in our source, we can do things like this.
\omnobjprop{}{has_role}
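
The python side of this is not shown in the post. As a rough sketch only, an index like the one above could be generated along these lines. The function name `build_index`, the list of frame keywords and the rule that a frame runs up to the line before the next frame are all assumptions made for illustration, not the actual literate_omn code; the header macro is also left out.

```python
import re

# Keywords that start a new entity frame in Manchester syntax. This is a
# partial list chosen for the sketch, not an exhaustive one.
FRAME_START = re.compile(
    r"^(Class|ObjectProperty|DataProperty|AnnotationProperty|Individual):\s*(\S+)"
)


def build_index(omn_lines, spt_name):
    """Return LaTeX index entries, one macro definition per entity frame.

    Line numbers are 1-based and inclusive, which is what the firstline and
    lastline options of \\lstinputlisting expect.
    """
    starts = []  # (line number, frame keyword, entity name)
    for number, line in enumerate(omn_lines, start=1):
        match = FRAME_START.match(line)
        if match:
            starts.append((number, match.group(1), match.group(2)))

    index = []
    for i, (first, kind, name) in enumerate(starts):
        # Assumption: a frame ends on the line before the next frame starts,
        # or at the end of the file for the last frame.
        last = starts[i + 1][0] - 1 if i + 1 < len(starts) else len(omn_lines)
        index.append(
            "\\expandafter\\def\\csname OmnEntity%s%s\\endcsname%%\n"
            "{\\lstinputlisting[language=omn,firstline=%d,lastline=%d]{%s}}"
            % (kind, name, first, last, spt_name)
        )
    return index
```

Because the macro name is built inside `\csname`, the entity name can be passed through unchanged, underscores and colons included, which is the point of the LaTeX trick.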

Using an index in this way also has another advantage. I’ve had to make a decision whether to go with rdfs:label or the entity name. I can now back out of this; I can just use both in the index file, without too much extra space, so that either would be referenceable within the latex.

To me, this feels like the right solution. It’s relatively simple (with a bit of nasty latex, which is nicely hidden), it doesn’t depend on the file system. It needs a bit more work to bring it to completion, but not that much.

Sadly bio-ontologies looms, so next week will be getting ready for that; perhaps I can finish this off on the way back. “Sadly” is perhaps a poor choice of words; I’m greatly looking forward to it, but I’ve kind of had the bit between my teeth with python and latex hacking for the last few weeks.

After a bit of struggle, I now have another literate OWL tool working, along the lines discussed in a previous blog post. Rather than generating the OWL documentation, I now split a Manchester syntax file up, so that I can refer to bits of it. I have this working with OBI, using Protege to produce a single merged ontology file, in Manchester syntax.

The current implementation is rather simple; it produces one file per entity in the OWL file, which I don’t think is entirely good. When run on OBI, it creates over 1400 files, which is a lot. The other problem is that I’ve had to do some dubious hacking to get the file names to work out. Firstly, I have to remove spaces and “\”’s, as well as “:” which is illegal on NTFS.
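
To make the file-name hacking concrete, here is a minimal sketch of that kind of clean-up. The function name `entity_filename` and the `.omn` extension are hypothetical, and only the three kinds of character mentioned above are removed; other characters that NTFS rejects are not handled.

```python
import re

# Characters stripped in this sketch: whitespace, backslash and colon.
# Other characters that NTFS rejects (for example * ? " < > |) are NOT
# handled here.
UNSAFE = re.compile(r"[\s\\:]")


def entity_filename(kind, name, extension=".omn"):
    """Build a file name from a frame keyword and an entity name."""
    return kind + UNSAFE.sub("", name) + extension
```

Simply deleting characters can make two different entities collide on the same file name (for instance `a:b` and `ab`), which is one more reason to be uneasy about tying entity names to the file system.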

There’s also a problem with some of the OWL. Unfortunately, the OBI to OWL conversion process has a reification step which I don’t quite understand the purpose of. This comes out as this sort of anonymous individual. I’m not sure at all how the definition has come out as the rdfs:label, but, for sure, you can’t use this as a filename!


Individual: relationship:genid7

    Annotations:
        rdfs:label "C located_in C' if and only if: given any c that
instantiates C at a time t, there is some c' such that: c' instantiates
C' at time t and c *located_in* c'. (Here *located_in* is the
instance-level location relation.)"@en,
        oboInOwl:hasDbXref relationship:genid8

    Types:
        oboInOwl:Definition

I think I might change the implementation a bit, though. Having 1400 files in one directory is not good. My idea is to serialize the entire file out as latex, with lots of macros, autogenerated.


%% this would appear in the generated file
\newcommand{\OwlClassowlthing}{
  \begin{omn}
Class: owl:Thing
  \end{omn}
}

%% then in your latex file you would do
\owlclass{owl}{Thing}

%% which would just resolve to the class above

The only worry with this is that latex would then have to read a large file, even if most of the macros are not used. This might be really, really slow. Well, we can but try.

As before, the current version is available at git://github.com/phillord/literate_omn.git.

Well, after a reasonable degree of struggle, I managed to get the first version of my literate OWL system working. As well as learning python, I’ve had a go with git; my repo is hosted on github at git://github.com/phillord/literate_omn.git. There are three components.

omnextract.py: this pulls out all the referenced omn files from the TeX document and produces the complete omn file.
omn.sty: this is a driver for the listings package which does syntax highlighting in TeX.
omndoc.sty: this provides commands for including files into the TeX. It’s a thin wrapper around the listings package.

I decided to make omn.sty separate from omndoc.sty as it works standalone, if you just want to use the listings package on its own. At the moment, you can only include files; environments don’t work. You can see the PDF it creates from this TeX:


\documentclass{article}

\usepackage[pdftex]{color}
\usepackage{omndoc}

\title{A Test Document for OMNDoc}
\author{Phillip Lord}
%% should be ignored by latex, but read by python

\omndoc{all_test.omn}

\begin{document}
\maketitle

Here is a piece of OWL that should be readable in the documentation and in the
OMN output.

\begin{omn}
Class: FirstClass
\end{omn}

\omn{first.pomn}

Here is a piece of OWL that should be readable in the OMN output but is too
boring to be worthy of consideration for the documentation.

% \ignore{
%   \begin{omn}
%     Class: BoringOWL
%   \end{omn}
% }

\ignore{\omn{second.pomn}}

Here is a piece of broken OWL that should be rendered in the documentation (as
broken!) but should be ignored in the OMN.

% \begin{notomn}
% Clazz: BrokenOmn
% \end{notomn}

\notomn{third.pomn}

\end{document}

I’m starting to debate with myself, though, whether I have gone the right route here. The problem is that splitting the omn file up into bits is a pain. It only supports one way of working; if you want to use Protege, for example, to edit the file, you can’t; you can only view. We even miss the big advantage of literate programming; one source for both document and computation. But, then, you are stuck with a poor editing environment for either the documentation or computational representation.

I’ve been thinking instead of a system which would look like this:


\omndoc{function.omn}

\omnClass{Function}

\omnProperty{has_role}

\omnSummary{}
\omnMissing{}

Now, the python component would split the function.omn file instead of combining it. Each class, individual or property would be put into its own file. The \omnClass macro would then just be a simple include (again using the listings package); it would show the class inline. \omnSummary would include some TeX (generated from python) saying how many classes and so forth were in the omn file; \omnMissing would produce a list of Classes that are not explicitly included. Given a big monitor, you could work on the two sources (documentation and ontology) side-by-side, with only a little bit of editing to support jump-to or equivalent. Finally, it would be more syntax-independent. The TeX would not need to be changed to support, for example, the XML syntax. Just some python to split the XML document up into snippets.
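
As a rough illustration of what \omnSummary and \omnMissing would need from python, here is a sketch that counts the entities in an omn file and lists the classes that the TeX never refers to. The function name `summarise` and both regular expressions are assumptions for this example; the macro names are the ones proposed above, and nothing here exists yet.

```python
import re
from collections import Counter

# Frame keywords recognised by this sketch (a partial list).
ENTITY = re.compile(
    r"^(Class|ObjectProperty|DataProperty|Individual):\s*(\S+)", re.MULTILINE
)
# Matches \omnClass{...} and \omnProperty{...} in the TeX source.
REFERENCE = re.compile(r"\\omn(?:Class|Property)\{([^}]*)\}")


def summarise(omn_text, tex_text):
    """Return (counts per entity kind, classes not referenced in the TeX)."""
    entities = ENTITY.findall(omn_text)  # list of (kind, name) pairs
    counts = dict(Counter(kind for kind, _ in entities))
    referenced = set(REFERENCE.findall(tex_text))
    missing = [
        name for kind, name in entities
        if kind == "Class" and name not in referenced
    ]
    return counts, missing
```

The counts would feed the text generated for \omnSummary, and the second value is the list that \omnMissing would print.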

I shall start coding this over the next couple of days. I think I already have most of the python that I need so, hopefully, it should not take too long.

My next blog post was going to be about function, as I have just had a paper about it accepted. But, I got slightly side-tracked along the way, thinking about Literate Programming as it applies to OWL. While an ontology is (or, to my mind, should be) a computational artifact, it’s a bit different from a program; the main thing is that it doesn’t run; it doesn’t have that functional test that a program does. This is not to say that an ontology is not an application-dependent entity. It can be, but even then it needs to have a program built on it.

One of the upshots of this is that a narrative justification for an Ontology is fairly important; currently, we spend far too long on mailing lists, arguing about ontology terms and, to my mind, not enough of this is reflected in the final outcome. If, on the other hand, we moved to a situation that adding a new concept was equivalent to writing a paper, we might have less of this. Discussion would be a bit more focussed; besides which, most scientists are experienced with writing and reviewing papers, so we’d just be better at it.

For this to happen productively, though, the paper has to become, itself, a computational artifact. It’s not good having documentation that has to be kept in-sync with the ontology; we will just end up with multiple versions, and will never quite know what we are talking about; my discussions about BFO have shown me this; do we mean the OWL, the definitions in the OWL, the papers or what? We should be able to generate both readable documentation and computational OWL at the same time. In short, literate programming.

Now, I know that Bijan Parsia has been investigating this also, but I wanted to think a little bit about how it would fit into my environment.

One thought was to get the system working within asciidoc which I am using to generate these pages. This turned out to be simple enough; take, for instance, this definition for BiologicalFunction.


Class: BiologicalFunction
    Annotations:
    rdfs:comment
"Definition: A biological function is a realizable entity that inheres in continuant
which is realized in an activity, and where the homologous structure(s) of
individuals of closely related species (or identical species) fulfil this
same biological function."

    SubClassOf:
        Function

Asciidoc uses source-highlight for its syntax highlighting. I had to add a bit of config (which, annoyingly, needs to be placed into the main install directory for source-highlight, rather than in a user space dot-directory).

Unfortunately, this is not going to be as good as you might hope for printed documentation. The obvious solution here is to aim at LaTeX. I think that I am going to have a quick go at producing something like this, inspired by Literate Haskell. Basically, I need three tags which look like this:



\begin{owl}
Class: Thing
\end{owl}

\ignore{
\begin{owl}
Class: BoringOWL
\end{owl}
}

\begin{notowl}
Clazz: BrokenOwl
\end{notowl}

The first copes with OWL that should appear both in the documentation and code (that is most of it). The second covers OWL that should appear just in the code; the haskell example is for a “help” function; I suspect that this is rarely needed for OWL. The final example appears just in documentation; it would be useful for anti-examples (“Don’t do this!!!”). My plan would be to pre-process the latex just using regexps, nothing complex, to dump the OWL to a file, mostly because I don’t know how to get latex to do it. Meanwhile, these macros would just be defined in terms of the Listings package (which means writing yet another syntax highlighting set of regexps, oh dear).
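
The regexp pre-processing could be as small as the following sketch. It is an illustration under assumptions rather than the eventual tool: the function name `extract_owl` is made up, and it does not skip environments that have been commented out with %.

```python
import re

# Matches the body of every owl environment. A notowl environment does not
# match, because "\begin{notowl}" is a different literal string. An owl
# environment wrapped in \ignore{...} still matches, which is the intent:
# ignored OWL is hidden from the document but kept in the code.
OWL_BLOCK = re.compile(r"\\begin\{owl\}\n?(.*?)\\end\{owl\}", re.DOTALL)


def extract_owl(tex_source):
    """Return the OWL held in the owl environments of a LaTeX source."""
    blocks = [block.strip("\n") for block in OWL_BLOCK.findall(tex_source)]
    if not blocks:
        return ""
    return "\n\n".join(blocks) + "\n"
```

Run over the three tags above, this keeps `Class: Thing` and `Class: BoringOWL` and drops the broken `Clazz: BrokenOwl`.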

Well, this is okay, but has two problems: first, it means writing OWL inside latex, which means that editor support is going to be rubbish; second, what if I want to blog AND print a document? My solution to this is to move my ontologies to being multi-file based. As far as I can tell, Manchester OWL is order independent (except for the header). So the plan would be to write multiple files, each with a few Concepts in:

 function/header.omn
 function/function.omn
 function/biological_function.omn
 function/artifactual_function.omn

Generating a complete Manchester syntax file from this would be easy (more or less, just run cat). This could be supported within latex by adding some include macros. Again, this is trivial to do with the listings package.


\owl{function.omn}
\ignore{\owl{help.omn}}
\notowl{broken.omn}

Likewise, asciidoc supports it using include macros. I shall give this a go next week. I shall produce a document describing the axiomatisation for function in OWL that started all of this off.
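
For the “just run cat” step, the only wrinkle is that the header has to come first. A minimal sketch, assuming the header lives in a file called header.omn as in the listing above (the function name `combine` is hypothetical):

```python
from pathlib import Path


def combine(directory, header_name="header.omn"):
    """Concatenate every .omn file in a directory, header first.

    The remaining files are taken in sorted order only to make the output
    reproducible; the assumption is that their order does not matter.
    """
    directory = Path(directory)
    header = directory / header_name
    others = sorted(
        path for path in directory.glob("*.omn") if path.name != header_name
    )
    parts = [
        path.read_text(encoding="utf-8").rstrip("\n")
        for path in [header] + others
    ]
    return "\n\n".join(parts) + "\n"
```

Calling `combine("function")` would return the text of the complete ontology, ready to be written to a single file.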

PS Just finished this, and found out that blogpost stripped off all my nice syntax highlighting. Took a bit of effort but (hopefully) it should all be back in again now.