

Introduction

A few weeks ago I unsubscribed from the BFO discuss mailing list. I’ve been reading and posting there since March 2007; in that time I’ve managed to send 492 mail messages, which surprises even me. As a mailing list, BFO discuss is a slightly bruising experience: it’s a bit like a bar fight; one person swings a punch and everyone just piles in. I joined the mailing list because BFO has become somewhat of a force within the bio-ontology community and I wanted to help make sure it was fit for purpose; however, I have to admit that I have been as guilty of reaching for the nearest available pool cue as the next ontologist. Not the best side of me, but there you have it.

During my time on the mailing list, I have learnt a lot about BFO and the realist philosophy that, in theory, underpins it. Actually, BFO is not at all bad; for me, though, realism is largely without merit. One of the main difficulties with realism is that it carries with it the idea that, by thinking very hard, you can come up with a “representation of reality”. I think that this is mistaken. As scientists, we should be wary of thinking too much; our role, whenever possible, is to think just enough to get us to the start of the next experiment. This doesn’t seem to happen with BFO; in the time that I have been on the mailing list, BFO itself has changed very little; the constant feedback and iteration to accommodate new knowledge and experience is largely not happening. I have qualms with many parts of BFO (for example, I have discussed the issues with the Realizable Entity hierarchy). However, for me, the worst outcomes of the philosophical approach have happened as a result of not considering the advanced models that physics has produced to explain the experimental data that we see. I give four examples.


Length in Space

BFO makes a very high-level split between Independent and Dependent Continuants. A continuant is something that persists over time, but which exists in full for this entire time: my computer or me, for instance, as opposed to a process, not all of which exists at any point in time. The distinction between an independent and dependent continuant depends on whether this entity exists on its own; for my height, a dependent continuant, to exist, I also have to exist. Once I cease to exist, so does my height. This seems okay, but in tying physical dimensions to an independent continuant, BFO has made a fundamental error: how do we express the length of a Spatial Region? Length is a dependent continuant and, so, there must be an independent continuant in which it inheres. Unfortunately, Spatial Region is not an independent continuant itself.

There are solutions, of course; we can think of another relation, other than inheres, to link Spatial Region and Length. But we still need an Independent Continuant to exist that this length inheres in. Another possibility is to describe the length of a spatial region as the length of an Independent Continuant that could exist in it. But it is easy to think of Spatial Regions in which no Independent Continuant can exist (for example, the Spatial Region 1m longer than the longest object in the universe). BFO would be modelling the world backward; physics uses a coordinate system and places objects within that; this approach would use objects to define the coordinate system.

Currently, this problem seems to have been accepted by some of the authors of BFO; however, there is no solution. If BFO had started from the mathematical models of physics, to me it seems likely that we would not be in this position.


Change in Process

BFO suggests that Occurrents (such as a process) can have properties in a similar way that independent continuants can have qualities. I have a length, a process may have a duration. However, BFO suggests that the properties of an Occurrent cannot change; rather, there must be a new Occurrent.

Again, this makes little sense, and ignores very simple physical examples. Consider, for example, a car first travelling at \(10\,\mathrm{m\,s^{-1}}\), then \(20\,\mathrm{m\,s^{-1}}\). Consider the process of motion. BFO would have us model this as 3 processes: car moving at \(10\,\mathrm{m\,s^{-1}}\), car moving at \(20\,\mathrm{m\,s^{-1}}\), and a single motion process of which the other two are part.

For a simple example, this style of modelling may work. However, consider the earth travelling around the sun. The problem is that the motion is continually changing; the earth’s velocity changes infinitesimally toward the sun, so it’s always accelerating. Worse, the acceleration also changes infinitesimally, as the earth’s location relative to the sun changes. So, to model this in BFO, we need an infinite number of processes (for both the motion and acceleration). We could argue that while the velocity and acceleration change constantly, the angular velocity and speed of the earth are constant, so why not model the process in these terms? Unfortunately, even this is not true; the earth moves in an ellipse, not a circle, even if it’s very close to a circle. So, the angular velocity and speed change continually also.

The physics of this is, as I have said, straightforward. The earth’s motion has a velocity and acceleration expressed as (nearly) two sine waves along the two axes.
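
To make the “two sine waves” concrete, here is a minimal Python sketch. It assumes an idealised, perfectly circular orbit of radius 1 AU with a period of one year (the real orbit is slightly elliptical, as noted above), and the function names are purely illustrative.

```python
import math

R = 1.0              # orbital radius in astronomical units (idealised circle)
OMEGA = 2 * math.pi  # angular speed in radians per year


def position(t):
    """Position (x, y) in AU at time t, measured in years."""
    return (R * math.cos(OMEGA * t), R * math.sin(OMEGA * t))


def velocity(t):
    """First derivative of position: a sine wave along each axis."""
    return (-R * OMEGA * math.sin(OMEGA * t), R * OMEGA * math.cos(OMEGA * t))


def acceleration(t):
    """Second derivative of position: always points back toward the sun."""
    x, y = position(t)
    return (-OMEGA ** 2 * x, -OMEGA ** 2 * y)
```

One continuous function per quantity covers every instant of the motion; no subdivision into separate processes is needed.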


Rate of Change

In order to get to the subtleties in a clearer fashion, we remind you of a joke which you surely must have heard. At the point where a lady in a car is caught by a cop, the cop comes up to her and says, “Lady, you were going 60 miles an hour!” She says, “That’s impossible, sir, I was travelling only seven minutes. It is ridiculous – how can I go 60 miles an hour when I wasn’t going an hour?”

— Richard Feynman

In a short, recent thread, it appears that there has been discussion on those qualities that need a period of time to have meaning. The examples given include velocity and acceleration. But does this make any sense? It is certainly the case, as the Feynman quote shows, that the definition of velocity is not obvious. But it’s also a known issue. Feynman’s story shows that it can be very hard to describe exactly what you mean when talking about velocity; it’s for this reason that physics uses mathematical notation, where we can be precise. Velocity is \(dr/dt\), acceleration is \(d^{2}r/dt^{2}\). As I have said, these examples do not stand alone — the same applies to many other qualities, including those where change is not over time.
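
The point of the Feynman story can be sketched numerically. The following is only an illustration, assuming a hypothetical car whose position is a known function of time; the velocity at seven minutes is estimated from a tiny interval around that instant, so nobody needs to drive for an hour.

```python
def position(t):
    """Hypothetical car: miles travelled after t hours at a steady 60 mph."""
    return 60.0 * t


def velocity(r, t, dt=1e-6):
    """Central-difference estimate of dr/dt at time t."""
    return (r(t + dt) - r(t - dt)) / (2 * dt)


seven_minutes = 7 / 60  # expressed in hours
speed_at_seven_minutes = velocity(position, seven_minutes)
```

As `dt` shrinks the estimate approaches the derivative, which is exactly what the notation \(dr/dt\) states without any of the ambiguity of the prose.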

In short, it makes little sense to create distinctions in our physical model of the world that physics does not make. We are creating work for ourselves and confusion for everyone else.


Absolute Space

BFO distinguishes between Sites and SpatialRegions; the idea is to distinguish between bits of space in general, and holes — the lumen of the gut, for instance. This seems reasonable at first sight. However, this is being done by suggesting that a Site is relative to an IndependentContinuant while SpatialRegions are absolute.

In short, over 100 years after Michelson-Morley, BFO has reinvented absolute space. The justification for this is that, according to one of the authors, without absolute space, problems arise. The problems haven’t been described in detail, but apparently, involve things moving through space or changing shape.

BFO is put forward as a “realist” ontology; that is, it models the key entities as they exist in reality. And the reality is this: there is no evidence that absolute space exists and, indeed, very strong evidence that it does not. It is also hard to see how this could cause problems; Einstein removed absolute space from the model that physics uses a century ago. Now, admittedly, this produces some really weird and counter-intuitive results, but only when two objects are moving rapidly with respect to each other. Relativity does not cause any problems that are not necessary to describe the world. In practice, for “everyday” physics, the upshot is that you just define (or assume) a frame of reference; there is normally an obvious one, but any frame will do, and the results will come out the same.

My post on this produced some interesting replies. Bjoern Peters straightforwardly agreed. Alan Ruttenberg suggested that I was arguing space doesn’t exist; while Barry Smith argued that having this (false!) distinction in BFO is necessary for practical reasons.

At which point, I unsubscribed.


Conclusions

I am not arguing here that BFO is totally broken or has no purpose. To some extent, I am yet to be convinced that having any upper ontology helps with ontology building: arguing against, they are hard to understand and often result in a top-down design which ends in philosophical arguments and analysis paralysis; arguing for, they provide some basic structure or a design pattern, which can ease the task of starting to build an ontology, or to understand someone else’s. I am unsure yet whether they help with (computational) interoperability; by analogy to software, design patterns are good for the developer but do not provide any more guarantees. In general, though, I work on the basis that the use of a common framework seems a sensible idea; it is something we should try until we have enough data to make a more coherent decision. BFO provides one such basic framework; and, in general, it’s okay so long as we do not take it too seriously. We should be willing to ignore it when it fails.

However, realism has much less going for it. It is based on the conceit that we should look at reality; now, within a scientific context, this means experimental data. The statement that science should use experimental data, though, is obvious and is a truism; it cannot, therefore, itself define a methodology.

In practice, however, BFO has been built leaning on 2000 years of philosophy; and here lies the mistake. We should acknowledge our limitations as ontologists; we have nothing at all to add to a physical model of the universe as the physicists have already done it. All we need is to represent their model; we should not be looking at experimental data, because someone else has already done it for us. The problems described here are all avoided by the simple mathematical model that physics uses — 4 dimensions, or real number lines, at 90 degrees to each other, and by the use of calculus to describe change.

In BFO, we see an attempt to consider the key entities as they exist in reality; and the bottom line here is that, at least for these few classes, BFO has done a bad job of it. It has misunderstood lengths and space, developed a process model that is unmanageable and made distinctions that are known to be wrong. Biology is built on top of the other sciences, and it will not benefit the cause of bio-ontologies if we ignore them. Worse, biologists attempting to use BFO will find it hard to apply models which are demonstrably wrong; what criteria can we apply to distinguish SpatialRegions and Sites, when physics tells us that these criteria do not and cannot exist? Finally, as ontologists, we should accept our limitations and the limitations of the technology; we should not attempt to re-represent knowledge which has already been modelled in more appropriate ways.

We should be experimenting and testing more than we are thinking; we should be embracing change when we are wrong. We should be leaning on 200 years of physics and biology, not 2000 years of philosophy.

So, this year of Bio-Ontologies is upon me; I’m sitting in the airport waiting to fly in the wrong direction; although I’ve noticed that the airport signs no longer call this “waiting time” but “shopping time”.

It’s 12 years on now; I can’t remember whether this makes it the oldest SIG at ISMB, but it must be close. Perhaps it is surprising that a small meeting like this has lasted so long, but during its time the use of ontologies within biology has blossomed; to some extent, this is true of the outside world also. This year has carried on with the trend. Gone are the days when we got just enough papers to fill the day; we’ve stretched the day out, we’ve added a poster session, but still we get more. The number of attendees has gone up somewhat also. It’s good to see.

For me, bio-ontologies has also been the centre of my entry into the field; Edmonton was the first ontology paper that I ever presented, and, perhaps depressingly, it is still some of my best work. This year has special significance for me. I’m giving a paper myself for the first time since Edmonton; perhaps it is fitting to end as I began, because this will also be my last year as conference chair. I’ve been involved now for 6 of the 12 years; while I’ve enjoyed it and felt privileged to do the work, it’s enough. Organising is hard work, even now when I understand the process well. In the last few years, I’ve tried to push the workshop to be a bit broader than just ontologies, to take in all new forms and technologies for representing and distributing knowledge; I’ve met with some, but limited, success. A workshop with a 12 year pedigree takes some time to move. I was heartened to see that it was the first SIG to get a subject on the official conference friendfeed. Web 2.0 is upon us. With luck, this will become a bigger part of the meeting. If so, this will be other people’s achievement, not mine. Did I mention that this is my last year?

I’m looking forward to giving my paper on functions and roles in ontologies. One of the more minor reasons for retiring is that it’s easier to publish in a workshop which you are not organising. I’m surprisingly nervous about the talk; probably as much so as in Edmonton. I’ve been practicing the talk incessantly, to the point that my back is complaining from too much sitting. It’s my first ever single author paper. I’m hoping that people will like the paper; its message is simple and straightforward. Of course, this doesn’t mean that it’s correct. Last year’s paper on a similar topic caused quite a fuss (which, let’s be honest, was partly my fault) and I know that some in the audience will be quite vehement in their opposition to mine. Even though I’ve been over it so many times, I have the back-of-my-mind fear that there is a big hole that I’ve missed.

I guess this is good; it means that I’m excited about my own paper in a way that I haven’t been for years. A bit of fuss will mean that other people are too, for good or for ill. In the end, I’ll probably be most disappointed if the paper goes with a whimper not a bang.

Ah, it goes on and on. After my last attempt at literate OWL programming, called omnsplit, I decided that there was a problem; this version splits the OWL file into individual statements, and puts them into files with the same name as the OWL class (property, or whatever).

The problem is that, for an ontology like OBI, you get 1400 individual files; this is just inconvenient, as many applications don’t like this many files in a directory. Also, there is a naming constraint; you can only use characters legal in the file system; this doesn’t include “:” if you want to be Windows (NTFS) compliant.

So, for my new system, I decided to generate an index file, which just points at locations in the ontology file. Initially, I was just going to index the main ontology file; in the end, I decided a partial copy was the way forward; generating both the index and the indexed file ensures that they will stay in sync.

It required a bit of nasty latex hacking; the basic problem was avoiding the limitation of only being able to use legal LaTeX macro characters (that is, letters). The system now works like this:



%% This is generated by python which also generates the
%% function_ont.spt file which is a copy of the ontology (with a
%% few new lines gone).

%% This just defines a new macro in what appears to be an
%% unnecessarily complex way.
\expandafter\def\csname OmnEntityHeaderheader\endcsname%
{\lstinputlisting[language=omn,firstline=1,lastline=8]{function_ont.spt}}

%% But the use of \expandafter and \csname means that you can
%% use any character you like, including underscores and numbers
%% in the macro name.
\expandafter\def\csname OmnEntityObjectPropertyhas_role\endcsname%
{\lstinputlisting[language=omn,firstline=206,lastline=219]{function_ont.spt}}

%% We can now define two commands in the style file. Again
%% we use \csname so that we are not bound to characters legal
%% in latex macros.
\newcommand{\omnclass}[2]{\csname OmnEntityClass#1#2\endcsname}
\newcommand{\omnobjprop}[2]{\csname OmnEntityObjectProperty#1#2\endcsname}

%% now in our source, we can do things like this.
\omnobjprop{}{has_role}

Using an index in this way also has another advantage. I’ve had to make a decision whether to go with rdfs:label or the entity name. I can now back out of this; I can just use both in the index file, without too much extra space, so that either would be referenceable within the latex.

To me, this feels like the right solution. It’s relatively simple (with a bit of nasty latex, which is nicely hidden), it doesn’t depend on the file system. It needs a bit more work to bring it to completion, but not that much.
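
The python that generates the index is not shown here, so the following is only a minimal sketch of how such a generator might look, not the actual script. It assumes that every entity frame starts at the beginning of a line with one of a handful of Manchester syntax keywords; `frame_ranges` and `index_entry` are hypothetical names.

```python
import re

# Frame keywords assumed to start an entity description at column 0.
FRAME = re.compile(
    r"^(Class|ObjectProperty|DataProperty|AnnotationProperty|Individual):\s*(\S+)"
)


def frame_ranges(lines):
    """Return (kind, name, firstline, lastline) per frame, 1-indexed."""
    starts = []
    for number, line in enumerate(lines, start=1):
        match = FRAME.match(line)
        if match:
            starts.append((number, match.group(1), match.group(2)))
    ranges = []
    for pos, (first, kind, name) in enumerate(starts):
        # A frame runs until the line before the next frame, or to the end.
        last = starts[pos + 1][0] - 1 if pos + 1 < len(starts) else len(lines)
        ranges.append((kind, name, first, last))
    return ranges


def index_entry(kind, name, first, last, source="function_ont.spt"):
    """Format one macro definition in the style of the index file above."""
    return (
        "\\expandafter\\def\\csname OmnEntity%s%s\\endcsname%%\n"
        "{\\lstinputlisting[language=omn,firstline=%d,lastline=%d]{%s}}"
        % (kind, name, first, last, source)
    )
```

Because the index and the `.spt` copy would be written in the same pass, the line numbers cannot drift apart from the text they point at.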

Sadly bio-ontologies looms, so next week will be getting ready for that; perhaps I can finish this off on the way back. “Sadly” is perhaps a poor choice of words; I’m greatly looking forward to it, but I’ve kind of had the bit between my teeth with python and latex hacking for the last few weeks.

After a bit of struggle, I now have another literate OWL tool working, along the lines discussed in a previous blog post. Rather than generating the OWL documentation, I now split a Manchester syntax file up, so that I can refer to bits of it. I have this working with OBI, using Protege to produce a single merged ontology file, in Manchester syntax.

The current implementation is rather simple; it produces one file per entity in the OWL file, which I don’t think is entirely good. When run on OBI, it creates over 1400 files, which is a lot. The other problem is that I’ve had to do some dubious hacking to get the file names to work out: I have to remove spaces and “\” characters, as well as “:”, which is illegal on NTFS.
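
The file name hacking amounts to something like the sketch below. This is an illustration rather than the code in the repository: it substitutes an underscore instead of deleting the character, which is an assumption of mine, and `entity_filename` is a hypothetical name.

```python
import re


def entity_filename(name, suffix=".omn"):
    """Turn an entity name such as 'owl:Thing' into a safe file name."""
    # Spaces, backslashes and colons (illegal on NTFS) become underscores.
    return re.sub(r"[ \\:]", "_", name) + suffix
```

Two different entity names can still collapse to the same file name under any scheme like this, which is another reason to be suspicious of one file per entity.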

There’s also a problem with some of the OWL. Unfortunately, the OBI to OWL conversion process has a reification step which I don’t quite understand the purpose of. This comes out as this sort of anonymous individual. I’m not sure at all how the definition has come out as the rdfs:label, but, for sure, you can’t use this as a filename!


Individual: relationship:genid7

    Annotations:
        rdfs:label "C located_in C' if and only if: given any c that
instantiates C at a time t, there is some c' such that: c' instantiates
C' at time t and c *located_in* c'. (Here *located_in* is the
instance-level location relation.)"@en,
        oboInOwl:hasDbXref relationship:genid8

    Types:
        oboInOwl:Definition

I think I might change the implementation a bit, though. Having 1400 files in one directory is not good. My idea is to serialize the entire file out as latex, with lots of macros, autogenerated.


%% this would appear in the generated file
\newcommand{\OwlClassowlThing}{
  \begin{omn}
Class: owl:Thing
  \end{omn}
}

%% then in your latex file you would do
\owlclass{owl}{Thing}

%% which would just resolve to the class above

The only worry with this is that latex would then have to read a large file, even if most of the macros are not used. This might be really, really slow. Well, we can but try.

As before, the current version is available at git://github.com/phillord/literate_omn.git.

Well, after a reasonable degree of struggle, I managed to get the first version of my literate OWL system working. As well as learning python, I’ve had a go with git; my repo is hosted on github at git://github.com/phillord/literate_omn.git. There are three components.

omnextract.py: this pulls out all the referenced omn files from the TeX document and produces the complete omn file.
omn.sty: this is a driver for the listings package which does syntax highlighting in TeX.
omndoc.sty: this provides commands for including files into the TeX. It’s a thin wrapper around the listings package.

I decided to make omn.sty separate from omndoc.sty as it works standalone, if you just want to use the listings package on its own. At the moment, you can only include files; environments don’t work. You can see the pdf it creates from this TeX:


\documentclass{article}

\usepackage[pdftex]{color}
\usepackage{omndoc}

\title{A Test Document for OMNDoc}
\author{Phillip Lord}
%% should be ignored by latex, but read by python

\omndoc{all_test.omn}

\begin{document}
\maketitle

Here is a piece of OWL that should be readable in the documentation and in the
OMN output.

\begin{omn}
Class: FirstClass
\end{omn}

\omn{first.pomn}

Here is a piece of OWL that should be readable in the OMN output but is too
boring to be worthy of consideration for the documentation.

% \ignore{
%   \begin{omn}
%     Class: BoringOWL
%   \end{omn}
% }

\ignore{\omn{second.pomn}}

Here is a piece of broken OWL that should be rendered in the documentation (as
broken!) but should be ignored in the OMN.

% \begin{notomn}
% Clazz: BrokenOmn
% \end{notomn}

\notomn{third.pomn}

\end{document}

I’m starting to debate with myself, though, whether I have gone the right route here. The problem is that splitting the omn file up into bits is a pain. It only supports one way of working; if you want to use Protege, for example, to edit the file, you can’t; you can only view. We even miss the big advantage of literate programming; one source for both document and computation. But, then, you are stuck with a poor editing environment for either the documentation or computational representation.

I’ve been thinking instead of a system which would look like this:


\omndoc{function.omn}

\omnClass{Function}

\omnProperty{has_role}

\omnSummary{}
\omnMissing{}

Now, the python component would split the function.omn file instead of combining it. Each class, individual or property would be put into its own file. The \omnClass macro would then just be a simple include (again using the listings package); it would show the class inline. \omnSummary would include some TeX (generated from python) saying how many classes and so forth were in the omn file; \omnMissing would produce a list of Classes that are not explicitly included. Given a big monitor, you could work on the two sources (documentation and ontology) side-by-side, with only a little bit of editing to support jump-to or equivalent. Finally, it would be more syntax-independent. The TeX would not need to be changed to support, for example, the XML syntax. Just some python to split the XML document up into snippets.
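
As a rough illustration of what \omnSummary and \omnMissing would need from the python side, here is a minimal sketch. It assumes classes are declared with `Class:` at the start of a line and are included with `\omnClass{...}`; `class_count` and `missing_classes` are hypothetical names, and none of this exists yet.

```python
import re

DECLARED = re.compile(r"^Class:\s*(\S+)", re.MULTILINE)
INCLUDED = re.compile(r"\\omnClass\{([^}]*)\}")


def class_count(omn_text):
    """Number of class frames in the ontology, for a summary line."""
    return len(DECLARED.findall(omn_text))


def missing_classes(omn_text, tex_text):
    """Classes declared in the ontology but never pulled in by \\omnClass."""
    included = set(INCLUDED.findall(tex_text))
    return [name for name in DECLARED.findall(omn_text) if name not in included]
```

The same approach would extend to properties and individuals by adding one pattern per frame keyword.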

I shall start coding this over the next couple of days. I think I already have most of the python that I need so, hopefully, it should not take too long.

My next blog post was going to be about function, as I have just had a paper about it accepted. But, I got slightly side-tracked along the way, thinking about Literate Programming as it applies to OWL. While an ontology is (or, to my mind, should be) a computational artifact, it’s a bit different from a program; the main thing is that it doesn’t run; it doesn’t have that functional test that a program does. This is not to say that an ontology is not an application-dependent entity. It can be, but even then it needs to have a program built on it.

One of the upshots of this is that a narrative justification for an Ontology is fairly important; currently, we spend far too long on mailing lists, arguing about ontology terms and, to my mind, not enough of this is reflected in the final outcome. If, on the other hand, we moved to a situation that adding a new concept was equivalent to writing a paper, we might have less of this. Discussion would be a bit more focussed; besides which, most scientists are experienced with writing and reviewing papers, so we’d just be better at it.

For this to happen productively, though, the paper has to become, itself, a computational artifact. It’s not good having documentation that has to be kept in-sync with the ontology; we will just end up with multiple versions, and will never quite know what we are talking about; my discussions about BFO have shown me this; do we mean the OWL, the definitions in the OWL, the papers or what? We should be able to generate both readable documentation and computational OWL at the same time. In short, literate programming.

Now, I know that Bijan Parsia has been investigating this also, but I wanted to think a little bit about how it would fit into my environment.

One thought was to get the system working within asciidoc which I am using to generate these pages. This turned out to be simple enough; take, for instance, this definition for BiologicalFunction.


Class: BiologicalFunction
    Annotations:
    rdfs:comment
"Definition: A biological function is a realizable entity that inheres in continuant
which is realized in an activity, and where the homologous structure(s) of
individuals of closely related species (or identical species) fulfil this
same biological function."

    SubClassOf:
        Function

Asciidoc uses source-highlight for its syntax highlighting. I had to add a bit of config (which, annoyingly, needs to be placed into the main install directory for source-highlight, rather than in a user space dot-directory).

Unfortunately, this is not going to be as good as you might hope for printed documentation. The obvious solution here is to aim at LaTeX. I think that I am going to have a quick go at producing something like this, inspired by Literate Haskell. Basically, I need three tags which look like this:



\begin{owl}
Class: Thing
\end{owl}

\ignore{
\begin{owl}
Class: BoringOWL
\end{owl}
}

\begin{notowl}
Clazz: BrokenOwl
\end{notowl}

The first copes with OWL that should appear both in the documentation and code (that is, most of it). The second covers OWL that should appear just in the code; the haskell example is for a “help” function; I suspect that this is rarely needed for OWL. The final example appears just in documentation; it would be useful for anti-examples (“Don’t do this!!!”). My plan would be to pre-process the latex just using regexps, nothing complex, to dump the OWL to a file, mostly because I don’t know how to get latex to do it. Meanwhile, these two macros would just be defined in terms of the Listings package (which means writing yet another syntax highlighting set of regexps, oh dear).
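
The regexp pre-processing could be as small as the sketch below. It is a minimal illustration under the assumption that the environments are written exactly as in the example above; `extract_owl` is a hypothetical name, and commented-out environments are not handled.

```python
import re

# Matches owl environments only; \begin{notowl} does not match because the
# pattern requires "owl" immediately after the opening brace.
OWL_ENV = re.compile(r"\\begin\{owl\}\n(.*?)\\end\{owl\}", re.DOTALL)


def extract_owl(tex_text):
    """Return the concatenated contents of every owl environment."""
    return "".join(OWL_ENV.findall(tex_text))
```

An environment wrapped in \ignore{...} is still picked up, since the pattern pays no attention to what surrounds it, which is the behaviour wanted for OWL that should appear only in the code.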

Well, this is okay, but has two problems: first, it means writing OWL inside latex, which means that editor support is going to be rubbish; second, what if I want to blog AND print a document? My solution to this is to move my ontologies to being multi-file based. As far as I can tell, Manchester OWL is order independent (except for the header). So the plan would be to write multiple files, each with a few Concepts in:

 function/header.omn
 function/function.omn
 function/biological_function.omn
 function/artifactual_function.omn

Generating a complete Manchester syntax file from this would be easy (more or less, just run cat). This could be supported within latex by adding some include macros. Again, this is trivial to do with the listings package.
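
The “more or less” is the header, which has to come first. A minimal Python sketch of the concatenation, assuming the header lives in a file called header.omn as in the listing above (`load` and `combine` are hypothetical names):

```python
from pathlib import Path


def load(directory):
    """Read every .omn file in a directory into a name -> text mapping."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in Path(directory).glob("*.omn")
    }


def combine(files, header="header.omn"):
    """Join the files into one document, with the header file first."""
    names = [header] + sorted(name for name in files if name != header)
    return "\n\n".join(files[name].strip() for name in names) + "\n"
```

Something like `combine(load("function"))` would then give the complete file; the remaining files are sorted by name only to make the output reproducible, since their order should not matter.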


\owl{function.omn}
\ignore{\owl{help.omn}}
\notowl{broken.omn}

Likewise, asciidoc supports it using include macros. I shall give this a go next week. I shall produce a document describing the axiomatisation for function in OWL that started all of this off.

PS Just finished this, and found out that blogpost stripped off all my nice syntax highlighting. Took a bit of effort but (hopefully) it should all be back in again now.