Archive for the ‘Professional’ Category

I’m on my way to the second Knowledge Blog meeting. Well, sort of. The first meeting was badged the “Ontogenesis Tutorial” meeting; the focus was on developing a tutorial resource for ontologies. Actually, much the same will be true of this meeting but, as well as addressing the reviews for my own article on Ontogenesis, I have decided to spend some time supporting the process itself. In the first place, this means writing a couple of articles for Process, a new knowledge blog that I am starting for discussion of the process itself.

Since the first meeting, I’ve had plenty of time to reflect on the general idea of knowledgeblogging. As far as I can see, there is one overwhelming truth about the situation: we got 15 articles in 2 days and, since then, we have been averaging between 500 and 1000 page hits a month. Now, of course, it’s an open question whether this is at all sustainable; we have no advertising and no financial support. But, still, our most read article (“What is an Ontology”) has had several hundred reads and, bottom line, that is pretty good going for an academic article. We might like to think that the work that we do is important (well, it is!), but in publishing terms we are pretty much a niche market.

On the negative side, we have not had articles flooding in, and none of those from the last meeting have got any further. Thinking back to Nupedia, many moons ago, it’s obvious that getting authors is always going to be a problem.

I’m also going to have to think of a snappier, shorter name than “knowledgeblog”, which is taking far too long to type. So far:

k-log
Simple, straightforward, but already used
knowblog
Good, but a homophone of “noblog”, which is confusing.
knoblog
Pronounced “noh-blog”, it would be great, but English is not a phonetic language.
knob
“KNOwledge Blog”: excellent in many ways, but I realise that the entire world does not share my slightly puerile sense of humour.

Hmmm. Comments welcome. So long as they are not about my puerile sense of humour.


Introduction

A few weeks ago I unsubscribed from the BFO discuss mailing list. I’ve been reading and posting there since March 2007; in that time I’ve managed to send 492 mail messages, which surprises even me. As a mailing list, BFO discuss is a slightly bruising experience: it’s a bit like a bar fight, where one person swings a punch and everyone just piles in. I joined the mailing list because BFO has become something of a force within the bio-ontology community and I wanted to help make sure it was fit for purpose; however, I have to admit that I have been as guilty of reaching for the nearest available pool cue as the next ontologist. Not the best side of me, but there you have it.

During my time on the mailing list, I have learnt a lot about BFO and the realist philosophy that, in theory, underpins it. Actually, BFO is not at all bad; for me, though, realism is largely without merit. One of the main difficulties with realism is that it carries with it the idea that, by thinking very hard, you can come up with a “representation of reality”. I think that this is mistaken. As scientists, we should be wary of thinking too much; our role, whenever possible, is to think just enough to get us to the start of the next experiment. This doesn’t seem to happen with BFO; in the time that I have been on the mailing list, BFO itself has changed very little, and the constant feedback and iteration needed to accommodate new knowledge and experience is largely not happening. I have qualms with many parts of BFO (for example, I have discussed the issues with the Realizable Entity hierarchy). However, for me, the worst outcomes of the philosophical approach have come from not considering the advanced models that physics has produced to explain the experimental data that we see. I give four examples.


Length in Space

BFO makes a very high-level split between Independent and Dependent Continuants. A continuant is something that persists over time, but which exists in full for this entire time: my computer or me, for instance, as opposed to a process, not all of which exists at any point in time. The distinction between an independent and dependent continuant depends on whether this entity exists on its own; for my height, a dependent continuant, to exist, I also have to exist. Once I cease to exist, so does my height. This seems okay, but in tying physical dimensions to an independent continuant, BFO has made a fundamental error: how do we express the length of a Spatial Region? Length is a dependent continuant and so there must be an independent continuant in which it inheres. Unfortunately, Spatial Region is not itself an independent continuant.

There are solutions, of course; we could introduce a relation other than inheres to link Spatial Region and Length. But we would still need an Independent Continuant to exist for this length to inhere in. Another possibility is to describe the length of a spatial region as the length of an Independent Continuant that could exist in it. But it is easy to think of Spatial Regions in which no Independent Continuant can exist (for example, the Spatial Region 1m longer than the longest object in the universe). BFO would be modelling the world backwards: physics uses a coordinate system and places objects within it, whereas this approach would use objects to define the coordinate system.
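To make the contrast concrete, here is a minimal Python sketch of the coordinate-first approach, cut down to one dimension for brevity. The `Interval` class and `extend` helper are hypothetical names, not part of BFO or any ontology tooling; the point is only that length falls straight out of the coordinates, so a region has a length whether or not anything occupies it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A one-dimensional spatial region: a stretch of a coordinate axis.

    The region is defined only by its end points (in metres) in a chosen
    coordinate system; nothing needs to occupy it.
    """
    start: float
    end: float

    @property
    def length(self) -> float:
        # Length comes straight from the coordinates; no bearer is needed.
        return abs(self.end - self.start)

    def contains(self, other: "Interval") -> bool:
        # True if `other` lies entirely within this region.
        lo, hi = sorted((self.start, self.end))
        other_lo, other_hi = sorted((other.start, other.end))
        return lo <= other_lo and other_hi <= hi


def extend(region: Interval, extra: float) -> Interval:
    """Return a region `extra` metres longer than `region`, sharing its start."""
    return Interval(region.start, region.end + extra)


# Hypothetical stand-in for "the longest object in the universe": it occupies 0 m to 5 m.
longest_object = Interval(0.0, 5.0)
# A region 1 m longer: nothing can fill it, yet it still has a length.
beyond = extend(longest_object, 1.0)
print(beyond.length)  # prints 6.0
```

The objects are placed in the coordinate system, not the other way round; nothing in `Interval` refers to whatever might fill it.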

Currently, this problem seems to have been accepted by some of the authors of BFO; however, there is no solution. If BFO had started from the mathematical models of physics, to me it seems likely that we would not be in this position.


Change in Process

BFO suggests that Occurrents (such as a process) can have properties in a similar way that independent continuants can have qualities. I have a length; a process may have a duration. However, BFO suggests that the properties of an Occurrent cannot change; rather, there must be a new Occurrent.

Again, this makes little sense, and ignores very simple physical examples. Consider, for example, a car first travelling at 10 m s⁻¹, then at 20 m s⁻¹, and the process of its motion. BFO would have us model this as three processes: the car moving at 10 m s⁻¹, the car moving at 20 m s⁻¹, and a single motion process of which the other two are part.

For a simple example, this style of modelling may work. However, consider the earth travelling around the sun. The problem is that the motion is continually changing: the earth’s velocity is constantly being turned toward the sun, so it is always accelerating. Worse, the acceleration also changes continuously, as the earth’s position relative to the sun changes. So, to model this in BFO, we need an infinite number of processes (for both the motion and the acceleration). We could argue that, while the velocity and acceleration change constantly, the angular velocity and speed of the earth are constant, so why not model the process in these terms? Unfortunately, even this is not true; the earth moves in an ellipse, not a circle, even if it is very close to a circle. So, the angular velocity and speed change continually also.

The physics of this is, as I have said, straightforward. The earth’s motion has a velocity and acceleration expressed as (nearly) two sine waves along the two axes.
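A rough Python sketch of that description follows, assuming a circular orbit (so it is only "nearly" right) and using approximate values for the orbital radius and the length of a year. A single set of functions of time covers every instant of the motion; nothing needs to be carved into separate processes.

```python
import math

# Circular approximation to the earth's orbit. The real orbit is a slightly
# eccentric ellipse, so this is only "nearly" right, as the text says.
R = 1.496e11                   # mean orbital radius, metres (about 1 AU)
OMEGA = 2 * math.pi / 3.156e7  # angular velocity, rad/s (one year is about 3.156e7 s)


def position(t: float) -> tuple[float, float]:
    """(x, y) position relative to the sun at time t seconds."""
    return (R * math.cos(OMEGA * t), R * math.sin(OMEGA * t))


def velocity(t: float) -> tuple[float, float]:
    """d/dt of position: sine waves again, a quarter-cycle ahead."""
    return (-R * OMEGA * math.sin(OMEGA * t), R * OMEGA * math.cos(OMEGA * t))


def acceleration(t: float) -> tuple[float, float]:
    """d/dt of velocity: at every instant it points back towards the sun."""
    return (-R * OMEGA**2 * math.cos(OMEGA * t), -R * OMEGA**2 * math.sin(OMEGA * t))


# One motion, one description: the velocity and acceleration at any instant
# are evaluated from the same functions rather than split into new processes.
for day in (0, 91, 182):
    t = day * 86400.0
    print(day, velocity(t), acceleration(t))
```

Swapping the circle for Kepler's ellipse changes the functions, not the style of model.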


Rate of Change

In order to get to the subtleties in a clearer fashion, we remind you of a joke which you surely must have heard. At the point where a lady in a car is caught by a cop, the cop comes up to her and says, “Lady, you were going 60 miles an hour!” She says, “That’s impossible, sir, I was travelling only seven minutes. It is ridiculous – how can I go 60 miles an hour when I wasn’t going an hour?”

— Richard Feynman

A short, recent thread discussed those qualities that need a period of time to have meaning; the examples given include velocity and acceleration. But does this make any sense? As the Feynman quote shows, the definition of velocity is certainly not obvious, but it is a well-known issue. Feynman’s story shows that it can be very hard to describe exactly what you mean when talking about velocity; it’s for this reason that physics uses mathematical notation, where we can be precise. Velocity is \(dr/dt\) and acceleration is \(d^{2}r/dt^{2}\): each is defined at an instant, as the limit of the change over ever shorter intervals. Nor do these examples stand alone; the same applies to many other qualities, including those where change is not over time.
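A small Python sketch of that limit, using a central-difference estimate with a small step `h` (a numerical approximation chosen for illustration, not the exact derivative). The car profile and the linear temperature profile along a rod are hypothetical examples; the same function yields a rate of change in time and one in space.

```python
from typing import Callable


def derivative(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of df/dx at the single point x.

    As h shrinks this approaches the instantaneous rate of change: a value
    that belongs to the instant (or point) x, not to an extended interval.
    """
    return (f(x + h) - f(x - h)) / (2 * h)


def car_distance(t_hours: float) -> float:
    """Miles covered by a car travelling at a steady 60 miles an hour."""
    return 60.0 * t_hours


def rod_temperature(x_metres: float) -> float:
    """A hypothetical linear temperature profile along a rod, in degrees C."""
    return 20.0 + 15.0 * x_metres


# Change over time: 60 miles an hour at the seven-minute mark,
# although the car has not been going for an hour.
print(derivative(car_distance, 7 / 60))
# Change over space: the same operation gives degrees per metre along the rod.
print(derivative(rod_temperature, 0.5))
```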

In short, it makes little sense to create distinctions in our physical model of the world that physics does not make. We are creating work for ourselves and confusion for everyone else.


Absolute Space

BFO distinguishes between Sites and SpatialRegions; the idea is to distinguish between bits of space in general, and holes — the lumen of the gut, for instance. This seems reasonable at first sight. However, this is being done by suggesting that a Site is relative to an IndependentContinuant while SpatialRegions are absolute.

In short, over 100 years after Michelson-Morley, BFO has reinvented absolute space. The justification for this is that, according to one of the authors, problems arise without absolute space. The problems haven’t been described in detail but apparently involve things moving through space or changing shape.

BFO is put forward as a “realist” ontology; that is, it models the key entities as they exist in reality. And the reality is this: there is no evidence that absolute space exists and, indeed, very strong evidence that it does not. It is also hard to see how doing without it could cause problems; Einstein removed absolute space from the model that physics uses a century ago. Now, admittedly, this produces some really weird and counter-intuitive results, but only when two objects are moving rapidly with respect to each other. The only complications relativity introduces are those needed to describe the world accurately. In practice, for “everyday” physics, the upshot is that you just define (or assume) a frame of reference; there is normally an obvious one, but any frame will do, and the results will come out the same.
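For "everyday" speeds, changing frame of reference is a Galilean transformation: subtract the frame's origin from positions and its velocity from velocities. The minimal one-dimensional Python sketch below, with two hypothetical bodies, shows that the physically meaningful quantities (separation and relative velocity) come out the same in either frame. Relativity proper uses Lorentz transformations instead, which this sketch does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Body:
    x: float  # position along one axis, metres
    v: float  # velocity along the same axis, metres per second


def to_moving_frame(body: Body, frame_velocity: float, frame_origin: float = 0.0) -> Body:
    """Galilean change of frame (the low-speed, "everyday" limit).

    The new frame's origin sits at `frame_origin` and moves at `frame_velocity`
    relative to the old one; all positions are taken at the same instant.
    """
    return Body(body.x - frame_origin, body.v - frame_velocity)


# Two hypothetical bodies described in a frame fixed to the road.
car = Body(x=100.0, v=20.0)
lorry = Body(x=130.0, v=15.0)

# The same two bodies described from a frame moving with the car.
car_b, lorry_b = (to_moving_frame(b, frame_velocity=20.0, frame_origin=100.0) for b in (car, lorry))

# Separation and relative velocity come out the same in both frames.
print(lorry.x - car.x, lorry_b.x - car_b.x)  # 30.0 30.0
print(lorry.v - car.v, lorry_b.v - car_b.v)  # -5.0 -5.0
```

No frame is privileged: choosing the road or the car as the reference changes the numbers attached to each body, but not the relations between them.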

My post on this produced some interesting replies. Bjoern Peters straightforwardly agreed. Alan Ruttenberg suggested that I was arguing space doesn’t exist; while Barry Smith argued that having this (false!) distinction in BFO is necessary for practical reasons.

At which point, I unsubscribed.


Conclusions

I am not arguing here that BFO is totally broken or has no purpose. To some extent, I am yet to be convinced that having any upper ontology helps with ontology building: arguing against, they are hard to understand and often result in a top-down design which ends in philosophical arguments and analysis paralysis; arguing for, they provide some basic structure or a design pattern, which can ease the task of starting to build an ontology, or to understand someone else’s. I am unsure yet whether they help with (computational) interoperability; by analogy to software, design patterns are good for the developer but do not provide any more guarantees. In general, though, I work on the basis that the use of a common framework seems a sensible idea; it is something we should try until we have enough data to make a more coherent decision. BFO provides one such basic framework; and, in general, it’s okay so long as we do not take it too seriously. We should be willing to ignore it when it fails.

However, realism has much less going for it. It is based on the conceit that we should look at reality; now, within a scientific context, this means experimental data. The statement that science should use experimental data, though, is obvious and is a truism; it cannot, therefore, itself define a methodology.

In practice, however, BFO has been built leaning on 2000 years of philosophy, and here lies the mistake. We should acknowledge our limitations as ontologists; we have nothing at all to add to the physical model of the universe, because the physicists have already built one. All we need to do is represent their model; we should not be looking at experimental data ourselves, because someone else has already done it for us. The problems described here are all avoided by the simple mathematical model that physics uses: 4 dimensions, or real number lines, at 90 degrees to each other, and the use of calculus to describe change.

In BFO, we see an attempt to consider the key entities as they exist in reality, and the bottom line here is that, at least for these few classes, BFO has done a bad job of it. It has misunderstood lengths and space, developed a process model that is unmanageable and made distinctions that are known to be wrong. Biology is built on top of the other sciences, and it will not benefit the cause of bio-ontologies if we ignore them. Worse, biologists attempting to use BFO will find it hard to apply models which are demonstrably wrong; what criteria can we apply to distinguish SpatialRegions and Sites, when physics tells us that these criteria do not and cannot exist? Finally, as ontologists, we should accept our limitations and the limitations of the technology; we should not attempt to re-represent knowledge which has already been modelled in more appropriate ways.

We should be experimenting and testing more than we are thinking; we should be embracing change when we are wrong. We should be leaning on 200 years of physics and biology, not 2000 years of philosophy.

The Ontogenesis knowledgeblog meeting has now finished; it’s been a fascinating experience and one that I’ve enjoyed very much.

I was hoping for two things out of the meeting; the first was to get some content. There has been a pressing need for introductory material on ontologies for a long time now. We were never going to address this completely in a two-day meeting, even with the significant number of people that we had in the room. But we managed to write quite a number of articles between us. I rather let the side down with only one small article, but I have the excuse that I was busy answering questions. Most of these have not achieved the required number of reviews yet, although I’ve just done the second reviews for Mikel’s, so once that’s posted, we should be there for at least one article. I think that people enjoyed the process enough that some more articles will appear over time although, inevitably, once the immediacy of being in the same room has gone, this process will not happen as rapidly.

The second was to get a clear understanding of whether the idea of knowledgeblogging has legs; it seems reasonable in theory, but does it work in practice? There were some issues: the server running out of memory and crashing twice was not ideal, although it was quickly resolved. Quite a number of people who hadn’t blogged before found the wordpress interface, particularly the editor, fairly nasty; it’s not really designed for large posts. The review process was also a little clunky, and there were many questions and ideas about this. However, for my money, the 80/20 rule applies; we got 80 percent of the way there with a more-or-less modified wordpress. Well, maybe, 70/30.

The rest is going to require more thinking about.

I’m on my way down to Manchester for the Ontogenesis meeting, which I was sad enough to blog about on Christmas Day. I’m looking forward to this meeting a lot; the idea has been in gestation for five or six months, since Bio-Ontologies last year. In summary, we are getting a number of people together to write articles for a book, but instead of going through the tedious and difficult process of getting it published we are going to use a blog.

I finished fiddling with wordpress yesterday and, hopefully, all is ready (fingers crossed that our server doesn’t get hacked, as happened to this blog a few days ago). I’m hoping that we manage to get a number of articles written during the meeting; in practice, getting people in one room is the best way of getting these things done. However, this is not a closed process; I’d welcome articles from anyone, including those not at the meeting. Being blog-based, the system is inherently distributed. So, if you have an ontology-related topic that you have a burning desire to write about, please contact me and I’ll let you know whether anyone else is doing it. Alternatively, there is a list of topics that we hope to make a start on covering. The articles will be peer-reviewed and available for the world to see, fully credited to your name.

I can’t guarantee that it’s going to be included in the REF, but I am working on it.

Okay, so I am totally sad and writing a blog post on Christmas Day. Well, the thing is that I’ve been teaching for months and moving house. This is the first still period that I’ve had for ages; well, thinking is inevitable.

One of the things that I am looking forward to next year is the last ontogenesis meeting. It’s been a lot of fun doing these; I’ve enjoyed them all. The last one is my idea, and I think it’s going to be good. As an ontologist, you get a lot of questions about how to build ontologies and whether there is a book. At the moment, there isn’t really one, and it’s a problem. So, for ontogenesis, we decided to write a set of book chapters, and here is the clever bit: we just stick them on a blog, because the process of formal publication as a book is long-winded, tiresome and error-prone. I’m calling the process knowledge blogging; it’s peer-reviewed and formal, with no intention of being regular; articles come when they are written.

I set up the blog some time ago. I haven’t, as yet, had a lot of time to fiddle with the theme or organisation. There is some content, but it’s just the wordpress default theme. Not ideal, and I hope I will have some time for fixing things after I get back from holidays. I’ve noticed two problems already, though. The first is that with longer articles you need section headings, and wordpress doesn’t do them; I’ve found a solution for this in the shape of a contents table plugin, although subsequent googling also came up with others. This should make navigation a bit better.

The other issue is references: I don’t have a good idea about how to do these sanely. I’ve been looking for DOI wordpress plugins, but can only find one from CrossRef, which doesn’t do what I want. It allows you to search for citations; what I want is to put a DOI in the source of a post and have it presented properly.

Still, I think I know how to do this; I’ve found a tool for linking references to the Mormon books; not normally something I would download, but the principle is the same. So I can replace DOIs with a proper link, using a DOI resolver. What I’d really like to do is have a proper in-text citation as well. The documentation on DOIs and metadata harvesting is all rather nasty, though; a nice simple REST API would do the trick.
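The replacement step can be sketched as a regular-expression pass over the post text. The Python below is a minimal sketch, assuming the public resolver at `https://doi.org/` and a deliberately loose DOI pattern (real DOI suffixes can contain almost any character, and this does not percent-encode them); a wordpress plugin would do the same thing in PHP. The `link_dois` name and the example DOI are hypothetical.

```python
import re

# The public DOI resolver: prefixing a DOI with this URL gives a link that
# redirects to wherever the work currently lives.
DOI_RESOLVER = "https://doi.org/"

# A deliberately loose pattern: "10.", a registrant code of 4-9 digits, "/",
# then a suffix running up to whitespace, a quote or an angle bracket.
DOI_PATTERN = re.compile(r'\b(10\.\d{4,9}/[^\s"<>]+)')


def link_dois(text: str) -> str:
    """Replace bare DOIs in post text with HTML links through the resolver."""

    def to_link(match):
        raw = match.group(1)
        doi = raw.rstrip(".,;)")  # trailing punctuation belongs to the prose
        trailing = raw[len(doi):]
        return f'<a href="{DOI_RESOLVER}{doi}">doi:{doi}</a>{trailing}'

    return DOI_PATTERN.sub(to_link, text)


print(link_dois("See 10.1000/xyz123, for details."))
```

This only covers the link; a proper in-text citation would additionally need the metadata behind the DOI.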

It all confirms my long-held concerns about DOIs; they are a tool for the publishers. Still, perhaps PubMed will come to my rescue. Next place to look.

Happy Christmas to all my subscribers of whom there are very few.

Four days of ontology bashing at an OBI meeting; this leaves me extremely glad to be going home. The meeting was long, hard and tiring. We got a lot done in the time available, though, and that was impressive. All the people in the room knew what they were doing, and we managed to work together and in parallel to an impressive extent. Even while listening to the main conversation, most people were also Skype chatting about something else to those in and outside the room.

I spent considerable time working on the paper that will accompany the release. I was given this job mostly to regularise and clean up the English, but in the end did rather more than this; I hope people are not upset about the stuff that I took out. The whole thing was done “pair programming style”, although I had different pairs for different sections.

Despite all the efforts, though, there are still tracker items open for the 1.0 release, and that’s not ideal, but it is good that we are much closer to it.

Philly was much as I remember it; it’s a reasonably pleasant city. It doesn’t feel too aggressive and it’s relatively quiet. As I had a late flight out today (the meeting finished yesterday), I spent the time wandering around town; like too many US cities, Philly has been built to be easy to drive through rather than good to live in, but it is okay for walking around. They have a nice parkway area on JFK boulevard; I had a nice guided tour around the Rodin museum, which was wonderful, even if lacking The Thinker, which is normally their showpiece entranceway sculpture. Rodin was big on hands, it turns out, and rather fond of the musculature of backs; the captions on the bronzes suggest that he was having affairs with many of his models, so I wonder if this stems from…well, you can work it out.

After that, I wandered up to the art museum past the twee statue of Rocky Balboa, and the converse footprints sculpted in the stairs. The art museum itself is huge; the Thinker is temporarily here, so I got to see it after all, but I think it needs to be outside. As well as the traditional galleries, they also, strangely, have a lot of furniture there, and have imported whole rooms from various places. For me, the Asian section was the best; they had an Indian temple, dark and brooding in the half-light, and a Chinese room with the most amazing timbers. I felt the indoor Romanesque outside courtyard (erm…) was taking it a bit too far.

Not much left to be done after an afternoon full of culture; on the way back to the hotel, I looked for a little park on Chestnut that I had wanted to see and a falafel shop which I had seen signposted. I found neither. The park had left no traces at all; as for the falafel shop, I found a poster for it, but I walked the whole street and, as far as I can tell, 1740 Sansome is a multistory parking lot.

Back where I started, sitting in the airport; tick, tock, tick, tock.

Bleary-eyed, stacks of chocolate muffins obscuring the “healthy snacking” sign, kids on heelers. Yes, I’m in the airport at stupid-o-clock on Saturday morning. I’m heading out to Philadelphia for an OBI meeting. It’s an important meeting; OBI has been a long time in gestation, but this should constitute the 1.0 release; it’s going to be a mass tidy-up session.

I’m quite looking forward to it, in some ways. I quite like Philadelphia, at least if my memory serves me well; I’ve only been there once, for the SOFG conference many, many moons ago, certainly in my pre-blog days. I remember it as a pleasant town, with a waterfront, only slightly scarred by the enormous roads that make US cities less livable than European ones. I’m also hoping to catch up with Robin McEntire, who was one of the co-chairs of Bio-Ontologies at ISMB, and is local.

I’m rather unprepared for the meeting. There has been a lot of activity on the mailing list recently, some of it concerned with paper preparation. But I’ve been trying to get the rest of my teaching preparation finished (nearly done now) which has left me very busy over the last few weeks; I haven’t even had time to look at the paper; I’ve hardly read even the mailing list subject lines. Still, the next week is entirely given over to OBI, which will have to be enough. Travel at this time of year messes with my life to an extent that I’m certainly not going to feel guilty about it.

The flip side of being busy is that I am now in the process of writing about 5 papers, with the next 2 in my head. After the confusion of moving to Newcastle, working out what research to do and learning to teach, my research was getting a bit stuck; I was running out of ideas for the simple reason of not having time to think. Having an enormous backlog of nearly finished, half-finished, and hardly started good ideas (most of which will, in time, turn out not to be) for papers makes me feel like a proper academic again.

I was most entertained to read about EPSRC’s funding policy changes. Basically, they have taken a long hard look at their system for funding, decided that the peer-review system has fundamental problems, and have therefore issued their well thought out and considered solution to the problem: blame the users.

Their idea is this; if you are on too many grants that fail, then you won’t be allowed to submit again until you have been on some sort of re-education camp. The basic criteria appear to be this: three or more unfunded proposals, ranked in the bottom half, and lower than 25% success over the same two years.

The first criterion is problematic because it is based on an aggregate score; it is impossible to judge in advance whether you are going to be in the bottom half; your proposal could be brilliant and internationally outstanding (EPSRC is like Lake Wobegon, all the grants are above average) and you could still be in the bottom half. The second half of the criterion is also interesting; if you submit a single proposal and it gets rejected, then you fall into this category straight away. It will also be harder to get people to join collaborative grants, as these might bring their stats down. This is after EPSRC have been pushing us for years to put at least 5 different institutions on each proposal if we want it to be funded.
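To make the arithmetic explicit, here is a rough Python encoding of the criteria as I read them above. It is a sketch of that reading, not EPSRC's own wording or procedure; `Proposal` and `would_be_restricted` are hypothetical names. Note that a single rejection satisfies the success-rate condition on its own, though not the count condition.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    funded: bool
    ranked_bottom_half: bool  # placed in the lower half of its prioritisation list


def would_be_restricted(proposals: list[Proposal]) -> bool:
    """My reading of the criteria, applied to one two-year window.

    Not EPSRC's own wording or procedure: (1) three or more proposals that
    were unfunded and ranked in the bottom half, and (2) an overall success
    rate below 25%.
    """
    if not proposals:
        return False
    unfunded_bottom_half = sum(1 for p in proposals if not p.funded and p.ranked_bottom_half)
    success_rate = sum(p.funded for p in proposals) / len(proposals)
    return unfunded_bottom_half >= 3 and success_rate < 0.25


rejected = Proposal(funded=False, ranked_bottom_half=True)
funded = Proposal(funded=True, ranked_bottom_half=False)

# Three bottom-half rejections and one success: 25% is not below 25%.
print(would_be_restricted([rejected] * 3 + [funded]))
# One more rejection anywhere in the window drops the rate to 20%.
print(would_be_restricted([rejected] * 4 + [funded]))
```

Every collaborative proposal counts towards each investigator's record, which is where the disincentive to collaborate comes from.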

At the same time, information about the REF, which is to follow on from the wondrous RAE, is starting to trickle out. Nice to see that they are still going to reinforce the existing closed publication system with more bibliometric data. The “You are the REF” website offers itself as a way to work out your score. Excitingly, the first question is “What is your discipline?”: Computer Scientist or Biologist. This seems reflective of the REF documentation that I have seen already. It works on this basis: different disciplines have different rules, so we will make different decisions in each, which is fine, because no one can be in two anyway.

Glad to see that the REF is carrying on the RAE tradition of encouraging multi-disciplinary research.

At Neuroinformatics 2009, David Sutherland and I talked about the problems of ontology building. One of the current (and past!) difficulties is choosing an appropriate language for representing the knowledge in your ontology. I thought I would write my thoughts up as a post; this will probably be the most boring thing I have ever written (I am sure someone will point out worse offences); syntax is dull but distressingly important.

In bioinformatics, there are essentially two choices: OWL and OBO (format). A second issue is finding a good environment for developing the ontology; this divides between Protege, OBO-Edit and the ever-present “text editor”. It’s often the case that we want to use both languages at the same time. Take, for example, OBI, which I am involved in. While the ontology itself is being developed in OWL, many of the ontologies it depends on are built using OBO; being purist and demanding one is really not an option. OWL itself has many different syntaxes; at the moment, I generally prefer Manchester syntax because you can edit it with a text editor, which is really not so easy with any of the XML representations.

While these two languages have somewhat different expressivity, how to translate both the syntax and the semantics between them has been described elsewhere. One of the recurrent problems, however, stems from the best practices for, and the syntax of, identifiers.

OBO makes use of a numerical, semantics-free identifier and a namespace, with a syntax of NAMESPACE:IDENTIFIER. So, a Gene Ontology term looks like GO:0003674. The namespace is not constrained to two letters, and world-uniqueness is maintained socially: if namespaces clash, people talk to each other and sort it out. The use of a semantics-free identifier means that term names can be changed while maintaining the implied meaning of the term; the label for the term, meanwhile, provides a human-readable version, which can be shown to users of the ontology. I will call these the OBO identifier and OBO label respectively.

Translating this into OWL, including Manchester syntax, causes significant problems, however. The naturalistic translation is to turn the OBO identifier into the identifier in OWL; the OBO namespace would become an XML namespace and the numerical part would become the XML identifier. Unfortunately, this doesn’t work. First, the OBO namespace is genuinely just a short string, whereas XML requires a URI; so a mapping between OBO namespaces and URIs is necessary. Second, the OBO identifier is numerical; unfortunately, while the identifiers in OWL can contain numbers, they have to start with a non-numerical character. The standard translation, therefore, uses in most cases an OBO-wide URL (http://purl.obolibrary.org/obo/), although some ontologies have their own namespace (GO uses http://purl.org/obo/owl/GO#). The OBO identifier is mapped to a valid identifier by sticking a prefix onto the numbers. So, we have identifiers such as GO:GO_0042101 or obo:OBI_1110045. There are also some OBO ontologies for which this does NOT occur; for instance, BFO classes in OBI come out with identifiers of the form snap:Continuant or span:Process, except for one, which is bfo:Entity.
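The general shape of that mapping can be sketched in a few lines of Python. This is a minimal sketch of the NAMESPACE_NUMBER convention described above, with the OBO-wide URL as the default base; it does not attempt the exceptions (such as the snap:/span: BFO names), and the function names are hypothetical.

```python
OBO_PURL = "http://purl.obolibrary.org/obo/"


def obo_id_to_uri(obo_id: str, base: str = OBO_PURL) -> str:
    """Map an OBO identifier such as "OBI:1110045" to an OWL class URI.

    The namespace is kept as a prefix on the local name, so the local name
    starts with a letter rather than a digit: OBI:1110045 -> .../OBI_1110045.
    """
    namespace, sep, local_id = obo_id.partition(":")
    if not (sep and namespace and local_id):
        raise ValueError(f"not an OBO identifier: {obo_id!r}")
    return f"{base}{namespace}_{local_id}"


def uri_to_obo_id(uri: str, base: str = OBO_PURL) -> str:
    """Reverse obo_id_to_uri for URIs that follow the same convention."""
    if not uri.startswith(base):
        raise ValueError(f"not under {base}: {uri!r}")
    namespace, sep, local_id = uri[len(base):].partition("_")
    if not (sep and namespace and local_id):
        raise ValueError(f"no NAMESPACE_ prefix in {uri!r}")
    return f"{namespace}:{local_id}"


print(obo_id_to_uri("OBI:1110045"))
# GO, in the translation described above, has its own namespace:
print(obo_id_to_uri("GO:0042101", base="http://purl.org/obo/owl/GO#"))
```

The round trip works only because the prefix convention is applied consistently; every exception needs its own rule.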

Again, all perfectly reasonable, but unfortunately, when converted to Manchester syntax it means that we end up with classes that look like this slightly elided class from OBI:


Class: obo:OBI_1110161

    Annotations:
        rdfs:label "T cell epitope ELISA IL-1b assay"@en,

    SubClassOf:
        obo:OBI_0000661,
        obo:OBI_0000299 some (obo:IAO_0000109
        and (obo:IAO_0000136 some obo:OBI_1110196))

which completely defeats the aim of a human-readable syntax. Now, OBO format has much the same problem; relationships to other classes are specified using cross-references to their identifiers, which are, essentially, unreadable. OBO format works around this with a denormalisation, as can be seen from this somewhat elided example from IAO:


[Term]
id: IAO:0000027
name: data item
def:"a data item is an information content entity that is intended...."
is_a: IAO:0000030 ! information content entity

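The cross-reference in this case is a subsumption link to IAO:0000030; the text after the “!” is a comment that repeats the label of IAO:0000030, so that a human reader does not have to look it up.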
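How a reader of OBO format might separate the identifier from its trailing label comment can be sketched as below. This is a simplified Python sketch, not a conforming OBO parser: real values can contain quoted strings, escapes and qualifier blocks, and `parse_tag_value` is a hypothetical name.

```python
from typing import Optional


def parse_tag_value(line: str) -> tuple[str, str, Optional[str]]:
    """Split one OBO tag-value line into (tag, value, trailing "!" comment).

    A simplification: it splits on the first ":" and the first " ! " only.
    """
    tag, _, rest = line.partition(":")
    value, bang, comment = rest.partition(" ! ")
    return tag.strip(), value.strip(), comment.strip() if bang else None


stanza = """id: IAO:0000027
name: data item
is_a: IAO:0000030 ! information content entity"""

for line in stanza.splitlines():
    print(parse_tag_value(line))
```

The value that matters to the machine is the identifier; the comment is there purely for the human, which is what makes it a denormalisation.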

One solution would be to use the rdfs:label in place of the identifier. So, we would have something that looked like this:


Class: "T cell epitope ELISA IL-1b assay" @en

    Annotations:
        obo:identifier "1110161"

    SubClassOf:
        obo:OBI_0000661,
        obo:OBI_0000299 some (obo:IAO_0000109
        and (obo:IAO_0000136 some obo:OBI_1110196))

Other identifiers (those in the SubClassOf block) would also have to be changed. I’ve also added the obo:identifier line (which I think would be valid, but might require the creation of an OWL individual). Without this, it would not be possible to convert back to the original identifiers.

However, this is problematic as it breaks the correspondence between the OWL Manchester syntax and the other syntaxes of OWL. The class identifier has to be a legal URI, and the OBO label here is not. We could do a syntactic conversion (e.g. T%20cell%20epitope), but this, again, reduces readability, defeating the point. Also, the rdfs:label would become part of the final identifier URI, which then becomes a semantics-heavy identifier. Finally, it would require an OBO-specific loader for Manchester syntax, taking the URI identifier from the annotation block and the rdfs:label from the class name.
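What that syntactic conversion looks like in practice can be checked with Python's standard `urllib.parse.quote`, used here purely as an illustration of percent-encoding the label:

```python
from urllib.parse import quote, unquote

label = "T cell epitope ELISA IL-1b assay"
encoded = quote(label)  # percent-encode all but letters, digits, "_.-~" and "/"
print(encoded)          # T%20cell%20epitope%20ELISA%20IL-1b%20assay
assert unquote(encoded) == label  # reversible, but no longer pleasant to read
```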

So, is there any solution? First, there are tooling solutions. In Protege, it is already possible to use any component of the definition in the display; so, you can set the rdfs:label as the main display form. Tooling solutions are attractive, but there is a problem: you have to extend all tools to support this view. I realise that the number of freaks who wish to edit OWL with emacs is not that large, so this might not seem an issue. However, many people wish to develop ontologies collaboratively using version control; if you want to compare versions, you use diff, so we now need a Manchester syntax diff viewer. Also, if you want to do some perl hacking, or straightforward search and replace, again, it’s all harder.
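As an illustration of what such a tooling layer involves, here is a minimal Python sketch of a display filter that shows labels in place of identifiers before handing two versions to a diff. The label table is hypothetical (in practice it would come from the ontology's own rdfs:label annotations), only the one label given earlier is filled in, and the output is for reading only, not for parsing back.

```python
import difflib
import re

# Hypothetical label table; in practice it would be read from the
# rdfs:label annotations of the ontology being edited.
LABELS = {"obo:OBI_1110161": "T cell epitope ELISA IL-1b assay"}

# Prefixed names of the obo:NAMESPACE_NUMBER form used in the examples above.
OBO_NAME = re.compile(r"\bobo:[A-Za-z]+_\d+\b")


def label_view(text: str, labels: dict[str, str]) -> str:
    """Show known identifiers as quoted labels; leave unknown ones alone.

    The result is for reading and diffing only; it is not meant to be
    parsed back as Manchester syntax.
    """

    def substitute(match):
        name = match.group(0)
        return f"'{labels[name]}'" if name in labels else name

    return OBO_NAME.sub(substitute, text)


old = "Class: obo:OBI_1110161\n    SubClassOf: obo:OBI_0000661\n"
new = "Class: obo:OBI_1110161\n    SubClassOf: obo:OBI_0000299\n"  # a hypothetical edit

for line in difflib.unified_diff(
    label_view(old, LABELS).splitlines(),
    label_view(new, LABELS).splitlines(),
    fromfile="old",
    tofile="new",
    lineterm="",
):
    print(line)
```

The catch is exactly the one described: every tool in the chain (diff, grep, search and replace) needs this extra step in front of it.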

To some extent this might seem trivial, but the entire purpose of Manchester syntax (and the functional syntax) is to be easy to read and manipulate, which the XML version of OWL is not. This purpose is defeated if it’s hard to read.

So, to a second, non-tooling solution. The obvious answer is to take the OBO approach and add comments. Now, the Manchester syntax includes a comment character (#), although, the last time I tried, the Protege parser did not implement it. Nonetheless, it allows this:


Class: obo:OBI_1110161 #"T cell epitope ELISA IL-1b assay"@en

    Annotations:
        rdfs:label "T cell epitope ELISA IL-1b assay"@en

    SubClassOf:
        obo:OBI_0000661,
        obo:OBI_0000299 some (obo:IAO_0000109
        and (obo:IAO_0000136 some obo:OBI_1110196))

This is not too bad, but it doesn’t work well for complex class expressions. I can’t be bothered to look up the labels, so I have reused one, but you get something like:


Class: obo:OBI_1110161 #"T cell epitope ELISA IL-1b assay"@en

    Annotations:
        rdfs:label "T cell epitope ELISA IL-1b assay"@en

    SubClassOf:
        obo:OBI_0000661, #"T cell epitope ELISA IL-1b assay"@en
        obo:OBI_0000299 #"T cell epitope ELISA IL-1b assay"@en
        some (obo:IAO_0000109 #"T cell epitope ELISA IL-1b assay"@en
        and (obo:IAO_0000136 #"T cell epitope ELISA IL-1b assay"@en
             some obo:OBI_11101 #"T cell epitope ELISA IL-1b assay"@en
             ))

This has three problems. Firstly, we are using comments “meaningfully”, yet we cannot distinguish these comments from other, normal comments. Secondly, we have had to reformat the output, because we only have a “to-end-of-line” comment character. Thirdly, it looks horrible.

So, my minimal solution would be this: we introduce some new comment characters, which are treated as comments normally, but which carry enough semantics to allow a warning when they are wrong; rather like Javadoc, which is a comment wrt the language, but is structured and meaningful wrt the documentation. Tooling could then check that these labels masquerading as comments are correct wrt the identifiers.


Class: obo:OBI_1110161 [T cell epitope ELISA IL-1b assay]

    Annotations:
        rdfs:label "T cell epitope ELISA IL-1b assay"@en

    SubClassOf:
        obo:OBI_0000661 [blah],
        obo:OBI_0000299 [longer blah]
        some (obo:IAO_0000109 [more]
        and (obo:IAO_0000136 [stuff]
        some obo:OBI_11101 [OBI Thing]
        ))

This is still not ideal; it would require an extension to Manchester syntax, but it’s minimal, and it does support the semantics-free identifiers in OBO in a way which does not require extensive tooling. It’s worth reiterating here that OBO’s semantics-free identifiers are a good thing; so, supporting them supports other people who may wish to do the same, sensible thing. It does have the disadvantage of duplicating information, but at least in a way that is checkable.
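The checking step is simple to prototype. Below is a minimal Python sketch, assuming the bracketed `obo:ID [label]` form from the example above and a hypothetical `LABELS` table standing in for the ontology's rdfs:label annotations; `check_labels` is likewise a made-up name. It uses a regular expression rather than a real Manchester syntax parser, so treat it as an illustration rather than a tool.

```python
import re

# Hypothetical identifier -> rdfs:label table; a real checker would
# take these from the rdfs:label annotations in the ontology.
LABELS = {
    "obo:OBI_1110161": "T cell epitope ELISA IL-1b assay",
    "obo:IAO_0000030": "information content entity",
}

# An OBO identifier followed by a bracketed label, e.g. obo:OBI_0000661 [blah]
TAGGED = re.compile(r"(obo:[A-Za-z]+_\d+)\s*\[([^\]]*)\]")


def check_labels(text: str, labels: dict[str, str]) -> list[str]:
    """Return one warning per bracketed label that does not match rdfs:label."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TAGGED.finditer(line):
            ident, claimed = match.group(1), match.group(2).strip()
            actual = labels.get(ident)
            if actual is None:
                warnings.append(f"line {lineno}: no rdfs:label known for {ident}")
            elif claimed != actual:
                warnings.append(
                    f"line {lineno}: {ident} is labelled {actual!r}, not {claimed!r}")
    return warnings


# A hypothetical frame with one correct and one stale label.
frame = """Class: obo:OBI_1110161 [T cell epitope ELISA IL-1b assay]
    SubClassOf:
        obo:IAO_0000030 [information entity]"""

for warning in check_labels(frame, LABELS):
    print(warning)
```

Because the bracketed labels are only ever checked, never trusted, a stale one produces a warning rather than a different ontology; the identifier stays the single source of truth.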

Comments welcome!

This is the third year in a row that I have been to Neuroinformatics (or its forerunner, Databasing the Brain). It’s still turning out to be an enjoyable meeting, even though there is still lots of it that I don’t understand. Come to think of it, perhaps because there is lots of it that I don’t understand.

Pilsen (or Plzen) is, perhaps, a strange place for the meeting. It’s a bit of a pig to get to, as the airport is in Prague. Likewise, the conference centre was a bit out of town, so you had to get a taxi if you wanted food in the evening. Still, the venue itself worked well. The wireless was slightly flaky, but there were tables upstairs on a balcony; a lot of people migrated up there as the meeting went on, leaving the auditorium a little deserted.

Although I’ve said I didn’t understand lots of it, many of the keynotes this year were on bioinformatics, systems biology or data integration, which I know well. As well as that, there was a (semantic) web and ontology section. I enjoyed Tim Clark’s talk, as he’s made stuff that lots of people are actually using, although I don’t think he explained why during his talk.

The section on high-performance computing was probably the least relevant. While they’ve become interested in power consumption recently, these guys are still obsessed with teraflops (…now petaflops…now exaflops). To be honest, I don’t care. With more power, you can build more granular, higher-resolution models, but I doubt that will bring you anything unless you also have more granular data. They should be worried about discs (always the Cinderella of the hardware world, only slightly more interesting than printers), but it’s discs which carry the data. While we are at it, spinning discs use lots of power. And they have more flashing lights than CPUs. The hardware guys should be talking about disc space. The neuroscientists should be worrying about filling discs up. Neuroinformaticians should make sure that they end up with an exabyte dataset; not 1000 petabyte datasets or, worse, 1,000,000 gigabyte datasets.

I tried to get a bit of Web 2.0 stuff happening at the meeting. David Sutherland set up a friendfeed room. First day, we were sitting next to each other like two sad blokes at a party full of women, sending each other messages on our iPhones. Although it was a neuroinformatics meeting, so largely without the women. Second day, mostly it was just me: sad, lonely and pathetic. Still, having said that, I did manage to meet almost all of those subscribed to the room, which you couldn’t achieve at ISMB nowadays. Pavan Ramkumar said hello at lunch, and then later at the airport. I met Sarah Maynard at her poster; it had ontologies, OWL and information content-based similarity measures, so it was bound to make me happy. Only Lisa Kjonigsen remained in cyberspace. With luck, next year, more people will join; not least because I’ll probably not go to Japan.

I also had a quick go at live blogging; to be honest, I am not a natural. The problem is that I have too much desire to editorialise. The roboblogger tells me that she just blogs the notes that she would have taken anyway; my notes, on the other hand, are full of comment, invective and questions. Perhaps I could just put these into the asciidoc source of my blog as comments. I stopped live blogging on the last day, not for these reasons, but largely out of a desire not to hold my crushing ignorance of the topics being discussed up to public scrutiny.

Neuroinformatics (the meeting) is changing. I have to believe that if there is more about genomic and multiomic data integration, this has to be a good thing. The brain is a hard thing to figure out; I have to believe that using more data, more types of data and heavier use of nice, simple model organisms is going to increase the rate of advance. With all the fuss about systems biology, it’s easy to forget the fabulous success of the last 100 years of reductionist biology, which made systems biology possible. This has to be the way forward for neuroscience, even if it does make the meeting more usual and, perhaps, less interesting for me as a result.