Archive for the ‘Ontology’ Category

I’m winding my way back from a busy month with both Bio-Ontologies and ICBO, but in general I think the experience has been really positive, even if interspersing holiday and work travel has rather exhausted me. Both were in Europe, and Bio-Ontologies was right next door, so I did not want to waste the opportunity.

I have a long history with Bio-Ontologies, having been a chair for many years and an informal helper before that. We steered it from an informal meeting to having a proper programme committee, proceedings and much of the structure that it has now. I bumped into Steven Leard at the meeting, and was rather shocked to realise that the first meeting I helped out at was 14 years ago.

Strangely, though, since my last time as a chair, five or six years ago, I have not been back once. For a few years, of course, this was quite deliberate; I was so fed up with travelling at that time of year that I really enjoyed the rest. Since then, though, it has been happenstance rather than a deliberate decision. So, it felt like a bit of a homecoming, even if I have seen many of the same people at different conferences on other occasions. Mark Musen gave an interesting keynote: I was, at the time, rather unconvinced by his hypothesis that we don’t spend enough time arguing (I mean, ontologists, really?). A more nuanced reading of what he said, though, is that we should assess and re-assess our practices against the evidence of our experience. I cannot help but agree with this, and it has made me think again. More on that later, perhaps.

It was nice to go to Dublin too, as it was my first time there. A nice city, deeply integrated with its river. We had some nice food in some good restaurants and cafes, and a blissful absence of Irish theme pubs. The conference venue was good also, even if it does look like a vacuum cleaner from the outside.

ICBO was a different kettle of fish, though. At four days (many of the delegates go for the whole thing), it’s long, and I felt rather stretched by the end (I’m on the plane home now, after a very early start, which might be colouring my vision). This does give plenty of time for slightly longer and more detailed presentations; the workshops were small, intense and full of discussion. Likewise the poster and demo sessions. I rather blitzed the conference with Tawny-OWL (http://www.russet.org.uk/blog/3030) and Lentic (http://www.russet.org.uk/blog/3035). In total, I gave 1 tutorial, 1 paper, 1 demo, 1 flash update on the demo and 1 feedback session on the tutorial. People seemed genuinely sympathetic and a little sad when my cute Tawny-OWL logo went 404 during the flash update. For those who missed it, the logo is online, as is the logo for Lentic, which lacks in cuteness but is rather more dramatic.

I got some good feedback, and was surprised to win the best demo prize (I mean, it was entirely text running in Emacs, and very laggy, running on my 5-year-old netbook). Second place was James Overton’s Robot. I am told that between the two of us we got a very large percentage of the vote. I think this is an interesting result, because it strongly suggests to me that, for ICBO attendees, there is dissatisfaction with current tooling. Ontologies are increasingly being developed programmatically, and I cannot help but feel that this is the future.

I thought I had never been to Lisbon before, but on getting there I realised I had, about 20 years ago; the story is long, and not that interesting, so I will skip it here. This time I had a better look, and I will not forget again. Lisbon is a very nice city indeed; while its architectural elegance may not be quite up there with Rome (or even Milan), it is certainly not far behind, and as a city built into and with its geography it is stunning.

In summary, an interesting month from an ontology perspective, and one that I enjoyed very much. While I might have wished for something a little less hectic (especially as I interspersed my holidays (http://www.russet.org.uk/blog/3091)), it has left me with the sense that ontologies are a productive part of the bioinformatics environment, and that there is more to come.



Abstract

Bio-medical ontologies can contain a large number of concepts. Often many of these concepts are very similar to each other, and similar or identical to concepts found in other bio-medical databases. This presents both a challenge and an opportunity: maintaining many similar concepts is tedious and fastidious work, which could be substantially reduced if the data could be derived from pre-existing knowledge sources. In this paper, we describe how we have achieved this for an ontology of the mitochondria using our novel ontology development environment, the Tawny-OWL library.

  • Jennifer D. Warrender
  • Phillip Lord

Plain English Summary

Ontologies allow complex descriptions of the world in a way that is both precise and computationally amenable — that is, computers can be used to check and query these descriptions. The mitochondrion is a critical part of the cells of most organisms, being responsible for energy production. We wished to build an ontology describing the current research on the mitochondrion.

The more traditional approach would have been to build the ontology from scratch; but many parts of the mitochondrion, including its genes and proteins, have already been described in other databases. Duplicating the data in these databases by hand would be time-consuming, but also sensitive to change — if a database changes, our ontology would need updating too.

Instead, we have used our new ontology development methodology to automatically extract this knowledge and build the ontology for us, providing what we describe as a scaffold for the ontology. In future, we will add more knowledge to this ontology, slowly building up the rich description of the mitochondrion that we are aiming for.


Abstract

Ontology development relates to software development in that both involve the production of formal computational knowledge. It is possible, therefore, that some of the techniques used in software engineering could also be used for ontologies; for example, in software engineering, testing is a well-established process and part of many different methodologies. The application of testing to ontologies, therefore, seems attractive. The Karyotype Ontology is developed using the novel Tawny-OWL library. This provides a fully programmatic environment for ontology development, which includes a complete test harness. In this paper, we describe how we have used this harness to build an extensive series of tests, and how we have used a commodity continuous integration system to link testing deeply into our development process; this environment is applicable to any OWL ontology, whether written using Tawny-OWL or not. Moreover, we present a novel analysis of our tests, introducing a new classification of what our different tests are. For each class of test, we describe why we use these tests, also by comparison to software tests. We believe that this systematic comparison between ontology and software development will help us move to a more agile form of ontology development.

  • Jennifer D. Warrender
  • Phillip Lord

Plain English Summary

Ontologies are a mechanism for representing parts of the world computationally. They allow you to describe the world in a complex way, and then query over it repeatably and consistently. However, ontologies are complex and are themselves hard to build consistently and repeatably. If an ontology is built incorrectly, then queries will give the wrong answers too.

Software is also complex, and over the years software engineers have developed many techniques for building software so that it, too, is correct. While these do not always succeed, they have allowed us to produce software that is vastly more complex than in years past. One important technique is automated testing, where software is run automatically and often to ensure that it is behaving correctly; to do this, we use one piece of software to test another.

We have borrowed the same technology for use with ontologies; while this has been done before, our use of commodity testing software has allowed us to scale up the tests significantly, and we describe this approach in this paper. However, while they have many similarities, ontologies are not software, and the sorts of tests that we need for ontologies may be different from those that we need for software. In this paper, we also describe the kinds of tests that we have used for the karyotype ontology (1305.3758), and which are probably relevant to other ontology development efforts too.
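For a flavour of what such a test looks like, here is a minimal sketch using Clojure’s standard test library and Tawny-OWL’s reasoner support; the karyotype.core namespace and ontology name are illustrative, and reasoner selection is elided:

(ns karyotype.core-test
  (:require [clojure.test :refer [deftest is]]
            [karyotype.core :as k]
            [tawny.reasoner :as r]))

;; one piece of software (the test) checking another (the ontology);
;; a coherent ontology has no unsatisfiable classes
(deftest karyotype-is-coherent
  (is (r/coherent? k/karyotype)))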

Overall, this should increase our understanding of how to build ontology tests and ontologies.


I was entertained to see the recent publication of a new paper on the definition of function (10.1186/2041-1480-5-27). I met one of the authors at a meeting a few years back in Durham, and had a very nice discussion about my own contribution to this definition, which I published previously (1309.5984).

I do not want to discuss the paper in full; it is a nice paper and worth a read. I do, however, want to comment more specifically on the parts that explicitly and implicitly address my own paper.

At the start of the paper, the authors discuss the criteria for their definition, which include this:

Avoidance of epiphenomenalism: Functions should be determined by current performance of its bearer, not mainly by causally inert historical facts like its (evolutionary or cultural) history or a mere ascription by its producers, users, or observers

I found this a fairly strange criterion; it is not clear to me why historical facts are inert; especially in biology, the evolutionary history of an organism is surely one of its most important features. Originally, this criterion comes from another paper by Artiga, who says:

We want to find out what is the lung’s function, we would probably look at what lungs actually do in our body. We would see that they enable respiration, so we would conclude that this is their function. Why they came to be here seems completely irrelevant for function attribution.

Obviously, this means “most people’s” bodies rather than just one, given that lungs do (somewhat) different things in different people. But I do not think that why they came to be here is irrelevant, at least not if we wish to distinguish a function from a role. My fingers are currently engaged in typing, but few people would describe this as a function (although most would say that precise and controlled manipulation of the world is). Or, to take a more extreme position, after Robert Hoehndorf: the heart actually does produce loud thumping noises. Surely not a function?

I am also slightly disappointed that what I think is one of the key points of my own function paper has been missed from their list of criteria. In it, I say:

I consider whether these definitions are applicable; for a given set of entities how do we decide whether we have a function (of either subclass) or a role.

Given a definition, I should be able to produce at least one practical test that I can use to determine whether that definition holds; I think that this notion of applicability needs to be more widely considered.

Now, my actual definition of biological function was:

A biological function is a realizable entity that inheres in a continuant which is realized in an activity, and where the homologous structure(s) of individuals of closely related and the same species bear this same biological function.

The language was chosen to mirror BFO, since it was in this context that the paper was addressed; I think it could be simplified and made more readable, but I was constrained by the language of BFO. Now, the first criticism of my definition is on technical grounds, namely:

Lord claims that his definition is recursive rather than circular, despite the occurrence of the word “function” in the definiens.

My use of this form of definition was, of course, deliberate and partly provocative; perhaps it is something that I should not have done, since it has muddied the waters somewhat, as this comment shows. In fact, it is very easy to work around this criticism by simply removing the recursion:

A biological function is a …. same species bear this same realizable entity.

The technical criticism has now gone. But I do not like this definition as much, because “the same realizable entity” would in fact be a biological function. I think we avoid recursive definitions because they can be circular, but this is like avoiding recursive function calls because they may not terminate. And that is a shame because, as with recursive function calls, I think this form of definition can be quite succinct. Consider:

A spouse is a person who is married to their spouse.

or:

A brother is a man with the same parents as their brother.

If we unwind the recursion, then we get:

A brother is a man with the same parents as another man.

Again, we are hiding the reality that both men in this definition are brothers.

Of course, some recursive definitions might actually be circular, and that is less good. But if the applicability of the definition is also considered, then this issue goes away. I can determine whether someone is a spouse or a brother given these definitions, so I see no problem.
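The analogy with recursive function calls can be made concrete: a recursive function is perfectly well behaved so long as it terminates, as in this standard example.

;; recursion is unproblematic when it terminates:
;; each call moves towards the base case (zero? n)
(defn factorial [n]
  (if (zero? n)
    1
    (* n (factorial (dec n)))))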

A second criticism is aimed at one of my claims, which the authors summarise as follows:

Hence he concludes that among the instances of realizables that are realizables for the same type of process can be both roles and functions depending on the species the realizable’s bearer belongs to. This presents a problem for the distinction between functions and roles.

I do not think that this is a problem at all, because I say quite clearly that we can distinguish between roles and functions, but that we do this for the individual role or function, not at the class level:

My definition distinguishes between the two based on the nature of the relationship to the independent continuant in which they inhere. I suggest that it is very hard to make the distinction at the class level[…]. For an individual continuant bearing a realizable entity, this distinction appears to be much more straightforward.

In other words, “for walking on” is either a role or a function. But in human hands it is a role, while for chimps it is a function. I see no reason why the distinction at the level of the individual should be considered less relevant than at the class level, nor why this should be problematic. Actually, it reduces the need for duplication between the role and function hierarchies; while tools like Tawny-OWL (1303.0213) may ease the maintenance of duplication, avoiding it altogether still seems sensible.

The final criticism is, I think, the least worrisome. The authors say:

Had evolution stopped after the first species, according to Lord’s definition, there would not have been any biological function at all.

The slightly flippant, but nonetheless entirely valid, response to this is: “but it didn’t”. We could equally argue against a definition of human as having two hands on the basis that they might have evolved a third.

More importantly, though, in most definitions of life the ability to adapt or evolve is part of the definition. Without this, we have a chemical process. So, without evolution, we have no life. Given this, we can rewrite the last statement as:

Had life stopped after the first species, there would not have been any biological function at all.

Which is an entirely true statement; that it drops so nicely out of my definition of biological function is a strength of my definition and not a weakness.

I feel that my definition is still a good one. Rereading my function paper now, the argument still seems coherent, and the examples clear. Although I put an entire section on applicability into the paper, I do rather regret that I did not explicitly introduce it as a general criterion for all ontology definitions; that this criterion has been missed is surely my fault and not the reader’s. Perhaps I should have spent more time on that than on my recursive definition, which was not critical to the paper.

At the same time, the fact that discussions on definitions are still going on, for a term that biologists have been using for many years, leads me back to the conclusion that the definitions of such generic terms are not nearly as important as some make out. So long as they are useful, biologists will carry on describing things as functions if it fits their ad hoc, informal definitions, developed over time within a community. I cannot help but think that this is a good thing.


Before commit eb2f0e04, I used to have this function in tawny.owl:

(defbdontfn
  add-subclass
  {:doc "Adds one or more subclass to name in ontology."
   :arglists '([name & subclass] [ontology name & subclass])}
  [o name subclass]
  (add-axiom o
             (.getOWLSubClassOfAxiom
              (owl-data-factory)
              (ensure-class o name)
              (ensure-class o subclass))))

The idea is, as the name suggests, to add a subclass relationship to the ontology; on the face of it, everything looks fine. However, a closer look at the OWL API raises a question:

getOWLSubClassOfAxiom(OWLClassExpression subClass, OWLClassExpression superClass)

The subclass parameter in Clojure maps to the superClass parameter in Java; the subclass in Clojure is actually the superclass.
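To make the swap concrete, here is what a call to the old function actually produced; A and B are hypothetical classes.

;; with the old definition:
;;   (add-subclass o A B)
;; produces the axiom SubClassOf(A B), so A is the subclass,
;; and B, the parameter named "subclass", is actually the superclass.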

If we compare the property equivalent in Tawny, things seem more regular:

(defbdontfn add-superproperty
  "Adds all items in superpropertylist to property as
a superproperty."
  [o property superproperty]
  (add-axiom o
             (.getOWLSubObjectPropertyOfAxiom
              (owl-data-factory)
              (ensure-object-property o property)
              (ensure-object-property o superproperty))))

and the equivalent Java:

getOWLSubObjectPropertyOfAxiom(OWLObjectPropertyExpression subProperty,
                               OWLObjectPropertyExpression superProperty)

The names of the parameters are now the same way around in Clojure and Java. So, have I made a mistake in Tawny with subclass handling? Actually, no, because we get strangeness at a different point with properties; consider the object-property-handlers which map between frames and the functions which implement them:

(def ^{:private true} object-property-handlers
  {
   :domain add-domain
   :range add-range
   :inverse add-inverse
   :subproperty add-superproperty
   :characteristic add-characteristics
   :subpropertychain add-subpropertychain
   :disjoint add-disjoint-property
   :equivalent add-equivalent-property
   :annotation add-annotation
   :label add-label
   :comment add-comment})

So, the :subproperty frame is implemented with the add-superproperty function. As might be expected, :subclass is implemented with add-subclass.

Even without this oddness, the problem can be seen when considering just the add-* functions. Consider add-label:

(defbmontfn add-label
  "Add labels to the named entities."
  [o named-entity label]
  (add-annotation
   o
   named-entity
   [(tawny.owl/label label)]))

The semantics of this are that the third argument, label, is added to the second, named-entity, as a label. It is slightly more complex than this; the b in defbmontfn means broadcast — add-label is actually variadic and flattens its arguments, meaning that any number of labels can be added.

With add-subclass, the semantics are reversed; the second argument becomes a subclass of the third (or, again, because of broadcasting, the third and subsequent arguments). And add-subclass is inconsistent here — all of the other add-* functions have the same semantics as add-label.
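To see the asymmetry side by side, consider these sketched calls, where o is an ontology and A and B are hypothetical classes:

;; add-label: the third and subsequent arguments
;; are attached to the second
(add-label o A "a label" "another label")

;; add-subclass (old semantics): the second argument becomes
;; a subclass of the third; the direction is reversed
(add-subclass o A B)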

So, clearly, both add-subclass and the :subproperty frame have problems, and are not consistent with the rest of the API. Two important parts of Tawny-OWL have been implemented backward. How did this happen?


Investigating Manchester Syntax

We can investigate this further by considering another inconsistency in Tawny. Looking at the object-property-handlers above, we can see that while :subproperty is implemented with add-superproperty, :subpropertychain is implemented with add-subpropertychain.

The slot names in Tawny come (nearly) directly from Manchester syntax; so, let us compare Manchester syntax with the functional syntax for sub-properties and sub-property chains, using the OWL Primer. In Manchester syntax:

ObjectProperty: hasFather
   SubPropertyOf: hasParent

In functional syntax:

SubObjectPropertyOf(
   :hasFather
   :hasParent
)

Compare this to the equivalent declaration for a subproperty chain:

ObjectProperty: hasGrandparent
   SubPropertyChain: hasParent o hasParent

Or in functional syntax:

SubObjectPropertyOf(
   ObjectPropertyChain( :hasParent :hasParent )
   :hasGrandparent
 )

The filler for SubPropertyChain: comes first, while for SubPropertyOf: it comes second.

This suggests that the SubPropertyOf: and SubPropertyChain: frames are back-to-front from each other (that is, the values of the slots appear in different orders in the two syntaxes). So, with the former, SubPropertyOf:, I am stating that the entity (hasFather) is related to the filler (hasParent) and that the filler (hasParent) is the super property. With the latter, SubPropertyChain:, I am stating that the entity (hasGrandparent) is related to the filler (hasParent o hasParent) and that the filler (hasParent o hasParent) is the sub property.

So, the two appear to be inconsistent with each other. Let us then analyse the other slots as well. Consider, for example:

A
  Annotations: rdfs:label B

which means B is an annotation of A.

A
 EquivalentTo: B

means B is equivalent to A (or, in this case, that A is equivalent to B, as equivalence is symmetrical).

A
  Domain: B

means B is a domain of A.

A
  Type: B

means B is a type of A.

All of these are consistent with each other: the filler (B) has a relationship to the entity (A) which is defined by the slot (Type:, for instance), with the caveat that the EquivalentTo relationship is symmetric.

Now

A
  SubClassOf: B
A
  SubPropertyOf: B

are backward: the entity (A) has a relationship to the filler (B) defined by the slot (SubClassOf:, SubPropertyOf:); it is why the Of preposition has been added. It is not possible to add the same preposition to the other slots, although it is possible to add has to the beginning. So, for example, the natural language semantics of these statements preserves their OMN meaning:

A HasAnnotation: B
A HasType: B
A HasKey: B

Of these, only the last is actually OMN. The only other slots with prepositions are EquivalentTo and SameAs; you could change these to has as well:

A HasEquivalent: B
A HasSame: B

This probably reduces the readability overall, but it does at least maintain the semantics. It is for this reason that I say SubClassOf: is backward; to be consistent, it should be Super:

So

A Super: B

means B is a superclass of A. Now, we could add the has preposition to the start, while preserving the natural language semantics.

A HasSuper: B

Everything that I have said here is also true of SubPropertyOf:, which behaves in the same way as SubClassOf: (i.e. backward with respect to most slots).

Going back to the very early question, SubPropertyChain: (note, not SubPropertyChainOf:) is the same way around as most slots and the opposite way around from SubPropertyOf:

A SubPropertyChain: B o B

could be replaced with

A HasSubPropertyChain: B o B

In summary, for Manchester syntax SubClassOf: and SubPropertyOf: frames are backward with respect to all the other frames.


The Implications for Tawny

Unfortunately, the situation in Tawny-OWL was slightly worse than for Manchester syntax. While writing an early version of the karyotype ontology (1305.3758) by hand, I found the typing too laborious, so removed the prepositions (:subclass and not :subclassof). Combined with the lack of CamelCase, this seemed a cleaner syntax, but it has exacerbated the issues described here.

Although I became aware of this problem before the release of the first full version of Tawny, I decided that consistency with Manchester syntax was worth the hassle. My recent experiments with literate ontologies (http://www.russet.org.uk/blog/2979), however, have made me realise that I could not leave the situation as it was. One key feature of Tawny is that it (normally) forces declaration of entities before use, which avoids the simple spelling mistakes common when writing Manchester syntax by hand. However, only having access to a :subclass slot means that ontologies must be declared from the top of the inheritance hierarchy downward. For a literate ontology, this restriction seems unnecessary and places an unfortunate emphasis on the upper ontology. I would also like to be able to build from the bottom up.

Neither the backward semantics of add-subclass, nor the :subproperty/add-superproperty solution, works well as it stands, and extending this to a :superclass slot would make the situation worse. In short, the only sensible fix was to diverge from OWL Manchester syntax and deprecate the use of :subclass and :subproperty. At the same time, I decided to remove some extra typing. Therefore, :subclass has become :super (shortening and reversing the natural language semantics, while retaining the logical semantics), and a new slot :sub has been added. Likewise, :subproperty has become :super, and a new slot :sub has been introduced for properties also. As well as avoiding extra typing, removing the suffix has meant that I can leave :subclass and :subproperty in place but deprecated; the alternative of just reversing their semantics seemed unfortunate. Only the semantics of add-subclass have been broken, being reversed.

The inconsistency with Manchester syntax is currently a little painful, especially as the :subclass slot has been around since the early days of Tawny (http://www.russet.org.uk/blog/2214). The advantage, however, is that I have a simple rule to remember: A :s B means “A has :s B” or, equivalently, “B is :s of A”. For this reason, and because it paves the way for richer literate ontologies, I feel that this is a good change.
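As a minimal sketch of the new slots (the class names are illustrative):

(defclass B)

;; A has :super B, so B is a superclass of A
(defclass A
  :super B)

;; C has :sub A, so A is a subclass of C;
;; this allows a hierarchy to be built from the bottom up
(defclass C
  :sub A)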


In this post, I will describe what I call connection points and explain how they can be used to enable modularity and overcome problems with scalability of reasoning in OWL.

One of the recurrent problems with building ontologies is mission creep; what starts simple rapidly expands until many different areas of the world are described.

I faced this problem recently, when I was asked about the axiomatisation that I described in my paper about function (1309.5984). Well, the axiomatisation exists, but it was never very complete; so, I thought I should redo it, probably with Tawny-OWL (http://www.russet.org.uk/blog/2366).

To start off with a simple declaration of function, we might choose something like this:

(defclass Function
  :subclass (only realisedIn nProcess))

Or, in rough English: a function is something that displays itself only when involved in a process (the n in nProcess is there to avoid a name clash). Now, immediately, we hit the mission-creep problem. Traditionally, functions have been considered to be some strain of continuant, and so it might be expected that we would only need to describe classes that are continuants to define a function. And yet, straight away, we have a process. To make this definition meaningful, we need to distinguish between processes and everything else, and pretty quickly our ontology of function requires most of an upper ontology.

This has important consequences. First, if the upper ontology in use is of any size at all, or alternatively has a complex axiomatisation, then immediately a lot of axioms have to be reasoned over, and this can take considerable time.

Second, and probably more importantly, the choice of an upper ontology can be quite divisive. We have argued that a single representation for knowledge is neither plausible nor desirable (http://www.russet.org.uk/blog/1713) — this limits the ability to abstract, meaning that all of the complexity has to be dealt with all of the time; in essence, an extreme example of mission creep. If, for example, BFO is used, then the representation of entities whose existence we are unsure about becomes difficult. Conversely, if SIO is used, uncertain objects come regardless.

In the rest of this post, I will describe how we can use the OWL import mechanism to define what I term connection points, and thereby work around this problem.


Identifiers and Imports

One of the interesting things about OWL is that, as a web based system, it uses global identifiers in the form of IRIs (or URIs, or URLs, as you wish); I can make statements about your concepts, you can make statements about mine. However, not all OWL ontologies share the same axiom space; this is controlled explicitly, through the OWL import mechanism. In short, while you can make statements about my ontology, I do not have to listen. The practical upshot of this is that it is possible to share identifiers between two ontologies without sharing any axioms, or to share axioms in one direction only.

One nice use of this is with a little upper ontology that I built, mostly to try out Tawny, called tawny.upper. This comes in two forms: one in the EL profile, and one in DL; the latter has more semantics but is slower to reason over. The DL version imports the EL version but, unusually, introduces no new identifiers at all; it just refines the terms of the EL version with the desired additional semantics. Downstream users can switch between EL and DL semantics by simply adding or removing an OWL import statement.


Alternative forms of import

The ability to share identifiers but not axioms has been used by others, as it provides a partial solution to the problem of big imports. MIREOT (http://precedings.nature.com/documents/3576/version/1), for example, defines an alternative import mechanism. MIREOT is described as a minimal information standard (http://precedings.nature.com/documents/3574/version/1); in this it is rather simple, as the minimal information required to reference (identify) an ontology term is its identifier and that of its ontology. In practice, MIREOT is a set of tools that, at its simplest, involves sharing just the identifier and not the semantics. This can reduce the size of an ontology significantly.

An extreme use-case for this would be in our karyotype ontology (1305.3758); if we wished “human” to refer to the NCBI taxonomy, we could import 100,000s of classes to use just one, increasing the size of the ontology by several orders of magnitude in the process. One solution is to use just the identifier, and not to OWL import the NCBI taxonomy.
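In Tawny, such a reference might look something like the following sketch; derivedFrom is a hypothetical property, NCBITaxon_9606 is the NCBI identifier for Homo sapiens, and I assume that Tawny will accept a bare IRI where a class is expected:

;; refer to the NCBI taxonomy term by identifier alone,
;; without importing the taxonomy
(defclass HumanKaryotype
  :subclass
  (owl-some derivedFrom
            (iri "http://purl.obolibrary.org/obo/NCBITaxon_9606")))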

However, this causes two problems. First, following our example, we can no longer infer that, for example, a human karyotype is a mammalian karyotype; these semantics are present only in the NCBI taxonomy, and we must import its semantics if we wish to know this; similarly, we would be free to state that, for example, a human karyotype was also a fly karyotype. The second problem is that, in tools like Protege, the terms become unidentifiable, because the rdfs:label for each term has not been imported, and the NCBI taxonomy uses numeric identifiers.

The MIREOT solution is to extract a subset of the axioms in the upstream ontology, and then import these; an obvious subset would be all the labels of terms used in the downstream ontology, although MIREOT uses a slightly more complex system (http://precedings.nature.com/documents/3576/version/1). This would solve the problem of terms being unidentifiable; still, though, human would not be known to be mammalian. Another subset would be all terms from mammal downwards (with their labels). Now, human would be known to be a mammal, but not known to not be a fly. As you increase the size of the subset, you increase the inferences that you can make, but the reasoning process will get slower.

From my perspective, the second of these seems sensible; large ontologies reason slowly, and there is no way around this until reasoner technology gets better. For this reason, I will probably implement something similar in Tawny (with an improvement, suggested later). The first, however, seems less justified. We are effectively duplicating all the labels in the upstream ontology, with all that this entails, for the purpose of display; we can minimise the problems by regenerating the imported subset from the source ontology regularly, but this is another task that needs to be done.

Tawny is less affected by this from the start, since the name that a developer uses can exist only in Clojure space; moreover, when displaying documentation, Tawny can use data from any ontology, rather than just those imported into the current ontology. We do not need to duplicate the MIREOT subset, we just need to know about it.


Connection Points

While MIREOT is a sensible idea, it is nonetheless seen as a workaround, a compromise solution to a difficult problem (http://precedings.nature.com/documents/3574/version/1). In this section, however, I will discuss a simpler and more general solution that helps to address the problem of modularity.

Consider a reworked version of the definition above, with one critical change: the nProcess term now references an independent Clojure namespace. The generated OWL from this ontology will include nProcess simply as a reference.

(defclass Function
  :subclass (only realisedIn
                  connection.upper/nProcess))

This is different from the MIREOT approach, which maintains that the minimal information is the identifier for the term and the identifier for the ontology; in this case, we have only the former. This difference is important, as I will describe later.

In one sense, we have achieved something negative: we now have a term in our function ontology with no semantics or annotations. OOPS! (http://oeg-lia3.dia.fi.upm.es/oops/index-content.jsp) has this in its catalogue of ontology errors:

P8. Missing annotations: ontology terms lack annotations properties. This kind of properties improves the ontology understanding and usability from a user point of view.

— OOPS

However, this problem can be fixed by the editing environment; and, indeed, using Tawny it is. We have a meaningful name, despite a meaningless identifier, and we can see the definition of nProcess should we choose. I call these forms of reference connectors, and they have interesting properties. In this case, nProcess is a required connector: the function ontology needs it to have its full semantic meaning, but does not provide it.

So, let us consider how we might use these connection points. First, for this example, we need a small upper ontology; in this case, I use the simplest possible ontology to demonstrate my point.

(defontology upper)

(as-disjoint
 (defclass nProcess)
 (defclass NotProcess))

Now, consider the function definition given earlier, and imagine that we wish to use it in a downstream ontology to define some functions. In this case, we define a child of Function which is realisedIn something which is NotProcess. The simplest possible way of doing this is to use all three of the entities (Function, realisedIn and NotProcess) as required connection points. We import no other ontologies here, so we can infer nothing that is not already stated.

(defontology use-one)

(defclass FunctionChild
  :subclass connection.function/Function
  (owl-some connection.function/realisedIn
            connection.upper/NotProcess))

In our second use, we import our function ontology. At this point, the shared identifier space starts to show its value; we now understand the semantics of our Function term, because it uses the same identifier as the term in the function ontology.

This does now allow us to draw an additional inference: any individual of FunctionChild must be realisedIn an instance of NotProcess which, itself, we can infer to be an nProcess, because the function ontology claims this. Or, in short, NotProcess and nProcess cannot be disjoint if our ontology is to remain coherent. This ontology does remain coherent, however, because we have not imported the upper ontology.

(defontology use-two)
(owl-import connection.function/function)

;; this ontology looks much the same as use-one
(defclass FunctionChild
  :subclass connection.function/Function
  (owl-some connection.function/realisedIn
            connection.upper/NotProcess))

In the final use, we import both ontologies. The function import allows us to conclude that NotProcess and nProcess cannot be disjoint, while our upper ontology tells us that they are, and at this point our ontology becomes incoherent. The required connection point in the function ontology has now been provided by a term in our upper ontology.

(defontology use-three)
(owl-import connection.function/function)
(owl-import connection.upper/upper)

(defclass FunctionChild
  :subclass connection.function/Function
  (owl-some connection.function/realisedIn
            connection.upper/NotProcess))

The critical point is that while the function ontology references some term in its definition, the exact semantics of that term are not specified. These semantics are at the option of the downstream user of the function ontology; in use-three, we have decided to specify these semantics fully. But we could have imported a totally different upper ontology had we chosen, either using the same identifiers, or through a bridge ontology making judicious use of equivalent/sameAs declarations. In short, the semantics have become late binding.
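A bridge might look something like this sketch; other.upper is a hypothetical alternative upper ontology, and I assume an add-equivalent function following the same pattern as the add-* functions discussed earlier:

;; a bridge ontology: provide the nProcess connection point
;; by equating it with a term from a different upper ontology
(defontology bridge)
(owl-import connection.function/function)
(owl-import other.upper/other)

(add-equivalent bridge
                connection.upper/nProcess
                other.upper/Process)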

We can use this technique to improve on MIREOT. Instead of importing a derived subset ontology, we can use connection points. The karyotype ontology can reference the NCBI taxonomy and leave the end user to choose the semantics they need; if the user wants the whole taxonomy, and is prepared to deal with the reasoning speed, then they have this option. This choice can even be made contextually; for example, an OWL import could be added on a continuous integration platform (http://www.russet.org.uk/blog/2324), where reasoning time is less important, but not during development or interactive testing, as sketched below.
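Because Tawny ontologies are just Clojure code, this contextual import can be a one-liner; the ncbi namespace is hypothetical, and I assume the continuous integration platform sets a CI environment variable:

;; import the full taxonomy only where reasoning time matters less
(when (System/getenv "CI")
  (owl-import ncbi/taxonomy))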


Future Work

While the idea of connection points seems sound, it has some difficulties; one obvious problem is that the developer of an ontology must choose the modules, with their connection points, for themselves. We plan to test this using SIO; we have already been working on a tawnyified version of SIO, to enable investigation of pattern-driven ontology development. We will build on this work by attempting to modularise the ontology, with connection points between the modules.

Currently, the use of this form of connection point adds some load for the downstream ontology developer. It would be relatively easy for a developer to build an ontology like use-one or use-two above by mistake, simply by forgetting to add an OWL import. Originally, when I built Tawny, I wanted to automate this process, so that a Clojure import would mean an OWL import, but I decided against it; obviously this was a good thing, as it allows the use of connection points. I think we can work around the problem by adding formal support for connection points, so that, for example, the function ontology can declare that nProcess needs to be defined somewhere, and warnings can be issued if it is not.
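Such support might build directly on the OWL API; here is a sketch, assuming the getImports and containsEntityInSignature methods, and a hypothetical list of entities that an ontology declares as its required connection points:

(defn unprovided-connection-points
  "Return those entities in connection-points that do not appear in
  the signature of any ontology imported, directly or transitively, by o."
  [o connection-points]
  (remove (fn [entity]
            (some #(.containsEntityInSignature % entity)
                  (.getImports o)))
          connection-points))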


Conclusions

In this post, I have addressed the problem of ontology modularity and described the use of connection points, enabling a form of late binding. In essence, we achieve this by building on OWL’s web nature: shared identifiers do not presuppose shared semantics in different ontologies. While further investigation is needed, this could change the nature of ontology engineering, allowing a more modular, more scalable and more pragmatic form of development.


Acknowledgements

Thanks to Allyson Lister and James Malone for reviewing this article.
