Ontology Connection Points
In this post, I will describe what I call connection points and explain how they can be used to enable modularity and overcome problems with scalability of reasoning in OWL.
One of the recurrent problems with building ontologies is mission creep; what starts simple rapidly expands until many different areas of the world are described.
I faced this problem recently, when I was asked about the axiomatisation that I described in my paper about function [@url:arxiv.org/abs/1309.5984]. Well, the axiomatisation exists, but it was never very complete; so, I thought I should redo it, probably with Tawny-OWL [@url:www.russet.org.uk/blog/2366].
To start off with a simple declaration of function, we might choose something like this:
(defclass Function :subclass (only realisedIn nProcess))
Or, in rough English, a function is something that displays itself only
when involved in a process (the
nProcess is to avoid a name
clash). Now, immediately, we hit the mission-creep problem.
Traditionally, functions have been considered to be some strain of
continuant, and so it might be expected that we would only need to
describe classes that are continuants to define a function. And, yet,
straight away, we have a process. To make this definition meaningful, we
need to distinguish between processes and everything else, and pretty
quickly, our ontology of function requires most of an upper ontology.
This has important consequences. First, if the upper ontology in use is any size at all, or alternatively has a complex axiomatisation, then immediately a lot of axioms have to be reasoned over, and this can take considerable time.
Second, and probably more importantly, the choice of an upper ontology can be quite divisive. We have argued that a single representation for knowledge is neither plausible nor desirable [@url:www.russet.org.uk/blog/1713]; it limits the ability to abstract, meaning that all of the complexity has to be dealt with all of the time; in essence, an extreme example of mission creep. If, for example, BFO is used, then the representation of entities whose existence we are unsure about becomes difficult. Conversely, if SIO is used, uncertain objects come along regardless.
In the rest of this post, I will describe how we can use the OWL import mechanism to define what I term connection points to work around this problem.
Identifiers and Imports
One of the interesting things about OWL is that, as a web based system, it uses global identifiers in the form of IRIs (or URIs, or URLs, as you wish); I can make statements about your concepts, you can make statements about mine. However, not all OWL ontologies share the same axiom space; this is controlled explicitly, through the OWL import mechanism. In short, while you can make statements about my ontology, I do not have to listen. The practical upshot of this is that it is possible to share identifiers between two ontologies without sharing any axioms, or to share axioms in one direction only.
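The difference between sharing identifiers and sharing axioms can be sketched with a toy model in plain Clojure. This is not Tawny-OWL or the OWL API; the map layout, the `visible-axioms` function and the names `mine`, `yours` and `theirs` are all hypothetical, chosen only to show that an ontology sees the axioms in its import closure, whichever identifiers it happens to mention.

```clojure
;; Toy model (assumption: not Tawny-OWL or the OWL API).
;; An ontology is a set of its own axioms plus the names of the
;; ontologies it imports. Terms are plain strings standing in for IRIs.
(def ontologies
  {"mine"   {:imports #{}
             :axioms  #{[:subclass "mine#A" "mine#B"]}}
   ;; "yours" imports "mine": axioms are shared in one direction only
   "yours"  {:imports #{"mine"}
             :axioms  #{[:subclass "yours#C" "mine#A"]}}
   ;; "theirs" uses an identifier from "mine" but does not import it
   "theirs" {:imports #{}
             :axioms  #{[:subclass "theirs#D" "mine#A"]}}})

(defn visible-axioms
  "Axioms in the import closure of the ontology called `ont`."
  [onts ont]
  (loop [todo [ont] seen #{} axioms #{}]
    (if-let [o (first todo)]
      (if (seen o)
        ;; already visited: skip it, which also guards against import cycles
        (recur (rest todo) seen axioms)
        (recur (into (rest todo) (get-in onts [o :imports]))
               (conj seen o)
               (into axioms (get-in onts [o :axioms]))))
      axioms)))
```

In this sketch, `(visible-axioms ontologies "yours")` contains both the axiom from `yours` and the one from `mine`, while `mine` and `theirs` each see only their own axiom, even though all three mention `mine#A`.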
One nice use of this is with a little upper ontology that I built mostly to try out Tawny, called tawny.upper. This comes in two forms, one in EL profile, and one in DL; the latter has more semantics but is slower to reason over. The DL version imports the EL version but, unusually, introduces no new identifiers at all; it just refines the terms in the EL version with the desired additional semantics. Downstream users can switch between EL and DL semantics by simply adding or removing an OWL import statement.
Alternative forms of import
The ability to share identifiers but not axioms has been used by others, as it provides a partial solution to the problem of big imports. MIREOT [@url:precedings.nature.com/documents/3576/version/1], for example, defines an alternative import mechanism. MIREOT is described as a minimal information standard [@url:precedings.nature.com/documents/3574/version/1]; in this it is rather simple, as the minimal information required to reference (identify) an ontology term is its identifier and that of its ontology. In practice, MIREOT is a set of tools that, at its simplest, involves sharing just the identifier and not the semantics. This can help to reduce the size of an ontology significantly.
An extreme use-case for this would be in our karyotype ontology [@url:arxiv.org/abs/1305.3758]: if we wished "human" to refer to the NCBI taxonomy, we would import 100,000s of classes in order to use one, increasing the size of the ontology by several orders of magnitude in the process. One solution is to just use the identifier and not OWL import the NCBI taxonomy.
However, this causes two problems. First, following our example we can
no longer infer that, for example, a Human karyotype is a Mammalian
karyotype; these semantics are present only in the NCBI taxonomy, and we
must import its semantics if we wish to know this; similarly, we would
be free to state that, for example, a human karyotype was also a fly
karyotype. The second problem is that, in tools like Protege, the terms
become unidentifiable, because the
rdfs:label for the term has not
been imported, and the NCBI taxonomy uses numeric identifiers.
The MIREOT solution is to extract a subset of the axioms in the upstream ontology, and then import these; an obvious subset would be all the labels of terms used in a downstream ontology, although MIREOT uses a slightly more complex system [@url:precedings.nature.com/documents/3576/version/1]. This would solve the problem of terms being unidentifiable; still, though, human would not be known to be mammalian. Another subset would be all terms from mammal downwards (with their labels). Now, human would be known to be a mammal, but not known to not be a fly. As you increase the size of the subset, you increase the inferences that you can make, but the reasoning process will get slower.
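The trade-off in the second kind of subset can be illustrated with a small plain-Clojure sketch. The taxonomy below is invented for the example (it uses readable keywords, not NCBI identifiers), and `ancestors-of` and `subset-below` are hypothetical helpers, not part of MIREOT or Tawny.

```clojure
;; Toy taxonomy (assumption: illustrative names, not NCBI identifiers).
;; Each term maps to its parent.
(def taxonomy
  {:human :primate, :primate :mammal, :mouse :mammal,
   :mammal :animal, :fly :insect, :insect :animal})

(defn ancestors-of
  "Chain of parents above `term`, nearest first, according to `parent-of`."
  [parent-of term]
  (take-while some? (rest (iterate parent-of term))))

(defn subset-below
  "The entries of `parent-of` for terms that sit beneath `root`."
  [parent-of root]
  (into {}
        (filter (fn [[child _]]
                  (some #{root} (ancestors-of parent-of child)))
                parent-of)))

;; Everything from mammal downwards: three entries instead of six.
(def mammals (subset-below taxonomy :mammal))
```

With only `mammals` available, `(ancestors-of mammals :human)` still reaches `:mammal`, so human is known to be a mammal. Nothing in the subset mentions `:fly`, however, so it offers no way to rule out a human karyotype also being a fly karyotype.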
From my perspective, the second of these seems sensible; large ontologies reason slowly and there is no way around this, until reasoner technology gets better. For this reason, I will probably implement something similar in tawny (with an improvement suggested later). The first, however, seems less justified. We are effectively duplicating all the labels in the upstream ontology, with all this entails, for the purpose of display; we can minimise these problems by regularly regenerating the imported subset from the source ontology, but this is another task that needs to be done.
Tawny is less affected by this from the start, since the name that a developer uses can exist only in Clojure space; moreover, when displaying documentation, tawny can use data from any ontology, rather than only those imported into the current ontology. We do not need to duplicate the MIREOT subset, we just need to know about it.
While MIREOT is a sensible idea, it is nonetheless seen as a workaround, a compromise solution to a difficult problem [@url:precedings.nature.com/documents/3574/version/1]. However, in this section, I will discuss a simpler and more general solution that helps to address the problem of modularity.
Consider a reworked version of the definition above, with one critical
difference: the nProcess term is now referencing an independent Clojure
namespace. The generated OWL from this ontology will include nProcess
simply as a reference.
(defclass Function :subclass (only realisedIn connection.upper/nProcess))
This is different from the MIREOT approach which maintains that the minimal information is the identifier for the term and the identifier for the ontology. In this case, we only have the former. This difference is important, as I will describe later.
In one sense, we have achieved something negative. We now have a term in our function ontology, with no semantics or annotations. Oops [@url:oeg-lia3.dia.fi.upm.es/oops/index-content.jsp] has this in their catalogue of ontology errors:
P8. Missing annotations: ontology terms lack annotations properties. This kind of properties improves the ontology understanding and usability from a user point of view.
However, this problem can be fixed by the editing environment; and,
indeed, using Tawny it is. We have a meaningful name, despite a
meaningless identifier, and we can see the definition of nProcess
should we choose. I call these forms of reference connectors, and they
have interesting properties. In this case, nProcess is a
required connector. The function ontology needs it to have its full
semantic meaning, but it is not provided.
So, let us consider how we might use these connection points. First, for this example, we need a small upper ontology; in this case, I use the simplest possible ontology to demonstrate my point.
(defontology upper)

(as-disjoint
 (defclass nProcess)
 (defclass NotProcess))
Now, consider our function definition from earlier; imagine that we wish
to use this in a downstream ontology to define some functions. In this
case, we define a child of Function which is realisedIn a
NotProcess. The simplest possible way of doing this is to use
all three of the entities (Function, realisedIn and NotProcess) as
required connection points. We import no other ontologies here, so we
can infer nothing that is not already stated.
(defontology use-one)

(defclass FunctionChild
  :subclass connection.function/Function
  (owl-some connection.function/realisedIn connection.upper/NotProcess))
In our second use, we now import our function ontology. At this point,
the shared identifier space starts to show its value; we
now understand the semantics of our
Function term because it uses the
same identifier as the term in the function ontology.
This does, now, allow us to draw an additional inference; any individual
of FunctionChild must be realisedIn an instance of NotProcess
which, itself, we can infer to also be an instance of nProcess because the
function ontology claims this. Or, in short, NotProcess and nProcess
cannot be disjoint, if our ontology is to remain coherent. This ontology
remains coherent, however, because we have not imported the upper
ontology.
(defontology use-two)

(owl-import connection.function/function)

;; this ontology looks much the same as use-one
(defclass FunctionChild
  :subclass connection.function/Function
  (owl-some connection.function/realisedIn connection.upper/NotProcess))
In the final use, we import both ontologies. The function import allows
us to conclude that NotProcess and nProcess cannot be disjoint, while
our upper ontology tells us that they are, and at this point, our
ontology becomes incoherent. The required connection point in the
function ontology has now been provided by a term in our upper ontology.
(defontology use-three)

(owl-import connection.function/function)
(owl-import connection.upper/upper)

(defclass FunctionChild
  :subclass connection.function/Function
  (owl-some connection.function/realisedIn connection.upper/NotProcess))
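The three uses can be mimicked with a toy model in plain Clojure, to make the effect of each import explicit. This is a sketch under heavy assumptions: it is neither Tawny-OWL nor a real reasoner, the axiom encoding is invented, and `unsatisfiable-classes` applies a single hand-written rule (an existential filler that is disjoint from a universal restriction inherited on the same property), which is enough for this example and nothing more.

```clojure
;; Toy model only (assumption: neither Tawny-OWL nor a real reasoner).
(def child-axioms
  #{[:subclass :FunctionChild :Function]
    [:some :FunctionChild :realisedIn :NotProcess]})

(def ontologies
  {:upper     {:imports #{} :axioms #{[:disjoint :nProcess :NotProcess]}}
   :function  {:imports #{} :axioms #{[:only :Function :realisedIn :nProcess]}}
   :use-one   {:imports #{} :axioms child-axioms}
   :use-two   {:imports #{:function} :axioms child-axioms}
   :use-three {:imports #{:function :upper} :axioms child-axioms}})

(defn visible-axioms
  "Axioms in the import closure of the ontology called `ont`."
  [onts ont]
  (loop [todo [ont] seen #{} axioms #{}]
    (if-let [o (first todo)]
      (if (seen o)
        (recur (rest todo) seen axioms)
        (recur (into (rest todo) (get-in onts [o :imports]))
               (conj seen o)
               (into axioms (get-in onts [o :axioms]))))
      axioms)))

(defn superclasses
  "`cls` plus every class reachable from it through :subclass axioms."
  [axioms cls]
  (loop [todo [cls] seen #{}]
    (if-let [c (first todo)]
      (if (seen c)
        (recur (rest todo) seen)
        (recur (into (rest todo)
                     (for [[kind sub sup] axioms
                           :when (and (= kind :subclass) (= sub c))]
                       sup))
               (conj seen c)))
      seen)))

(defn unsatisfiable-classes
  "Classes with a :some filler that is disjoint from an inherited :only
  filler on the same property. One rule only; far from complete."
  [axioms]
  (set
   (for [[k1 c p x] axioms
         :when (= k1 :some)
         [k2 d q y] axioms
         :when (and (= k2 :only)
                    (= p q)
                    (contains? (superclasses axioms c) d)
                    (or (contains? axioms [:disjoint x y])
                        (contains? axioms [:disjoint y x])))]
     c)))
```

Calling `(unsatisfiable-classes (visible-axioms ontologies :use-three))` reports `:FunctionChild`, while the same call for `:use-one` and `:use-two` reports nothing: the stated axioms are identical in all three, and only the imports differ.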
The critical point is that while the function ontology references some term in its definition, the exact semantics of that term are not specified. These semantics are at the option of the downstream user of the function ontology; in use-three, we have decided to fully specify these semantics. But we could have imported a totally different upper ontology had we chosen, either using the same identifiers, or through a bridge ontology making judicious use of equivalent/sameAs declarations. In short, the semantics have become late binding.
We can use this technique to improve on MIREOT. Rather than importing a derived ontology, we can now use connection points. The karyotype ontology can reference the NCBI taxonomy, and leave the end user to choose the semantics they need; if the user wants the whole taxonomy, and is prepared to deal with the reasoning speed, then they have this option. This choice can even be made contextually; for example, an OWL import could be added on a continuous integration platform [@url:www.russet.org.uk/blog/2324], where reasoning time is less important, but not during development or interactive testing.
While the idea of connection points seems sound, it has some difficulties; one obvious problem is that the developer of an ontology must choose the modules, and the connection points between them, for themselves. We plan to test this using SIO; we have already been working on a tawnyified version of this, to enable investigation of pattern-driven ontology development. We will build on this work by attempting to modularise the ontology, with connection points between the modules.
Currently, the use of this form of connection points adds some load to
the downstream ontology developer. It would be relatively easy for a
developer to build an ontology like use-one or use-two above by mistake,
accidentally forgetting to add an OWL import. Originally, when I built
tawny, I wanted to automate this process --- a Clojure import would mean
an OWL import, but decided against it; obviously this was a good thing
as it allows the use of connection points. I think we can work around
this by adding formal support for connection points, so that, for
example, the function ontology can declare that
nProcess needs to be
defined somewhere, and issue warnings if it is not.
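What such a check might look like can be sketched in plain Clojure. This is purely hypothetical: Tawny has no such feature in this post, and the `:declares` and `:requires` keys, along with both function names, are invented for illustration.

```clojure
;; Hypothetical sketch: none of this is an existing Tawny-OWL feature.
(require '[clojure.set :as set])

(def ontologies
  {:upper     {:imports #{} :declares #{:nProcess :NotProcess} :requires #{}}
   :function  {:imports #{} :declares #{:Function :realisedIn}
               :requires #{:nProcess}}
   :use-two   {:imports #{:function} :declares #{:FunctionChild}
               :requires #{:Function :realisedIn :NotProcess}}
   :use-three {:imports #{:function :upper} :declares #{:FunctionChild}
               :requires #{:Function :realisedIn :NotProcess}}})

(defn import-closure
  "The ontology called `ont` plus everything it imports, transitively."
  [onts ont]
  (loop [todo [ont] seen #{}]
    (if-let [o (first todo)]
      (if (seen o)
        (recur (rest todo) seen)
        (recur (into (rest todo) (get-in onts [o :imports]))
               (conj seen o)))
      seen)))

(defn missing-connection-points
  "Required terms that no ontology in the import closure declares."
  [onts ont]
  (let [closure (import-closure onts ont)
        collect (fn [k] (reduce into #{} (map #(get-in onts [% k]) closure)))]
    (set/difference (collect :requires) (collect :declares))))

;; Warn about each required connection point that is left unprovided.
(doseq [term (missing-connection-points ontologies :use-two)]
  (println "WARNING: required connection point not provided:" term))
```

Here `:use-two` would be warned about `:nProcess` and `:NotProcess`, because it never imports the upper ontology, while `:use-three` has nothing missing. A developer who omitted an import deliberately could simply ignore the warning.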
In this post, I have addressed the problem of ontology modularity and described the use of connection points, enabling a form of late binding. In essence, we achieve this by building on OWL's web nature --- shared identifiers in different ontologies do not presuppose shared semantics. While further investigation is needed, this could change the nature of ontology engineering, allowing a more modular, more scalable and more pragmatic form of development.
Thanks to Allyson Lister and James Malone for reviewing this article.