Semantics-Free Ontologies

2012-04-05

In this article, I consider the problems of semantics-free identifiers in OWL and suggest another (possible) solution to the problem.

The problems of identifiers and their semantics are not new. I have written about them previously in the context of blog permalinks [@url:www.russet.org.uk/blog/2011/05/permalink-semantics/] and of conversion between OBO format and Manchester syntax [@url:www.russet.org.uk/blog/2009/09/obo-format-and-manchester-syntax/]. The basic issue is one of choosing your compromise. Identifiers with semantics in them (which this blog uses, although I wish it did not) are considerably more human-readable, but are not resilient to change, as the semantics in the identifiers can become out of date with respect to the content they describe. Semantics-free identifiers avoid this problem, but at the cost of readability. Neither compromise is entirely satisfactory; we need a more pragmatic approach [@url:robertdavidstevens.wordpress.com/2011/05/26/unicorns-in-my-ontology].

Recently, I was looking at the move of the OBI ontology [@doi:10.1186/2041-1480-1-S1-S7] from BFO 1.0 to BFO 2.0. I have commented extensively on BFO before [@doi:10.1371/journal.pone.0012258] [@url:www.russet.org.uk/blog/2010/07/realism-and-science/] [@url:www.russet.org.uk/blog/2010/09/the-status-quo-farewell-tour-on-realism/] and I was interested in what changes had been made for BFO 2.0.

Unfortunately, these changes are not that easy to work out. While diffs have never been the most human-readable of outputs, the OBI diffs raise this to a new level. Consider this change:

svn diff -r 3424:3425 https://obi.svn.sourceforge.net/svnroot/obi/trunk/src/ontology/branches/obi.owl

@@ -204,7 +197,7 @@
     <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000107">
         <rdfs:label>provides_service_consumer_with</rdfs:label>
         <rdfs:domain rdf:resource="http://purl.obolibrary.org/obo/OBI_0001173"/>
-        <rdfs:subPropertyOf rdf:resource="http://www.obofoundry.org/ro/ro.owl#has_part"/>
+        <rdfs:subPropertyOf rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
     </owl:ObjectProperty>

The diff was also available online for those without access to a local Subversion client. The resource previously known as has_part has become the rather more obscure BFO_0000051. In short, BFO has become semantics-free.

In general, I think that this is a good thing. The use of semantics in the identifiers for this blog is generally not helpful, although I have never carried through my year-old threat [@url:www.russet.org.uk/blog/2011/05/permalink-semantics/] to change the identifier scheme, as I am not sure that older links would be maintained. But the total unreadability of the OBI diff demonstrates a problem. One answer is that we should not be reading OWL source in the first place, but using tools. Such tools do, in fact, exist [@url:www.ebi.ac.uk/efo/bubastis/], but they supplement a diff rather than replace it. Source code must be in a readable syntax because line-orientated syntax is the lowest common denominator. Semantic diffs are nice, but next we would need an OWL-aware versioning tool, as versioning depends on diffing, and then OWL-aware regexp search-and-replace tools for when syntactic alterations were needed. Eventually, we would end up replacing an entire software stack and, no doubt, doing it badly, since tools such as versioning software have a long heritage and are now very functional (and incredibly complex!).

My previous, minimal suggestion was a denormalisation: adding a new comment character, so that

ObjectProperty: <http://purl.obolibrary.org/obo/BFO_0000051>

would become

ObjectProperty: <http://purl.obolibrary.org/obo/BFO_0000051>[has_part]

The denormalisation here, presenting the same information both as an opaque identifier and as a readable string, fulfils both requirements. However, it would require significant effort to keep the two in sync.
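To give a feel for what keeping the two in sync would involve, here is a minimal Python sketch. It assumes the hypothetical `IRI[label]` notation above (with or without angle brackets around the IRI) and an authoritative IRI-to-label mapping taken from the ontology itself; the function name `stale_comments` is purely illustrative. It finds every bracketed label and reports the ones that no longer match.

```python
import re

# Hypothetical denormalised notation: an IRI (optionally in angle brackets)
# immediately followed by a bracketed, human-readable label.
DENORMALISED = re.compile(r"<?(?P<iri>https?://[^\s<>\[\]]+)>?\[(?P<label>[^\]]+)\]")


def stale_comments(text, labels):
    """Return (iri, written, expected) for each bracketed label that
    disagrees with the authoritative IRI -> label mapping."""
    problems = []
    for match in DENORMALISED.finditer(text):
        iri, written = match.group("iri"), match.group("label")
        expected = labels.get(iri)
        # IRIs absent from the mapping are skipped rather than reported.
        if expected is not None and expected != written:
            problems.append((iri, written, expected))
    return problems


labels = {"http://purl.obolibrary.org/obo/BFO_0000051": "has_part"}
print(stale_comments("ObjectProperty: <http://purl.obolibrary.org/obo/BFO_0000051>[has_part]", labels))
# -> []
print(stale_comments("SubPropertyOf: http://purl.obolibrary.org/obo/BFO_0000051[part_of]", labels))
# -> [('http://purl.obolibrary.org/obo/BFO_0000051', 'part_of', 'has_part')]
```

A check like this would have to run on every commit, and every mismatch it reports is a manual fix; that is the maintenance burden the denormalisation carries.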

My new idea is to use something similar to a Colour Lookup Table [@url:en.wikipedia.org/wiki/Colour_look-up_table]. These are used to define a palette of colours selected from a much larger colour space. We could use a similar approach here. Essentially, the idea is to declare the semantics-free IDs once at the top of the file, each paired with a meaningful name, and then to use the meaningful names throughout the rest of the file. The idea is also similar to the use of abbreviations for namespaces in XML; for instance, in

<owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000107">

the rdf: prefix actually stands for "http://www.w3.org/1999/02/22-rdf-syntax-ns#". The letters rdf could be replaced by anything at all without changing the semantics, so long as we update the namespace declaration to match.
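To make the point about prefixes concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`, which reports names in `{namespace}local` form. The document is a cut-down fragment modelled on the OBI line above, and the helper `parse_with_prefix` is hypothetical. Parsing it once with the prefix `rdf` and once with an arbitrary alternative yields identical results.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
IRI = "http://purl.obolibrary.org/obo/OBI_0000107"


def parse_with_prefix(rdf_prefix):
    """Serialise one element using `rdf_prefix` for the RDF namespace, then parse it.

    ElementTree expands every prefix to its namespace, so the parsed result
    records only namespace IRIs, never the abbreviation that was used."""
    doc = (
        f'<owl:ObjectProperty xmlns:owl="{OWL}" xmlns:{rdf_prefix}="{RDF}" '
        f'{rdf_prefix}:about="{IRI}"/>'
    )
    element = ET.fromstring(doc)
    return element.tag, dict(element.attrib)


print(parse_with_prefix("rdf"))
print(parse_with_prefix("anything") == parse_with_prefix("rdf"))  # -> True
```

The alias proposal applies the same trick one level down: instead of abbreviating a namespace, each abbreviation stands for a whole identifier.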

In Manchester syntax, we could achieve this by adding an Alias keyword, so that:

ObjectProperty: <http://purl.obolibrary.org/obo/OBI_0000107>
   Annotations: rdfs:label "provides_service_consumer_with"
   Domain: <http://purl.obolibrary.org/obo/OBI_0001173>
   SubPropertyOf: <http://purl.obolibrary.org/obo/BFO_0000051>

would become

Prefix: obo: <http://purl.obolibrary.org/obo/>
Alias: obo:OBI_0000107 "provides_service_consumer_with"
Alias: obo:OBI_0001173 "service"
Alias: obo:BFO_0000051 "has_part"

ObjectProperty: provides_service_consumer_with
   Annotations: rdfs:label "provides_service_consumer_with"
   Domain: service
   SubPropertyOf: has_part
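Since the `Alias:` keyword is only a proposal, no existing OWL tool reads it. The following Python sketch shows one way a hypothetical pre-processor might expand aliases back into full IRIs before handing the file to a standard parser. It assumes the simple layout used above (one declaration per line, alias names in double quotes); `read_aliases` and `expand_aliases` are illustrative names, not part of any library.

```python
import re

# Hypothetical syntax from the proposal above; Manchester syntax has no Alias keyword.
PREFIX = re.compile(r'^Prefix:\s+(\w*):\s+<?([^\s>]+)>?\s*$')
ALIAS = re.compile(r'^Alias:\s+(\w*):(\S+)\s+"([^"]*)"\s*$')


def read_aliases(header):
    """Map each alias string to the full IRI it abbreviates."""
    prefixes, aliases = {}, {}
    for line in header.splitlines():
        if m := PREFIX.match(line):
            prefixes[m.group(1)] = m.group(2)
        elif m := ALIAS.match(line):
            prefix, local, name = m.groups()
            aliases[name] = prefixes[prefix] + local
    return aliases


def expand_aliases(body, aliases):
    """Replace alias names with <IRI>, leaving quoted literals such as labels untouched."""
    if not aliases:
        return body
    names = sorted(aliases, key=len, reverse=True)  # try longest aliases first
    word = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    parts = re.split(r'("[^"]*")', body)  # odd indices hold quoted strings
    for i in range(0, len(parts), 2):
        parts[i] = word.sub(lambda m: "<" + aliases[m.group(1)] + ">", parts[i])
    return "".join(parts)


HEADER = "\n".join([
    "Prefix: obo: <http://purl.obolibrary.org/obo/>",
    'Alias: obo:OBI_0000107 "provides_service_consumer_with"',
    'Alias: obo:OBI_0001173 "service"',
    'Alias: obo:BFO_0000051 "has_part"',
])
BODY = "\n".join([
    "ObjectProperty: provides_service_consumer_with",
    '   Annotations: rdfs:label "provides_service_consumer_with"',
    "   Domain: service",
    "   SubPropertyOf: has_part",
])
print(expand_aliases(BODY, read_aliases(HEADER)))
```

Pointing the `has_part` alias at the old RO IRI instead (`Prefix: ro: <http://www.obofoundry.org/ro/ro.owl#>` and `Alias: ro:has_part "has_part"`) changes only the header; the frame text is untouched. This mirrors the OBI change from RO's has_part to BFO_0000051.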

In this case, because we are defining a term and attaching a label, we get the same string twice, but there is no formal link between the two. With this system in place, moving the identifiers for BFO would have required updating only the Alias table at the top. An obvious place for the alias strings to come from would be the source ontology (so "has_part" would come from RO [@doi:10.1186/gb-2005-6-5-r46], or now BFO); this would, in fact, serve as a useful check. If I reference an external ontology and its labels do not match my Alias definitions, I may wish to check whether the concepts I have imported still have the semantics that I intended.
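Here is a minimal Python sketch of that check. It assumes the imported ontology is available as RDF/XML in which each entity is a top-level element carrying `rdf:about` and an `rdfs:label` child; real RDF/XML permits many other layouts, which this simplification ignores. The sample label "has part" is invented for the example, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"


def labels_from_rdfxml(text):
    """Map rdf:about -> rdfs:label for top-level elements (a simplification)."""
    labels = {}
    for element in ET.fromstring(text):
        about = element.get(RDF + "about")
        label = element.find(RDFS + "label")
        if about is not None and label is not None:
            labels[about] = label.text
    return labels


def check_aliases(aliases, source_labels):
    """Report aliases whose IRI is missing from, or labelled differently in, the source."""
    report = {}
    for name, iri in aliases.items():
        label = source_labels.get(iri)
        if label is None:
            report[name] = f"{iri} not found in the source ontology"
        elif label != name:
            report[name] = f"{iri} is labelled {label!r} in the source ontology"
    return report


# Invented sample data standing in for an imported ontology.
SOURCE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/BFO_0000051">
    <rdfs:label>has part</rdfs:label>
  </owl:ObjectProperty>
</rdf:RDF>"""

aliases = {
    "has_part": "http://purl.obolibrary.org/obo/BFO_0000051",
    "service": "http://purl.obolibrary.org/obo/OBI_0001173",
}
for name, problem in check_aliases(aliases, labels_from_rdfxml(SOURCE)).items():
    print(f"{name}: {problem}")
```

A mismatch is not necessarily an error. It is a prompt to look again at whether the imported term still means what the alias suggests.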

The same approach could, I believe, be translated directly into the XML representation without any change to the format, using XML entities, which are declared at the start of an XML document. Of course, this is entirely horrible, and changing the OWL schema would make more sense. Extending Manchester syntax is straightforward, as I think I have shown here; likewise for OBO format. The practical upshot would be a significant increase in the readability of many ontologies without eschewing the good practice of semantics-free identifiers.
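For completeness, here is a sketch of the XML-entity route. It assumes an RDF/XML file whose internal DTD subset declares one entity per aliased IRI; the entity names are chosen for the example. Python's standard `xml.etree.ElementTree` (built on expat) expands such internal entities while parsing, so the readable names never reach the RDF. This illustrates the mechanism rather than recommending it.

```python
import xml.etree.ElementTree as ET

# The entity declarations play the role of the Alias table: readable entity
# names on the left, semantics-free IRIs on the right.
DOC = """<!DOCTYPE rdf:RDF [
  <!ENTITY provides_service_consumer_with "http://purl.obolibrary.org/obo/OBI_0000107">
  <!ENTITY service "http://purl.obolibrary.org/obo/OBI_0001173">
  <!ENTITY has_part "http://purl.obolibrary.org/obo/BFO_0000051">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:ObjectProperty rdf:about="&provides_service_consumer_with;">
    <rdfs:label>provides_service_consumer_with</rdfs:label>
    <rdfs:domain rdf:resource="&service;"/>
    <rdfs:subPropertyOf rdf:resource="&has_part;"/>
  </owl:ObjectProperty>
</rdf:RDF>"""

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"

root = ET.fromstring(DOC)
prop = root.find(OWL + "ObjectProperty")
# The parser has already replaced each &name; reference with its IRI.
print(prop.get(RDF + "about"))
print(prop.find(RDFS + "domain").get(RDF + "resource"))
print(prop.find(RDFS + "subPropertyOf").get(RDF + "resource"))
```

Moving has_part to a new identifier would again touch only the `<!ENTITY has_part ...>` line.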