Archive for the ‘Ontology’ Category


Abstract

The process of building ontologies is a difficult task that involves collaboration between ontology developers and domain experts and requires an ongoing interaction between them. This collaboration is made more difficult because they tend to use different tool sets, which can hamper this interaction. In this paper, we propose to decrease this distance between domain experts and ontology developers by creating more readable forms of ontologies, and further to enable editing in normal office environments. Building on a programmatic ontology development environment, such as Tawny-OWL, we are now able to generate these readable/editable forms from the raw ontological source and its embedded comments. We have implemented this translation to HTML for reading; this environment provides rich hyperlinking as well as active features such as hiding the source code in favour of comments. We are now working on translation to a Word document that also enables editing. Taken together, this should provide a significant new route for collaboration between the ontologist and domain specialist.

  • Aisha Blfgeh
  • Phillip Lord


Plain English Summary

Ontologies are a mechanism for organising data, so that it can be generated, searched and retrieved accurately. They do this by building a computational model of an area of knowledge or domain.

But building ontologies is a challenge for a number of reasons. One of the main problems is that building an ontology requires two skill sets: the use and manipulation of a complex formalism, which tends to be the job of an ontologist; and the deep understanding of the area that is being modelled, which is the job of a domain specialist. It is fairly rare to find a person who understands both areas, so people have to collaborate.

In this paper, we describe a new mechanism to enable this collaboration; rather than trying to train domain specialists to build ontologies or use ontology tooling, we instead manipulate an ontology so that it can be viewed as an office document, which ultimately is the tool that most people are familiar with.

A chicken is an egg’s way of making another egg

One of the joys of Ontology building is that you can end up in some fairly obscure arguments; the one I got in today is whether a sperm is a human being. Of course, this is silly, but mostly because of the limitation of our language. I would like to describe here why a sperm is a human individual and why it is important.

One of the long-running discussions in the ontology community is how we define function. With respect to biological organisms and biological function this is particularly challenging; in fact, biology continually raises questions and exceptions, which is part of the fun.

I added my contribution to definitions of function several years ago (1309.5984), built largely around evolution and, more importantly, homology.

One of the issues with other definitions available at the time, and specifically with BFO, is that it used a definition as follows:

A biological function is a function which inheres in an independent continuant that is i) part of an organism[…]

The point, here, is that by definition an organism cannot have a function because an organism cannot be part of an organism. This works well for people, but badly for some organisms, especially eusocial ones like ants which appear to have functions in their society which they have evolved to fulfil. My argument here is that it also means that a sperm cannot have a function, because, actually, a sperm is an organism. Of course, this seems daft; a sperm is, surely, part of an organism in the same way that a blood cell is. However, this is not true.

All organisms have a genome — their genetic material. Many organisms have a single copy of their genome; for single celled organisms, this gets doubled before they divide, so actually they have two copies of their genome much of the time, but these two copies are identical. These organisms are called haploid.

However, as you might expect, sexual reproduction makes things more complex. This involves taking two previously independent organisms, merging them, then dividing again. The merged organism now has two different copies of its genome; such organisms are called diploid.

Once this happens, the life cycle of an organism gets more complex. Some organisms, such as the yeast (Schizosaccharomyces pombe) really dislike being diploid. It’s possible to maintain them as diploids in the lab, but generally given the choice they sporulate and become haploid again. However, others such as brewer’s yeast (Saccharomyces cerevisiae) behave differently. It grows, develops and lives in the haploid form; but it also does this in the diploid form and is quite happy.

Many plants do this also and exist in both a multicellular haploid form (called the gametophyte) and a multicellular diploid form called the sporophyte. In the flowering plants, the gametophyte is very small, and exists entirely within the sporophyte stage; in other plants, the gametophyte is larger. But both forms can grow and develop, a process called alternation of generations.

As far as I know, no animals do this. However, there are quite a few organisms where both a haploid and a diploid form exist; the male ant that I referred to earlier will be haploid, while the females are diploid. This doesn’t disadvantage the male; it simply produces sperm which are genetic clones of itself.

In humans, like the flowering plants, the diploid form is dominant. There are two haploid forms, the egg and sperm, both single cells; the female form exists entirely within the diploid from which it arose; the sperm can travel a bit further but not much.

Of course, in most practical circumstances, the sperm would appear to be a part of the man that produced it; if I were building a medical ontology, I would make this statement, because it would fit everyone’s intuition and common medical and legal practice.

But, there is no real justification for this. It exists, it is independent from that man, has a different genome from that man; it is an organism in the same way that a gametophyte or a male ant is an independent organism. For a biological ontology, working cross-species, we have no basis for making this distinction; if the sperm has a function of fertilizing an egg, then the man has the function of producing more sperm. Alternatively, if a man cannot have a function, neither can a sperm.

Does this mean that a sperm is a human being? Obviously not, that would be silly; nor is a sperm a person. But it is an organism and it is human. We just lack a word to describe this.

This discussion came up at ICBO 2017, following a discussion with Barry Smith.

Bibliography

Over the years, a great deal has been written about the ontology of pizza (http://robertdavidstevens.wordpress.com/2010/01/22/why-the-pizza-ontology-tutorial/). It’s a good example, is easy to understand and works surprisingly well in a tutorial context. It also comes up surprisingly often in the public sphere, as it did last year on BBC News (http://www.bbc.co.uk/news/magazine-33542392). The key point of that article is this: the pizza maker argues that you can’t have a marinara (tomato and garlic) with added mozzarella because a marinara is a pizza rossa which can’t have mozzarella; a margherita (tomato and mozzarella) with garlic is fine though. Ha, those crazy Italians. I paraphrase, of course.

Of course, the right to comment first on this article rests with Robert Stevens, my colleague and world acknowledged authority on the ontology of pizza, and indeed he has done so (http://robertdavidstevens.wordpress.com/2015/08/13/where-a-pizza-ontology-may-help/).

I would like to take a slightly different approach to the question though. How do we know what our ontology should be? I’ll start with an ontological reading of the article, look at some issues with it, and then consider how we might gather the knowledge to resolve them. As with many parts of ontology construction, there is not a perfect answer and it turns out to be more complex than it might appear at first sight.


A Literate Approach

I start by building an ontology. As (both) regular readers of this journal might expect, I am going to use Tawny-OWL (http://www.russet.org.uk/blog/3088) to do this, so the syntax is a little different from that in Robert’s post.

I’m starting from a slightly different place from Robert. One of his concerns was to fit within the context of the existing pizza ontology. Now, I used to be a firm believer in the idea that ontologies were all about a shared conceptualisation of the world, but over time, I have become less sure that this should always be a consideration. In this case, the purpose of building the ontology is to allow me to formally and explicitly describe something within the context of this blogpost, to clarify my own understanding. Sharing is irrelevant for this use case and I am going to build my ontology rapidly from scratch.

I am a big fan of literate ontologies (1512.04250) and I want to do that with blogs as well. So the source code of this post (accessible from http://archive.org/download/phil-lord-journal/) is lenticular text (http://www.russet.org.uk/blog/3035) and the whole article can be parsed as a valid ontology. Perhaps no change for the reader, but a comfort for me as the author to know that I’ve evaluated every statement in this post.

(ns the-epistemology-of-pizza
  (:refer-clojure :only [])
  (:use [tawny owl english reasoner]))

(reasoner-factory :hermit)

As with all Tawny-OWL ontologies, we start with a namespace declaration. If you are reading this and use Clojure a lot, note the :refer-clojure and use of tawny.english: none of or, some or not are the clojure.core functions! We select a default reasoner also.

(defontology epistemology)

(defclass PizzaTopping)
(defoproperty hasTopping)

(as-subclasses
 PizzaTopping
 :disjoint
 (defclass Mozzarella)
 (defclass Tomato)
 (defclass Garlic))

We need a set of primitive terms on which to base the ontology, which we define here. Again, this is not an ontology for sharing: I have not added labels or formal textual definitions, nor do I need to care about what the IRIs actually are. We only need three toppings and all we need to care about is that they are different.

(defclass Pizza
  :super (some hasTopping (or Mozzarella Tomato)))

It turns out, I do not need to care about pizza bases either, so I am not going to talk about them explicitly. Rather unusually (and in a difference from the pizza ontology) I am going to insist that a Pizza have either Mozzarella or Tomato. I’ll come back to this later.

(defclass PizzaRossa
  :super
  Pizza
  (only hasTopping (not Mozzarella)))

(defclass PizzaBianca
  :super
  Pizza
  (only hasTopping (not Tomato)))

The definitions of PizzaRossa and PizzaBianca are a little surprising; they are defined negatively, but I quite like these definitions. They are sort of like the backward definitions of the fly geneticist — the “white” gene is responsible for the enzyme that causes red eyes. The gene is named after the mutant.

(defclass Marinara
  :super
  PizzaRossa
  (some hasTopping Garlic Tomato))

The definition of Marinara is now straightforward enough, simply stating the ingredients, and that Marinara is a PizzaRossa.

;; A Marinara with added Mozzarella should make the ontology incoherent.
(clojure.core/assert
 (with-probe-entities
   [WeirdMarinara
    (owl-class "WeirdMarinara"
               :super Marinara
               (some hasTopping Mozzarella))]
   (clojure.core/not (coherent?))))

And so we get to the humorous crux of the story, which is that, indeed, you cannot have a Marinara with Mozzarella.
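
The same argument can be followed by hand, without a reasoner. The sketch below is plain Python rather than Tawny-OWL and is only an illustration: it treats a pizza as a fixed set of topping names, which is a closed-world simplification that OWL itself does not make, and it checks only the necessary conditions stated above. The function names are hypothetical and not part of any library.

```python
# Illustration only: each pizza is a closed set of topping names.
# OWL reasons under an open world, so this is not what the reasoner does;
# it only mirrors the shape of the axioms. All names are hypothetical.

def meets_pizza(toppings):
    # (some hasTopping (or Mozzarella Tomato))
    return bool(toppings & {"Mozzarella", "Tomato"})


def meets_pizza_rossa(toppings):
    # Pizza, and (only hasTopping (not Mozzarella))
    return meets_pizza(toppings) and "Mozzarella" not in toppings


def meets_marinara(toppings):
    # PizzaRossa, and (some hasTopping Garlic Tomato)
    return meets_pizza_rossa(toppings) and {"Garlic", "Tomato"} <= toppings


marinara = {"Garlic", "Tomato"}
weird_marinara = marinara | {"Mozzarella"}

print(meets_marinara(marinara))        # True
print(meets_marinara(weird_marinara))  # False
```

Adding Mozzarella breaks the PizzaRossa condition, and so the Marinara condition with it, which is the clash that the probe class exposes in the ontology.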


The Epistemological Questions

But this leaves us with a number of problems. One of which is that in explaining the joke, we have rather killed it; one sad reality that all ontologists have to face is that it pushes us toward pedantry, making us humourless, crushing bores. A much deeper problem, though, is that what we have produced is a computational data structure; we have queried it and got an answer about that computational data structure, when what we really want to know about is pizza.

How do we know whether what we have modelled is correct? Is this ontology a good ontology, an accurate reflection of reality? And what does this mean anyway? In short, how do we know what we know?

Let’s consider the issue from a set of different perspectives.

The Software Engineer

As a software engineer, I’m rather fond of the ontology that I have produced, and in that sense it is a good ontology. I’ve already said that I quite like the “backward definitions”, and find them quite elegant. The ontology is also symmetrical: PizzaBianca and PizzaRossa are both defined in the same, quite equivalent way.

Now, of course, we might argue that these considerations do not tell us that we have a good ontology. But an elegant and symmetrical axiomatisation is useful; it’s easier to remember, there are no “special cases”, and it is easy to spot outliers. All of these things support maintainability of software in general and of ontologies specifically.

There are some issues with my ontology; we have not, for instance, followed Alan Rector’s normalisation pattern (http://ontogenesis.knowledgeblog.org/49); the Margherita is neither white nor red in this schema.

Margherita presents a bigger problem, though, than not being normalised, which is simply that my knowledge of margherita tells me that it is normally considered a pizza rossa, while my ontology says it is not. My ontology is nice, but it is wrong.

The philosophical approach

We could also consider this ontology from a more philosophical point-of-view. Of course, I am perhaps not the ideal person to do this, but I would note that our ontology has a clear, single inheritance hierarchy. Most of our classes have clear differentiae (I’ve just stopped at the toppings, but you can’t define everything, because that way lie turtles (http://en.wikipedia.org/wiki/Turtles_all_the_way_down)). Even our definition of pizza could fit into a larger hierarchy: pizza without either mozzarella or tomato is either a focaccia or some other type of bread.

After that, I am rather stuck. It is hard to draw many more conclusions about pizza from first principles. From a realist point-of-view, we should model reality and universals, which sounds nice. But how do I determine what that reality is?

Time to phone a friend.

The Expert Analysis Technique

One standard technique in ontology building is to consult with an expert. Indeed, that is often the main evidence and justification that is used to support an ontology, which is why many ontology papers have more authors than the human genome paper. So, let’s try it in this case. The BBC article quotes from quite a few experts.

“La marinara is a pizza rossa,” she states frostily. “A pizza rossa is made with tomato and without mozzarella. So you can’t have a marinara with mozzarella because there’s no such thing.”

Emanuela
— BBC

My ontology supports this because Marinara is a PizzaRossa so, indeed, cannot have Mozzarella.

“No, it’s not,” pipes up a customer who until now has been quietly consuming his pizza and beer on a stool behind me. “She’s right. A pizza rossa can’t have mozzarella.”

Customer
— BBC

Also!

“The pizzaiola is right. A marinara is not a marinara if you add mozzarella. But she was wrong to say she would make you a margherita with garlic because margherita with garlic doesn’t exist.”

Friend
— BBC

Currently, my ontology says nothing about this one way or the other. But, we can add a closed definition for Margherita easily, and now, indeed, a margherita with garlic cannot exist.

(defclass Margherita
  :super Pizza
  (some-only hasTopping Mozzarella Tomato))

;; A Margherita with added Garlic should make the ontology incoherent.
(clojure.core/assert
 (with-probe-entities
   [WeirdMargherita
    (owl-class "WeirdMargherita"
               :super Margherita
               (some hasTopping Garlic))]
   (clojure.core/not (coherent?))))
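
To see why the closed definition rules out garlic, here is a minimal sketch in plain Python rather than Tawny-OWL. It assumes that some-only means "at least one of each listed topping, and nothing outside the list", and it relies on the toppings being distinct names, which stands in for the disjointness declared earlier. The function name is hypothetical and the closed set of toppings is a simplification of what OWL does.

```python
# Illustration only: a closed-world reading of
# (some-only hasTopping Mozzarella Tomato), with toppings as a set of names.
# The function name is hypothetical and this is not the Tawny-OWL API.

ALLOWED = {"Mozzarella", "Tomato"}


def meets_margherita(toppings):
    has_each = ALLOWED <= toppings      # the "some" part: at least one of each
    has_no_other = toppings <= ALLOWED  # the "only" part: nothing else
    return has_each and has_no_other


print(meets_margherita({"Mozzarella", "Tomato"}))            # True
print(meets_margherita({"Mozzarella", "Tomato", "Garlic"}))  # False
print(meets_margherita({"Tomato"}))                          # False
```

The "only" half is what does the work here: without it, a margherita with garlic would satisfy the definition perfectly well.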

I also tried extending my analysis further, with a novel technique; I tried it by asking my friends on Facebook. One of them replied as follows.

my take on this is that we must distinguish between pizzas made by bakeries (pizza bianca and rossa) which tipically (but theres no firm rules) do not have mozzarella, and pizza from a pizzeria (restaurant) which almost always has mozzarella (bianca and rossa). a pizza bianca without mozzarella is a focaccia ( with rosemary, onions, potatoes, etc)

— Ulisse Pizzi

It’s a disaster! Almost none of this has been modelled in my ontology, although it supports my assertion that a pizza must have tomato or mozzarella. But worse, “there are no firm rules”; it’s enough to make a grown ontologist cry. But there are some deeper frustrations here. None of the experts has told me everything that I want to know. None has really answered what a pizza is.

We could ask some more experts? But what happens when they contradict? Do we take averages? And, more important, how do I know when to stop; I could carry on asking friends all day. And how do we avoid cherry-(tomato)-picking?

From the Definitive sauce (erm, source)

Having tried ontology by Facebook, let’s try a research method with a much older and richer pedigree; we will look the answer up on Wikipedia instead. First port of call is the main Pizza article, which says many things, but none of them that useful in this case. Hunting around got me to Pizza al taglio (pizza by the slice), which says:

The simplest varieties include pizza Margherita (tomato sauce and cheese), pizza bianca (olive oil, rosemary and garlic),[4] and pizza rossa (tomato sauce only).

— Pizza al taglio

This is interesting because it brings in garlic and rosemary (which has not been mentioned before). In this version of the world, a margherita is not a pizza bianca. And there is a lack of symmetry between rossa (which is closed) and bianca (which is not).

Perhaps a better source of material would be the Italian Wikipedia and its definition of Pizza. This says:

Pizza marchigiana […] Le varianti tradizionali sono quattro: bianca[12] con il rosmarino, bianca[12] alla cipolla, rossa semplice[13] e rossa[13] con la mozzarella.

Pizza marchigiana
— Italian Wikipedia

So, more types of pizza, including with rosemary, with onion and simple. The footnotes are more informative still.

[12] Per “bianca” si intende una pizza senza pomodoro. [13] Per “rossa” si intende una pizza con il pomodoro.

Pizza
— Italian Wikipedia

So, pizza bianca must NOT have tomato, while rossa MUST have it. The definition of pizza rossa here is inconsistent with mine, but makes more sense to be honest — margherita finally becomes a pizza rossa. Strictly, this quote only really applies to “pizza marchigiana” — once we move out of Marche, all our definitions might be different!
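
The difference between the two sets of definitions is easiest to see side by side. The sketch below is plain Python, not Tawny-OWL, and is only an illustration: toppings are a closed set of names, my definitions and the footnote definitions are both reduced to simple membership tests, and all function names are hypothetical.

```python
# Illustration only: toppings are a closed set of names and both sets of
# definitions are reduced to simple membership tests. Names are hypothetical.

def rossa_mine(toppings):
    # My definition: a pizza with no mozzarella.
    return bool(toppings & {"Mozzarella", "Tomato"}) and "Mozzarella" not in toppings


def bianca_mine(toppings):
    # My definition: a pizza with no tomato.
    return bool(toppings & {"Mozzarella", "Tomato"}) and "Tomato" not in toppings


def rossa_marche(toppings):
    # Footnote [13]: "rossa" means a pizza with tomato.
    return "Tomato" in toppings


def bianca_marche(toppings):
    # Footnote [12]: "bianca" means a pizza without tomato.
    return "Tomato" not in toppings


margherita = {"Mozzarella", "Tomato"}

print(rossa_mine(margherita), bianca_mine(margherita))      # False False
print(rossa_marche(margherita), bianca_marche(margherita))  # True False
```

Under my definitions the margherita falls into neither class; under the footnote definitions every pizza is exactly one of the two, and the margherita is rossa.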

Another useful thing that does come out of reading Wikipedia, however, is the information that Pizza Napoletana has regionally protected status in the EU, and that there is an official body for defining what a pizza is, namely the AVPN. Their document describing the pizza is called the disciplinare (also in English).

And, indeed, it has definitions, although unfortunately of only two pizzas, the Marinara and the Margherita. So, the answers we get from here are limited; but they look like this:

Marinara (tomato, oil, oregano, and garlic)
Margherita (tomato, oil, mozzarella or fior di latte, grated cheese and basil)

Disciplinare
— Associazione Verace Pizza Napoletana

So, we have our answer? Well, it’s one I find surprising (grated hard cheese on margherita!). But, actually, the disciplinare is more specific. For instance, where they say tomato, not just any tomato will do; it has to be of a very specific type:

The following variations of fresh tomatoes can be used: “S.Marzano dell’Agro Sarnese-nocerino D.O.P”., “Pomodorini di Corbara (Corbarino)”, “Pomodorino del piennolo del Vesuvio” D.O.P.” (see attached appendices for suppliers and technical details)

Disciplinare
— Associazione Verace Pizza Napoletana

How can we encode this? It is really just too complicated, and is unlikely to be useful ontologically at any point; worse, we only have answers for two pizza types and there are many others. Looking further, there are also instructions for the flour, the proving, the stretching and much more besides. So, it turns out that the definitive answer is not that useful either: it is incomplete and overly complicated.


Conclusions

The BBC article’s take on the whole process is this:

Pizza has taught me that logic can be subjective and that subjective logic can be cultural

— BBC

Of course, it’s not true; logic is not subjective at all, although there are many different forms of logic. But definitions can be. I have tried multiple different mechanisms of reaching a definition, and all of them have flaws:

  • software engineering approach: maintainable, nice but correct?
  • philosophical approach: limited in its applicability
  • find a friend (the Facebook approach): prone to cherry picking
  • literature review (the Wikipedia approach): lacks interactivity, so may not answer the question
  • a definitive source (the authoritarian approach): over specified, under covering

Definitions are difficult, and there will be no universal answer. Having a clearly defined use case, and a mechanism to test your ontology against that use case, remains key. So does a clear awareness of the techniques that you are using to build your ontology and the flaws that exist with them.

It is also a good excuse to eat pizza, if you need one.


Updates

Minor spelling corrections!

Bibliography

I’m winding my way back from a busy month with both Bio-Ontologies and ICBO, but in general I think the experience has been really positive, even if interspersing holiday and work travel has rather exhausted me. But both were in Europe and Bio-Ontologies was right next door, so I did not want to waste the opportunity.

I have a long history with Bio-Ontologies, having been a chair for many years and an informal helper before that. We steered it from an informal meeting, to having a proper programme committee, proceedings and much of the structure that it has now. I bumped into Steven Leard at the meeting, and was rather shocked to realise that the first meeting I helped out at was 14 years ago.

Strangely, though, since my last time as a chair, five or six years ago, I have not been once. For a few years, of course, this was quite deliberate; I was so fed up with travelling at that time of year that I really enjoyed the rest. But since then, it has been happenstance, rather than a deliberate decision. So, it felt like a bit of a home-coming, even if I have seen many of the people at different conferences on different occasions. Mark Musen gave an interesting keynote: I was, at the time, rather unconvinced by his hypothesis that we don’t spend enough time arguing (I mean, ontologists, really?). A more nuanced reading of what he said, though, is that we should assess and re-assess our practices against the evidence of our experience. I cannot help but agree with this, and it has made me think again. More on that later, perhaps.

It was nice to go to Dublin, also, as it was my first time. Nice city, deeply integrated with its river. We had some nice food, in some good restaurants and cafes, and a blissful absence of Irish theme pubs. The conference venue was good also, even if it does look like a vacuum cleaner from outside.

ICBO was a different kettle of fish, though. At four days (many of the delegates go for the whole thing) it’s long, and I felt rather stretched by the end (I’m on the plane home now, after a very early start, which might be colouring my vision). This does give plenty of time for slightly longer and more detailed presentations; the workshops were small, intense and full of discussion. Likewise, the poster and demo sessions. I rather blitzed the conference with Tawny-OWL (http://www.russet.org.uk/blog/3030), and Lentic (http://www.russet.org.uk/blog/3035). In total, I gave 1 tutorial; 1 paper; 1 demo; 1 flash update on the demo and 1 feedback session on the tutorial. People seemed genuinely sympathetic and a little sad when my cute Tawny-OWL logo went 404 during the flash update. For those who missed it, the logo is online, as is the logo for lentic which lacks in cuteness, but is rather more dramatic.

I got some good feedback, and was surprised to win the best demo session (I mean, it was entirely text running in Emacs, and very laggy, running on my 5 year-old netbook). The second place was James Overton’s Robot. I am told that between the two of us we got a very large percentage of the vote. I think this is an interesting result, because it strongly suggests to me that, for ICBO attendees, there is dissatisfaction over current tooling. Ontologies are being more programmatically developed and I cannot help but feel that this is the future.

I thought I had never been to Lisbon before, but on getting there I realised I had been, about 20 years ago; the story is long, and not that interesting, so I will skip describing it here. This time I had a better look and I will not forget again. Lisbon is a very nice city indeed; while its architectural elegance may not be quite up there with Rome (or even Milan), it’s certainly not far behind, and as a city built into and with its geography it is stunning.

In summary, an interesting month from an ontology perspective and one that I enjoyed very much. While I might have wished for something a little less hectic (especially as I interspersed my holidays (http://www.russet.org.uk/blog/3091)), it has left me with the sense both that ontologies are a productive part of the bioinformatics environment and that there is more to come.

Bibliography


Abstract

Bio-medical ontologies can contain a large number of concepts. Often many of these concepts are very similar to each other, and similar or identical to concepts found in other bio-medical databases. This presents both a challenge and opportunity: maintaining many similar concepts is tedious and fastidious work, which could be substantially reduced if the data could be derived from pre-existing knowledge sources. In this paper, we describe how we have achieved this for an ontology of the mitochondria using our novel ontology development environment, the Tawny-OWL library.

  • Jennifer D. Warrender
  • Phillip Lord


Plain English Summary

Ontologies allow complex descriptions of the world in a way that is both precise and computationally amenable; that is, computers can be used to check and query these descriptions. The mitochondrion is a critical part of the cells of most organisms, being responsible for energy usage. We wished to build an ontology describing the current research on the mitochondrion.

The more traditional approach to this would have been to build the ontology from scratch; but many parts of the mitochondrion, including the genes and proteins, have already been described in other databases. Building from scratch on the basis of the data in these databases would be time-consuming, but also sensitive to change: if the database changes, our ontology would need updating too.

Instead, we have used our new ontology development methodology to automatically extract this knowledge and build the ontology for us, providing what we describe as the scaffold for an ontology. In future, we will add more knowledge to this ontology, slowly building up the rich description of the mitochondrion that we are aiming for.


Abstract

Ontology development relates to software development in that they both involve the production of formal computational knowledge. It is possible, therefore, that some of the techniques used in software engineering could also be used for ontologies; for example, in software engineering testing is a well-established process, and part of many different methodologies. The application of testing to ontologies, therefore, seems attractive. The Karyotype Ontology is developed using the novel Tawny-OWL library. This provides a fully programmatic environment for ontology development, which includes a complete test harness. In this paper, we describe how we have used this harness to build an extensive series of tests as well as used a commodity continuous integration system to link testing deeply into our development process; this environment is applicable to any OWL ontology whether written using Tawny-OWL or not. Moreover, we present a novel analysis of our tests, introducing a new classification of what our different tests are. For each class of test, we describe why we use these tests, also by comparison to software tests. We believe that this systematic comparison between ontology and software development will help us move to a more agile form of ontology development.

  • Jennifer D. Warrender
  • Phillip Lord


Plain English Summary

Ontologies are a mechanism for representing parts of the world computationally. They allow you to describe the world in a complex way, and then query over it repeatably and consistently. However, ontologies are complex and are themselves hard to build consistently and repeatably. If the ontology is built incorrectly, then queries will give the wrong answers also.

Software is also complex and over the years, software engineers have developed many techniques for building software so that it, too, is correct. While these do not always succeed, they have allowed us to produce software that is vastly more complex than in years past. One important technique is automated testing. Here software can be run to ensure that it is behaving correctly automatically and often. To do this, we use one piece of software to test another.

We have borrowed the same technology for use with ontologies; while this has been done before, our use of commodity testing software has allowed us to scale up the tests significantly, and we describe this approach in this paper. However, while they have many similarities, ontologies are not software. The sort of tests that we need for ontologies may be different from those that we need for software. In this paper, we also describe the kinds of tests that we have used for the karyotype ontology (1305.3758), and which are probably relevant to other ontology development efforts too.

Overall, this should increase our understanding of how to build ontology tests and ontologies.

Bibliography