Following the publication of a number of papers, by Gary Merrill, by Michel Dumontier and Robert Hoehndorf (also as PDF) and by myself (also on PLoS One), there has been an enormous amount of discussion on what realism is in ontology building, and whether it is appropriate for use in scientific ontology building. As I have documented previously, I have now left the BFO discuss mailing list, and more latterly OBO discuss, as I felt that these discussions had reached a finishing point. In this post, I want to spell out clearly why I think that realism is not appropriate. I want to avoid reiterating the positions in my paper and earlier postings, and to provide a direct answer to David Sutherland, who has posted why he is a realist.


What is realism?

Sadly, I need to start with a philosophical digression. At heart, I am not interested in philosophy, nor, I guess, are many in the bio-ontologies community. Those in this camp can safely skip to the next section.

At heart, realism is a metaphysical interpretation of the ontology. How are we to interpret the relationship between, for example, the ontology term Human and the things that exist in the real world? Realism asserts that the ontology term refers to a Universal that exists in its own right, but not separately from the instances to which it refers.

Personally, I do not find these assertions of reality or truth very helpful. David Sutherland suggests that:

One possible reason is a failure of nerve. Many people become quite nervous at talk of truth and reality.

— David Sutherland

In my case, this is true, and it stems from my history. Like many people learning science, when I first heard of Mendel’s laws, or the exceptionally weird behaviour of light, my initial response was that they were not real, just part of the mathematical model that describes the experimental results. Later on, though, I realised that I had the same worries about other concepts. When I was first told that a table holding a weight was exerting a force on the weight, I didn’t believe it; after all, when I support a weight it costs me effort to do so, but the table was just sitting there. Many years before this, I didn’t believe the idea that I was surrounded by invisible things called gases, although I did realise that it was a good way of explaining the wind. Eventually, however, I became so used to manipulating a force, or a gene, in the mathematical equations of physics or genetics that I just stopped worrying about it.

In his paper, Gary Merrill argues that, in practice, we don’t need a metaphysical interpretation anyway. I tend to agree. Consider this quote:

The next question was – what makes planets go around the sun? At the time of Kepler some people answered this problem by saying that there were angels behind them beating their wings and pushing the planets around an orbit. As you will see, the answer is not very far from the truth. The only difference is that the angels sit in a different direction and their wings push inward.

Character of Physical Law
— Richard Feynman

Personally, I like to speak of models of data, rather than representations of reality. I find that talking of models reminds me that it is my job not to support models but to break them. I do not see the point of rebadging commonly used terms such as model with more complex ones such as “representation of reality” (this rebadging is a theme of realism to which I will return). But the bottom line is that it doesn’t really matter. The statement that \(g \propto 1/r^2\) is the same as \(F_{\text{wings of angels}} \propto 1/r^2\). So long as we agree that the angels behave in a precise, predictable way, there is no deep reason to distinguish between the two, except for simple pragmatism: “gravity” is shorter and easier to say than “the wings of angels”.


What realism is not

Realism has chosen its name wisely. Most scientists believe in reality so, when faced with realism vs conceptualism, their gut feeling is that the former will be right. They believe in a mind-independent reality so, therefore, conceptualism must be wrong. Now others have argued convincingly that this is an inaccurate interpretation of conceptualism, so I will not repeat the discussion here, but instead look at a more specific interpretation: that realism means building ontologies on the basis of experimental evidence. This conflation of “evidence-based ontologies” with realism can be seen in David Sutherland’s post:

The results of those inferences will be judged by how they match reality. An inference that is demonstrably false indicates a problem with the initial assertions (or with the inference mechanism).

— David Sutherland

Similarly, Judy Blake makes the same conflation:

I strongly support the realist approach that facilitates the use of the ontology for science discovery. We represent in the ontologies what we know with some degree of certainty.

— Judy Blake

Of course, both of these positions are reasonable: we should judge ontologies by how well their inferences fit our experimental data and, further, for reference ontologies, we should represent knowledge for which we have very good evidence. But this is not realism. This can be shown with a straightforward argument.

While the definition of “science” is open to question, a reasonable working definition would be that “Science is the interpretation of experimental data”. The idea that anyone who is not a realist therefore believes that we should not base ontologies on our experimental data, or on what we know, is either uncharitable or wrong. It also, however, undermines the notion that realism is a useful methodology. If science is about modelling experimental data, while realism is a methodology for building ontologies based on experimental data, then “realism-based scientific ontology” is tautological; “realism-based” adds nothing at all to the statement, except to make it longer. In short, returning to the earlier theme, we have rebadged “scientific ontology” as “realism-based ontology”.

Believing in reality does not make you a realist. Believing that ontologies should be based on evidence does not make you a realist; it just means you are a scientist.


What are the pragmatic implications of realism?

One of the difficulties in addressing the pragmatic implications of realism is that many of the conclusions that are made do not seem to stem from the underlying philosophy. This makes it hard to judge what the implications of realism are in a given situation. The end result is that we have to look at how realism has been practiced in, er, reality, ignoring the philosophical underpinning. I’ve taken this approach here.

The first time that I heard of realism was at ISMB in Glasgow in 2004. One theme that came out there was the notion that all ontologies should be single inheritance, because in reality things can only be a kind of one other thing. BFO follows this, and this position was supported in many discussions on BFO-discuss. Ironically, though, with terms such as “Object Part”, any ontology that uses BFO is hard pressed to do likewise. I was a little surprised to be asked if I understood the strategy of asserting single inheritance and inferring the rest; surprised because a) this strategy is the normalisation pattern from Alan Rector, with whom I have worked for many years, and b) because it represents a complete change. Normalisation results in a polyhierarchy; that some subsumption is inferred and some asserted is an engineering decision, not a question of underlying philosophy.

That realism is, apparently, capable of supporting such a shift is rather worrying. This cannot be put down to falsifiability (the notion that ontologies can be wrong and can change), as this is a change at the metaphysical level. It suggests that, in practice, realism is disconnected from its philosophical underpinning. It also suggests that realism is capable of justifying two quite different positions; in short, it suggests that realism actually has very little explanatory power. Currently, the realist answer to this is that the asserted relationships represent universals; but as there is no clear assay for what this means, I feel this doesn’t help. My own experience is that determining a privileged axis of inheritance is not, in most cases, possible. Ontologies are fundamentally multiply-inherited; normalisation is a simple engineering decision, which moves the load of maintaining this from the human to the reasoner.
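As a minimal sketch of what “assert a tree, infer the rest” can look like, the toy classifier below works over plain Python dictionaries. It is not how a description logic reasoner operates, and every class and property name in it (InsulinReceptor, Kinase, has_function and so on) is hypothetical, chosen only for illustration. The asserted hierarchy is strictly single inheritance; the extra parents come entirely from the definitions.

```python
# Toy illustration of normalisation: assert a tree, infer the polyhierarchy.
# All class and property names are hypothetical examples.

# Asserted hierarchy: every class has at most one asserted parent (a tree).
ASSERTED_PARENT = {
    "Protein": None,
    "InsulinReceptor": "Protein",
    "Haemoglobin": "Protein",
}

# Asserted restrictions on each class, as (property, filler) pairs.
RESTRICTIONS = {
    "InsulinReceptor": {
        ("has_function", "kinase activity"),
        ("located_in", "membrane"),
    },
    "Haemoglobin": {("has_function", "oxygen transport")},
}

# Defined classes: a genus plus differentia, treated as necessary and sufficient.
DEFINED = {
    "Kinase": ("Protein", {("has_function", "kinase activity")}),
    "MembraneProtein": ("Protein", {("located_in", "membrane")}),
}


def ancestors(cls):
    """Walk the asserted tree upwards from cls, including cls itself."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = ASSERTED_PARENT.get(cls)
    return chain


def inherited_restrictions(cls):
    """Collect the restrictions asserted on cls and on its asserted ancestors."""
    collected = set()
    for ancestor in ancestors(cls):
        collected |= RESTRICTIONS.get(ancestor, set())
    return collected


def inferred_parents(cls):
    """Asserted parent plus every defined class whose definition cls satisfies.

    The result is deliberately not minimised: a real reasoner would drop
    parents that are implied by other parents.
    """
    parents = set()
    if ASSERTED_PARENT.get(cls) is not None:
        parents.add(ASSERTED_PARENT[cls])
    for name, (genus, differentia) in DEFINED.items():
        if genus in ancestors(cls) and differentia <= inherited_restrictions(cls):
            parents.add(name)
    return parents
```

Calling inferred_parents("InsulinReceptor") yields three parents even though only one was asserted by hand, which is the sense in which the polyhierarchy is maintained by the machine rather than by the ontologist.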

Another long-held tenet of realism is the assertion that the use of not represents bad ontology. Statements such as Fly all has_part not Wing are asserting a relationship with entities that don’t exist. However, many people find this sort of modelling useful. It was, therefore, a surprise to find that, following a lot of careful thought, realism does allow Fly lacks Wing. But winding not into the relationship in this way has a number of problems. First, it requires an alteration at the logical level of the ontology; the relationship has to be between the instance and universal purely to satisfy realism, because the universal really exists. In doing so, a special case instance-universal relationship is required. Secondly, the semantics of this relationship are now hidden, rather than being explicit in the ontological layer; the reader has to understand that some relationships are effectively positive, and some are negative. It’s unclear why it is necessary to jump over these hurdles, when it would have been far simpler to just use a not construction.
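To make the semantics of an explicit not construction concrete, here is a small closed-world sketch in Python. It is only an illustration: OWL reasoning is open-world, the part lists are invented, and each part is represented simply by the name of its class. The sketch contrasts the universal reading (“every part is not a wing”) with an existential one (“some part is not a wing”), which holds for almost any fly.

```python
# Closed-world toy: two readings of negation over explicit part lists.
# The individuals and their parts are hypothetical illustration data.
PARTS = {
    "wingless_fly": ["head", "thorax", "leg"],
    "normal_fly": ["head", "thorax", "leg", "wing"],
}


def has_part_some_not(individual, cls):
    """has_part some (not cls): at least one part is not a cls."""
    return any(part != cls for part in PARTS[individual])


def has_part_only_not(individual, cls):
    """has_part only (not cls): every part is not a cls."""
    return all(part != cls for part in PARTS[individual])
```

Under these assumptions only the universal form separates the two flies; the existential form is satisfied by both, because each has a head that is not a wing.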

So, does realism produce good ontology? I have already spoken about this in my paper, but it seems fairly clear that it does not. BFO, rather like myself as a youth, is mass-centric: waves, energy, force, entropy all have no place. It makes unnecessary and meaningless distinctions, such as site and spatial region (one is a region of space with respect to an observer, one is an absolute region of space). It also encompasses some outright howlers, including the fact that a spatial region cannot have a length.

Does realism encourage good practice? Again, I think in many cases, it does not. Firstly, it elevates “reality” above all else; so any distinction that can be made should be made, because that is reality, right? Taken to the extreme, this results in overly complex ontologies, suffering from analysis paralysis. Just because we can make a distinction does not mean that we should, unless there is a good use case, and a clear reason why this distinction adds usefulness to the ontology. It also results in the use of overly complex, philosophical language, which is hard for those outside a small clique to understand; I do, now, understand the definitions in BFO, but in many ways I wish that I didn’t. As a trivial example, consider the modification of the standard definition from (“A is a B that has R”) to (“A =def B that has R”), made to enable the distinction between defined and primitive classes. This reduces the readability of definitions. Readability is important; we should be willing to compromise precision in its favour.

Likewise, I worry when I see definitions such as

Class: planned_process

SubClassOf:
       realizes some (is_concretization_of some ('plan specification'
            and has_part some 'objective specification'))

or even

Class: glucose_tolerance_test

SubClassOf:
         assay,
         has_specified_output some ('information content entity'
                              and is_proxy_for some 'insulin resistance'),
         realizes some (is_concretization_of some
                      'independent variable specification'),
         realizes some (is_concretization_of some 'time series design'),
         achieves_planned_objective some
                      'biological feature identification objective',
         achieves_planned_objective some 'assay objective',
         has_part some ('data transformation'
                        and has_specified_input some 'measurement datum'
                        and has_specified_output some graph),
         has_part some ('administering substance in vivo'
                        and has_specified_input some glucose)

The distinctions being made here, and the properties realizes and is_concretization_of, stem from realism, and more specifically from the generically dependent continuant (GDC). With its mass-centric bias, BFO 1.0 couldn’t represent many entities, such as information, a book or this blog post. So GDC was added. But a dependent continuant is a thing that exists dependent on another, that comes into existence with the other, and disappears with it again. GDC shares none of these characteristics. A book does not appear when it is first printed, nor does it disappear when the paper breaks down, or the ink fades. But it was not possible to add something like immaterial continuant, because it had to depend on some mass. The convoluted nature of the ontology here exists to satisfy the requirements of realism, not those of the ontologists, developers or users.


The alternative

So, what alternatives are there? I offer no alternative metaphysics, because, as described earlier, I neither care, nor do I feel a metaphysical interpretation is necessary. We are building ontologies in biomedicine for many reasons, but mostly they revolve around one thing: we need a structure to hold our knowledge, our theories and hypotheses, which is computationally amenable because there are too many to handle by hand. It’s an engineering task and this is what I care about.

Ontology building, I would argue, is a hybrid, sitting somewhere between software engineering and statistical modelling. We need to borrow from the best of these worlds, to produce a good engineering methodology.

Actually, we already have borrowed from software engineering; OBO, for example, advises mailing-lists, trackers, version control, releasing early and often, tight user feedback. All of these stem directly from the agile techniques that have come to the fore in the last decade; all of these have been part of ontology building since well before realism appeared on the scene.

I think we need to take more account of use cases, or their light-weight manifestation, “user stories”. Realism, and the philosophical reflection that it inspires, seems to me to have more in common with the waterfall methodologies of an earlier era; thinking carefully earlier to avoid having to fix things later sounds like a good idea, but history suggests that in many cases, the thinking simply delays the point at which you discover you have to fix things anyway.

But agile software methodologies do not have all the answers; ontologies are not software. The key difference is that ontologies lack test frameworks. While it is sometimes possible to automatically test our ontology against the experimental data, in most cases it is not. I think this is where we need to borrow more from statistics. For instance, one rule from statistical modelling is: do not add a new variable to a model, even if it increases the goodness of fit to the data, unless the increase is statistically significant. In ontological terms, this can be translated: just because you can make a distinction does not mean you should.
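One standard way of making “statistically significant” concrete, offered here only as an illustration and not as anything the ontology community has adopted, is the F-test for nested linear models. The symbols are assumptions introduced for this example: \(RSS_0\) and \(RSS_1\) are the residual sums of squares of the smaller and larger model, \(p_0 < p_1\) are their numbers of fitted parameters, and \(n\) is the number of observations.

\[
F = \frac{(RSS_0 - RSS_1)/(p_1 - p_0)}{RSS_1/(n - p_1)}
\]

Under the null hypothesis that the extra variables explain nothing, \(F\) follows an F distribution with \((p_1 - p_0,\ n - p_1)\) degrees of freedom. Adding a variable always gives \(RSS_1 \le RSS_0\), so a better fit alone proves nothing; the variable is kept only if \(F\) is large enough to be unlikely under the null.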

In his 2005 paper, Ingvar Johansson talks about the fallacy of mixing use and mention. As an example, he presents this (now changed) section of GO:

Gene_Ontology
  part_of
Biological process
  is_a
physiological process

The problem with this is that “biological process” is overloaded, referring both to a biological process and to the ontology term biological process. The link was originally put in place for engineering reasons; I used it, for instance, in the work for my own paper from 2002. I knew that semantic similarity (how closely the annotations of two genes agree) correlates with sequence similarity; the question was whether this works better if we consider all of GO, or the three aspects independently. The answer is the latter; in short, Biological Process part_of Gene Ontology has no explanatory power. So, is this an example of realism demonstrating an ontological problem? Sadly not. Consider this slightly changed ontology:

Universe
  part_of
Biological process
  is_a
physiological process

According to realism, this simple rebadging of the top-level term has fixed the problem, because all biological processes really are part of the universe. But computationally, we have the same ontology, so we still have a term with no explanatory power. In short, the uses and the use cases of our ontology define the best ontology; the experimental data is only a start.
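The claim that the two fragments are computationally the same can be sketched in a few lines of Python. This is a toy under stated assumptions: the ontology is reduced to a set of (child, relation, parent) triples, and the helper names rebadge and shape are invented for this example. Any measure that depends only on the graph structure cannot tell the two versions apart.

```python
# Toy check that relabelling the root leaves the graph structure unchanged.
# The triple encoding and the helper names are assumptions for illustration.
GO_FRAGMENT = {
    ("Biological process", "part_of", "Gene_Ontology"),
    ("physiological process", "is_a", "Biological process"),
}


def rebadge(edges, old, new):
    """Return a copy of the edge set with one term label replaced."""

    def swap(term):
        return new if term == old else term

    return {(swap(child), rel, swap(parent)) for child, rel, parent in edges}


def shape(edges):
    """Label-free view of the graph: the sorted depth of every term."""
    parents = {child: parent for child, _, parent in edges}

    def depth(term):
        steps = 0
        while term in parents:
            term = parents[term]
            steps += 1
        return steps

    terms = {t for child, _, parent in edges for t in (child, parent)}
    return sorted(depth(t) for t in terms)


UNIVERSE_FRAGMENT = rebadge(GO_FRAGMENT, "Gene_Ontology", "Universe")
```

Here shape(GO_FRAGMENT) and shape(UNIVERSE_FRAGMENT) are equal, so a similarity measure built on path lengths or depths would give identical answers before and after the rebadging.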


Conclusion

I tend to agree with Nicolas Le Novere that this:

is an endless discussion because this is specifically the fundamental divergence between two schools of thoughts, both respectable, and both consistent, but irreconcilable.

— Nicolas Le Novere

I have written this post both as an answer to David Sutherland and as a supplement to my paper but, most importantly, as a way to remove myself from the discussion. I think, now, that with three papers on the issue, I can move on with what I want to do: use ontologies to help with the analysis of our data, and to increase our understanding of biology.

I do not expect that the significant momentum that realism has built up will be broken, but I do hope that it will cease to be advanced as proven best practice, or to be considered the only correct way forward. If this is achieved, then it will help to avoid the unfortunate situation that some actually want: a fork in the community. I think that this would be a pity; in general, I tend to prefer OBO’s stated principle that “we would strive for community acceptance […] rather than encouraging rivalry”.

There are so many agreements between the various sides of this argument: it is on these, the practical, pragmatic engineering decisions that we see in much of OBO and GO, and that we see in the original ten principles of OBO, that we should build.

9 Comments

  1. Chris Mungall says:

    “Another long held tenet of realism is the assertion that the use of ‘not’ represents bad ontology. Statements such as Fly all has_part not Wing are asserting a relationship with entities that don’t exist.”

    I wasn’t aware of the realist objection to the not / complementOf construct.

    I was aware of the objection to a class “absent wings” defined as wings that are not there.

    So there would be an objection to:

    Individual: fly1
    Types: has_part some ‘absent wing’

    Which seems reasonable since I don’t know how to define ‘absent wing’ in OWL in a way that would give me the correct inferences (note: using complement of here doesn’t give the right answers, as all you are saying is that the fly has some part which is not a wing).

    I’m not aware of any objection to:

    Individual: fly1
    Types: has_part exactly 0 wing

    or:

    Individual: fly1
    Types: has_part only not wing

    (the lacks_part instance-class relation would just be a macro that maps to this construct).

    I think this second construct is what you are getting at with “Fly all has_part not Wing”

    As it turns out we can dispense with realism entirely for this discussion. If we focus on modeling in a way that gives us useful answers then the realist objection is consistent with, but superfluous to, a pragmatic modeling approach.

  2. Phil Lord says:

    Chris, I’ve replied in a separate post!

  3. Chris Mungall says:

    I also worry when I see some of the OBI definitions. I pointed out that these were overly complex in my review/presentation on OBI at the OBO Foundry meeting in Cambridge 2 years ago.

    However, it’s not clear to me how much of the blame for the complexity in OBI can be blamed *directly* on realism. Of course, realism can’t be let off the hook entirely as BFO is our main representative of a realist upper ontology and OBI is probably the paradigmatic example of an ontology that was constructed using BFO from the ground-up. But I think it’s worth analyzing in detail to see if there is a way to reduce the complexity whilst still being faithful to BFO and/or realism – and if not, which parts of BFO/realism are good targets for reform.

    Some of the complexity could be mitigated by judiciously naming classes rather than relying on deeply nested class expressions. Although this is sweeping things under the rug a little.

    I think some of the complexity may be down to BFO specifically rather than realism.

    As someone who initially pushed for GDCs in BFO, I was dismayed when it transpired that there was to be no direct primitive relation connecting objects like physical books and the information contents of that book. Instead we have to say ‘book content’ has_concretization of some (inheres_in some book). This gives me a headache and seems to just be making busy work for no practical reason. Also I feel lost with respect to what the intermediate unnamed entity is here.

    For the SO/SOM links we will probably directly connect molecules to sequences (GDCs). This direct relation could be treated as a macro relation (e.g. has_sequence) and expanded to a concretization o inheres_in chain. It’s unclear if expanding this macro gives you any additional explanatory power or is in any way useful in reasoning. It’s also not clear to me that our direct relation is any “less real” than the expanded form. From a pragmatic point of view, the direct has_sequence relation is the one I would write axioms about. We hope that this relation will also be reasonably intuitive for users.

    The macro approach could be useful for simplifying OBI. E.g. has_specified_output o is_proxy_for –> measures

  4. Matthias Samwald says:

    Some very good points.
    One thought I had: You seem to equate realism with using BFO as the foundational ontology. This is obviously not a necessity, and I guess other philosophers and schools of thought could come up with a realist foundational ontology that would be structured very differently.

    – Matthias

  5. Phil Lord says:

    “However, it’s not clear to me how much of the blame for the
    complexity in OBI can be blamed *directly* on realism.”

    “I think some of the complexity may be down to BFO specifically rather than
    realism.”

    Oh, I agree. But, as discussed at length, “realism” doesn’t have a useful
    definition. So I have to judge it as I see its use. The flip side, of course,
    is that I also find the “realism is good because OBO is good” argument poor.
    Much of the success of OBO comes, I think, from the original ten principles,
    none of which stem, I think, from realism.

    “As someone who initially pushed for GDCs in BFO”

    The requirement, the necessity for something like GDC is clear. The
    implementation is broken. Book content is something that comes into existence
    at some point in time, and then exists. It’s just a continuant; I think we
    need to accept that just because a continuant does not have mass, this doesn’t
    mean it’s dependent. At the moment, I find it strange that the content of this
    post cannot bear properties, although it appears to me to have an author,
    length, language and so on.

    I also thoroughly agree with your point “has_concretization of some
    (inheres_in)”, rather like “concretization of some (realization of)”. I see
    cost here, not value.

    “It’s unclear if expanding this macro gives you any additional explanatory
    power or is any way useful in reasoning.”

    Macros I think can be useful; for example, I would rather implement n-ary
    relations, and value partitions in this way, rather than having to put the
    bits in by hand myself, so in general, I think your approach is sensible. But
    if the expansion truly adds no explanatory power, it’s papering over the
    cracks.

  6. Phil Lord says:

    Matthias, you are largely correct. My paper is more explicit about this: ‘when we say “realism”, we largely mean “realism as practiced by BFO”. We do not claim to address all the philosophical perspectives [called] “realism”‘

    Both my paper and this post are a bit wider than just using BFO, and stem from the set of general principles that are coming from the developers of BFO. For instance, use of overly complex language, the broken process model and the confused ideas about boundaries are not explicitly part of BFO.

  7. Yes, really. | OntoGeek says:

    […] on September 27, 2010 by dosumis| Leave a comment This post is written largely as a response to Phil Lord’s thoughtful response to my previous post on realism and to a marathon pair of threads on the OBO-discuss e-mail list […]

  8. David OS says:

    Hi Phil,

    Finally got round to posting a response to you on my blog:

    http://ontogeek.wordpress.com/2010/09/27/yes-really/

    Cheers,

    David

  9. Ontological realism, methodologies, and mud slinging: a few notes on the AO trilogy « Keet blog says:

    […] can be read over at Phil Lord’s blog (The Status quo farewell tour on realism, Why not?, and Why realism is wrong) and his paper together with Robert Stevens at PLoS ONE [9], versus David Sutherland’s Realism, […]
