Archive for May, 2009

My next blog post was going to be about function, as I have just had a paper about it accepted. But, I got slightly side-tracked along the way, thinking about Literate Programming as it applies to OWL. While an ontology is (or, to my mind, should be) a computational artifact, it’s a bit different from a program; the main thing is that it doesn’t run; it doesn’t have that functional test that a program does. This is not to say that an ontology is not an application-dependent entity. It can be, but even then it needs to have a program built on it.

One of the upshots of this is that a narrative justification for an ontology is fairly important; currently, we spend far too long on mailing lists, arguing about ontology terms and, to my mind, not enough of this is reflected in the final outcome. If, on the other hand, we moved to a situation in which adding a new concept was equivalent to writing a paper, we might have less of this. Discussion would be a bit more focussed; besides which, most scientists are experienced with writing and reviewing papers, so we’d just be better at it.

For this to happen productively, though, the paper has to become, itself, a computational artifact. It’s no good having documentation that has to be kept in sync with the ontology; we will just end up with multiple versions, and will never quite know what we are talking about. My discussions about BFO have shown me this; do we mean the OWL, the definitions in the OWL, the papers or what? We should be able to generate both readable documentation and computational OWL at the same time. In short, literate programming.

Now, I know that Bijan Parsia has been investigating this also, but I wanted to think a little bit about how it would fit into my environment.

One thought was to get the system working within asciidoc which I am using to generate these pages. This turned out to be simple enough; take, for instance, this definition for BiologicalFunction.


Class: BiologicalFunction
    Annotations:
        rdfs:comment
"Definition: A biological function is a realizable entity that inheres in continuant
which is realized in an activity, and where the homologous structure(s) of
individuals of closely related species (or identical species) fulfil this
same biological function."

    SubClassOf:
        Function

Asciidoc uses source-highlight for its syntax highlighting. I had to add a bit of config (which, annoyingly, needs to be placed into the main install directory for source-highlight, rather than in a user-space dot-directory).

Unfortunately, this is not going to be as good as you might hope for printed documentation. The obvious solution here is to aim at LaTeX. I think that I am going to have a quick go at producing something like this, inspired by Literate Haskell. Basically, I need three tags which look like this:



\begin{owl}
Class: Thing
\end{owl}

\ignore{
\begin{owl}
Class: BoringOWL
\end{owl}
}

\begin{notowl}
Clazz: BrokenOwl
\end{notowl}

The first copes with OWL that should appear both in the documentation and code (that is most of it). The second covers OWL that should appear just in the code; the Haskell example is for a “help” function; I suspect that this is rarely needed for OWL. The final example appears just in documentation; it would be useful for anti-examples (“Don’t do this!!!”). My plan would be to pre-process the LaTeX just using regexps, nothing complex, to dump the OWL to a file, mostly because I don’t know how to get LaTeX to do it. Meanwhile, these two macros would just be defined in terms of the Listings package (which means writing yet another syntax highlighting set of regexps, oh dear).
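The regexp pre-processing step could look something like the following. This is only a minimal sketch in Python, not a finished tool: the function name `extract_owl` is made up for illustration, and it assumes that `owl` environments are never nested and that each `\begin{owl}` has a matching `\end{owl}`.

```python
import re

# Matches the body of every \begin{owl} ... \end{owl} environment.
# \begin{notowl} is not matched, because the pattern requires the
# environment name to be exactly "owl".
OWL_BLOCK = re.compile(r"\\begin\{owl\}(.*?)\\end\{owl\}", re.DOTALL)


def extract_owl(latex_source):
    """Return the OWL contained in a LaTeX document as one string.

    Blocks wrapped in \\ignore{...} are picked up too, since they are
    meant to appear in the code but not in the documentation.
    """
    blocks = [block.strip() for block in OWL_BLOCK.findall(latex_source)]
    return "\n".join(blocks) + "\n"
```

Run over the three example tags above, this would keep `Class: Thing` and `Class: BoringOWL` and drop the `notowl` block; the result can then be written to a `.omn` file.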

Well, this is okay, but has two problems: first, it means writing OWL inside LaTeX, which means that editor support is going to be rubbish; second, what if I want to blog AND print a document? My solution to this is to move my ontologies to being multi-file based. As far as I can tell, Manchester OWL is order independent (except for the header). So the plan would be to write multiple files, each with a few concepts in:

 function/header.omn
 function/function.omn
 function/biological_function.omn
 function/artifactual_function.omn

Generating a complete Manchester syntax file from this would be easy (more or less, just run cat). This could be supported within LaTeX by adding some include macros. Again, this is trivial to do with the listings package.
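As a rough sketch of the "just run cat" step, here is one way it might be done in Python. The function name `build_ontology` is hypothetical, and the only assumptions are the ones stated above: the header file has to come first, and the order of the remaining files does not matter (they are sorted here simply to make the output reproducible).

```python
from pathlib import Path


def build_ontology(directory, header_name="header.omn"):
    """Concatenate the .omn files in a directory, header first."""
    directory = Path(directory)
    header = directory / header_name
    # Every other .omn file, in a stable (alphabetical) order.
    others = sorted(p for p in directory.glob("*.omn") if p.name != header_name)
    parts = [p.read_text(encoding="utf-8").rstrip("\n") for p in [header, *others]]
    return "\n\n".join(parts) + "\n"
```

Something like `Path("function.omn").write_text(build_ontology("function"))` would then produce the single file.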


\owl{function.omn}
\ignore{\owl{help.omn}}
\noowl{broken.omn}

Likewise, asciidoc supports it using include macros. I shall give this a go next week. I shall produce a document describing the axiomatisation for function in OWL that started all of this off.

PS Just finished this, and found out that blogpost stripped off all my nice syntax highlighting. Took a bit of effort but (hopefully) it should all be back in again now.

After 12 years of trusty service, with 5384 miles on the computer, my old bike, a Dawes Giro Audax, has finally given up the ghost; it was an excellent purchase. I got it just after I moved to London, immediately post-PhD, when I really didn’t have much cash. At around 600 quid, it was expensive, but turned out to be worth every penny. It was quick, comfortable, usable for training, usable for commuting. And after 12 years, the paint job and most of the rest of it are still flawless.

Well, except for the STI shifter, which finally popped on me and is essentially unfixable, especially as the bike has a 7-speed block at the back, which they just don’t make any more. I have to sigh a bit at this; the economics of planned obsolescence might make sense, but it’s a pretty stupid way to manufacture things. In the past, you could expect more than 10 years out of a bike before it became incompatible with current specs.

So, I needed a new bike. In the end, I decided to go for a more straightforward tourer; in the 12 years since the Audax bike, my legs have got older and slower. So I bought a Ridgeback Voyage. It’s very Dawes-like; I got lucky on the fashion cycle (pun), because steel seems to have made a strong recovery against aluminium. 3 or 4 years ago, I wouldn’t have got steel for love nor money. It’s got one of the newer “everything with an allen key” handlebar arrangements; seems to work well, except that I tend to scrape my leg while honking up a hill. The ride is firm but comfortable; the larger tyres make bumps that would have rattled the Dawes non-existent, while the greater mud guard clearance leaves space for more (I’d upped the tyres on the Dawes till I had only a few mm of space). In short, nothing exceptional, just everything well put together.

The group set is similarly functional. The STI gear levers work reasonably but are not as nice as the Dawes: the main lever has a little too much sideways play, and seems to get away from you when braking. The second lever (on the side rather than a dual lever) is functional but can only be operated on the hoods. Still, the play in the brakes is more than made up for with the secondary levers, which are more than a suitable replacement for the old suicide bars; these are so good that I have started to use them as my main downhill brake.

I’ve been bombing around the countryside since I bought it; up to 200 miles so far, including a run out to Hexham. It may not be the fastest bike in the world, but it’s so comfortable that it feels like I could go for ever.

Well, this is it; although I have been using this blog for a week or so now, I haven’t told anyone about it because it wasn’t quite ready. Today, with a little VirtualHost hacking, it’s finally up and running. I’m not totally happy with the theme yet, but that can change over time. The basic content is uploaded; commentary and permalinks seem to all be working. Many thanks to Dan Swan, who set up WordPress and has lent me a bit of his virtual machine. An excellent job, as ever.

Say good bye to my old trusty site which is now decommissioned. Exercise in irrelevance is dead, long live….

I got this great idea a few months back, but never got around to writing it up; in the course of doing so, I realised that it’s not entirely novel. Still, as this is going up on my all new blog, I’m going to post it anyway. It serves two functions now: first, it’s got an image in it, which I’ve not done before and need to see whether it works; and, second, the image has been created with Inkscape, which I haven’t used before. Okay, onto the idea.

One of the problems with many forms of renewable energy is that it comes when it comes. Wind power is available when it’s windy, solar power when it is sunny and so on. Of course, the requirement for power does not fall into the same pattern; we mostly need it during the day and the evening, when it’s very cold or very hot and so on.

So, here is one of my solutions, which is the sea cylinder storage system. It looks somewhat like this:

[Image: Sea Cylinder]

First, you dig a big hole in the ground (the cylinder), relatively close to the sea. From somewhere near the bottom you dig a tunnel outward, underneath the sea, to the same depth as the cylinder. This lets the water in. Now, you cap the top, and pump air in; this will push the water out, so that the cylinder will contain compressed air. Even with a shallow sea like the Channel, this would result in compressed air at the pressure of 100m+ of water. When you want to store energy, you pump air in; when you want to get energy back, you let the air out again. The bigger the cylinder, the more energy you can store.

I think it’s a great idea, but it turns out not to be novel; in fact, the normal way of doing this is just to pump air into the hole in the ground. You can mine a cavern in salty areas by pumping hot water down, dissolving away a cavern. If I understand the physics correctly, you would get something like 100kJ of storage capacity per litre of compressed space, for a 10 atmosphere system; the efficiency of storage is dependent on what you do with the heat produced on compression and cold produced on expansion; I guess, in some cases you could pipe the air to where you wanted it, and use the cooling and electricity for an aircon system.
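As a sanity check on the storage figure, here is a back-of-envelope calculation. It is a sketch under stated assumptions, not a proper engineering estimate: it treats air as an ideal gas, assumes perfectly isothermal expansion back to one atmosphere (work = P·V·ln(P/P0)), and ignores all losses. The function name is made up for illustration.

```python
import math

ATM = 101_325.0  # one standard atmosphere, in pascals
LITRE = 1e-3     # one litre, in cubic metres


def isothermal_energy(pressure_atm, volume_litres, ambient_atm=1.0):
    """Ideal-gas work (in joules) from expanding compressed air
    isothermally from pressure_atm down to ambient_atm."""
    pressure = pressure_atm * ATM
    volume = volume_litres * LITRE
    return pressure * volume * math.log(pressure_atm / ambient_atm)


# One litre of compressed space at 10 atmospheres.
print(round(isothermal_energy(10, 1)), "J")
```

Under these assumptions the answer comes out at a few kJ per litre of compressed space, which is a good deal lower than the 100kJ quoted, so that number should be treated as very rough.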

If you are interested, read all about it on Wikipedia.

I’ve been thinking about the decline of the honeybee population after watching a BBC documentary on it; I’ve decided that it is all the fault of the theory of comparative advantage.

To provide some background: the honey bee population is a massively important insect population; of course, it’s important for the production of honey, but a more important function is that of a pollinator. Many of our agricultural crops require pollination to be of use, to produce the fruit or nuts that we eat. Bees do this task as part of their natural life-cycle. But so do many other insects. So why bees? Why not just let it happen, which it will do anyway? The problem is that the honey bee is now suffering from massive collapses in its population numbers; this is typified by “Colony Collapse Disorder”. At the moment, it’s not clear what causes this. The documentary argued that it’s a multi-causal disorder, partly as a result of known pathogens (including a mite which jumped a species barrier a while back), potentially as a result of pesticides and then just general stress.

Well, the reason is the law of comparative advantage; this is not an evolutionary theory, as you might think, but an economic one. In short, it’s a free market theory which suggests that not only should you pursue profitable work, but that you should pursue the most profitable. Say, for example, a country can profitably produce both shoes and gloves, but that gloves are more profitable; it should just make gloves and then import the shoes for the feet of its own population. In this way, you get more profit in the end.

Now applied to shoes, this makes sense (although even here it has problems). However, applied to agriculture, it’s a much bigger issue; it results in a massive monoculture; huge areas of the US are covered with a single source crop. This is a problem for the bees and, indeed, the local insect life. If you consider the almond harvest in California, all the trees blossom in a two week period. This makes an enormous glut of nectar for the bees but, of course, the rest of the year there’s nothing to eat. So, no local bees.

The solution to this is to ship bees into California; the astonishing statistic that the documentary came up with is that the almond trees of California require 80% of the US honey bee population to pollinate. 80% is a truly unbelievable statistic. To achieve this, they ship bees from across the US down to California. Most bees move on to other places later in the season, often being hired out 2 or 3 times. Of course, bringing all the bees into one place is stressful (they have to be shipped), and the potential for disease is enormous.

One solution to this would be to plant, as well as almonds, a set of other trees. Oranges maybe, olives perhaps. But the law of comparative advantage essentially makes this economically unviable; it is not enough to make a profit, you have to make the maximum profit. Someone growing oranges would get bought out by an almond grower.

Another solution to this would be to have several different species of insects as pollinators. While they might still need to all be in one place, at least there would be some species barriers; if it were done carefully, the bees could even be placed in non-overlapping regions. There’s a problem here, however. The law of comparative advantage strikes again; it’s not economically viable to husband another species because the domestic honeybee is better, at least until its colonies started to collapse. The current solution is to import bees from Australia; at the moment, Australia is not suffering from Colony Collapse, is still free of a mite infection that has travelled around the rest of the world, and has bees spare to export. This can’t be a long term solution, though.

So, why do people do it? If the law of comparative advantage does not work, why is it that people (unknowingly maybe, through the economic system) follow it? The reason is simple; the law of comparative advantage does work; from an economic standpoint, it makes sense to build our agriculture around a monoculture. However, by failing to respect biology, modern business practice is producing a system which is highly efficient in the short term, but is hugely fragile in the long term; when the system breaks, everyone suffers. In the light of climate change, this has to be even more of a worry; rich and complex ecosystems tend to be more robust to change, simply because they have more species to lose before they collapse. In the past, we have maintained monoculture through a rich diet of pesticides and fertiliser; it’s probably past the time when we should start to discover better ways of farming. And a new form of economics to remove the pressures that forced the old system.

While it’s not a major problem, the inability to uniquely and reliably identify a particular scientist is a niggle; a few years ago, I was distressed to find that I was scheduled to give a talk at an eScience conference about security; anyone who knows me will understand how implausible this was. I hadn’t considered the possibility that there was another Phillip Lord in eScience. It’s not that common a name.

So, what would we want from such an ID system? I think that the basic requirements would be:

  • the IDs should be unique; one ID only ever refers to one scientist.
  • the reverse should also be true; one scientist should not need to change their ID.
  • the ID should be printable, so that it can appear in papers.
  • the ID should be usable with a resolution system.

I think that this is it. I would say, also, that there are some softer requirements. Firstly, I think that the IDs should be useful to the scientist (above and beyond being able to link all their papers and research results); this would give them more immediate feedback, so that they would find the system to be a good thing, rather than a burden. Secondly, the system should be familiar and easy to use. Finally, as an anti-requirement, the system need not be secure; that is, it would be possible for someone to pretend to be me; this is not to say we couldn’t layer a secure identification system on top of the IDs.

So I thought about what form the ID would take. My first thought was just to layer the system on top of the first name and surname of the scientist. This has the big advantage, of course, that it makes the system easy to use; scientists already know their own names (mostly) and so does everyone else. People will remember the IDs easily. The problem is, of course, that people’s names are not fixed; women, particularly, are likely to change their names, and once the link between name and identifier is broken the advantage is lost.

My second thought was that we could use identifiers chosen by the scientist; this is not a bad idea; of course, it’s harder for humans to link between the ID and (other) scientists, but in time you would come to know IDs for most of the people in your domain. However, this form of identifier is also likely to become broken over time: firstly, many scientists will just want to choose their names, so we have the same problem as before; secondly, some scientists will just want to change their IDs; while peanutbutter or DullHunk might work now, it is possible that the owners of these names will come to regret them, like the “Phil loves Newcastle United” tattoo that I don’t have on my forehead.

In the end, I’ve come to the conclusion that only a semantics-free identifier actually makes any sense. This is clearly the least memorable route, but even here it’s not too bad; I know my NI number by heart because I use it a lot (or used it a lot at one point in time). In practice, most scientists read stuff on the web, so this could be resolved to show the full name automatically; in most cases, with papers for instance, it would be augmented with a standard name anyway.

So, what form of ID do we want? Well, the simplest form would be a six-letter code. This gives 300,000,000 alternatives; if we add in numbers this rises to a little over 2 billion. Probably more than enough for scientists now and into the future. The system could be extended if the name space ran out. However, I think we could improve the system by adding an extra letter to make 7; this would now mean that we could ensure that no two scientists had an ID with only a single edit difference; essentially, one letter would be redundant. Next, we could add a final letter to make a checksum; basically, treat the letters as base 26, multiply them, divide by 26, take the remainder and use this as the last letter. This would allow an easy validation step. Finally, we might want to do a dictionary-based block on some names; pity the poor scientist who ended up as NOBRAINS or other far worse 8 letter IDs.
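To make the arithmetic and the validation step concrete, here is a small Python sketch. All the function names are hypothetical. One deliberate swap: the checksum here adds the letter values and takes the remainder mod 26, rather than multiplying them, because a product collapses to zero (the letter A) whenever any letter in the ID is an A. A plain sum catches any single mistyped letter, though not two letters swapped around, so treat it as an illustration rather than a recommendation.

```python
import string

ALPHABET = string.ascii_uppercase  # A=0, B=1, ... Z=25


def namespace_size(length, alphabet_size=26):
    """Number of distinct codes of the given length."""
    return alphabet_size ** length


def check_letter(body):
    """Checksum letter: sum of the letter values, modulo 26."""
    total = sum(ALPHABET.index(ch) for ch in body)
    return ALPHABET[total % 26]


def make_id(body):
    """Append the checksum letter to a 7-letter body."""
    return body + check_letter(body)


def is_valid(identifier):
    """True if the last letter matches the checksum of the rest."""
    if len(identifier) < 2 or any(ch not in ALPHABET for ch in identifier):
        return False
    return check_letter(identifier[:-1]) == identifier[-1]


print(namespace_size(6))       # letters only
print(namespace_size(6, 36))   # letters and digits
print(is_valid(make_id("ADSJWOS")))
```

The first two numbers are where the "300,000,000" and "a little over 2 billion" figures come from (26 and 36 to the sixth power).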

As it stands, I don’t think that this would place too much load on scientists, but it would also not appeal to people; the big win would come when they could use these IDs to make their daily life easier. This could be achieved by sticking an authentication protocol on top, OpenID being the obvious one, although the IDs are generic enough that any authentication system could be stuffed on the end; as the IDs are not going to change over the life of a scientist, this should reduce the management load of yet another identifier. Potentially, we could log in to eduroam, various academic tools, wikis and the like, all with a single ID. At the last RIN/DCC meeting, many people argued that they need username/password registration; I suggested that this was a significant pain and barrier to reuse; this is true, but the barrier gets a lot lower if the registration process either disappears or every scientist gets to reuse the same ID.

Technologically, I don’t think that this would take a lot of effort to set up. Socially, the demands would be huge; for it to work, the basic technology is not enough; we would need to put in infrastructure to make sure key tools supported the system; JeS and Shibboleth would be obvious first points of contact; adding an OpenID provider would support less formal resources (such as project Wikis), but collaborating with Wikipedia and paying them to add support would help.

In some sense, I look forward to the day that I cease to be Phil Lord and become ADSJWOSK.