Archive for the ‘Professional’ Category

I started to write this post a long time ago, in October; unfortunately, before I finished, I got hit with the start of teaching. I considered just ditching the post, as it is now so out of date and I am not usually a zombie poster. However, in this case, I shall post as a) it helps my mind to move back toward research after so long away and b) it will be my first of 2012, so I can check that my makefiles work!

A couple of follow-ups from my previous post.

Nicolas Le Novere commented via Twitter on even the highest-level assertion, that radioactivity is a dependent continuant.

@phillord fluorescence and radioactivity are occurrent not continuant. Freeze time to check.

@phillord hence the unit of radioactivity: per second (Becquerel)

— Nicolas Le Novere

In my original post, I suggested we needed Radiation, Radioactive or Radioactivity; in hindsight, perhaps I should have used Radioactive rather than Radioactivity, which may have circumvented this issue. However, I think it is worth considering this a little further.

I would nearly agree with Nicolas that radioactivity is a process; actually, I would say that radioactive decay is a process, while radioactivity is a property of this process. However, in my last post, I was looking at a model which was “BFO-like”, as OBI is based on BFO. For BFO, the fact that radioactivity is a rate, measured per second, does not mean that it is an occurrent, any more than velocity, which is also measured per second, is an occurrent. Actually, in BFO land, radioactivity would be a quality of the atoms which are decaying and not a measurement of the process. This is because, as Pierre Grenon says, properties of processes do not exist.

In fact, if we look at this more closely still, BFO would also claim that radioactive decay is not, as it might appear, a Process, because processes are continuous. This is not true for radioactive decay, even for a bulk of radioactive material. An atom decays, then there is a pause, then another decays. This makes radioactive decay a processual entity, which can contain discontinuities.

I am not arguing that BFO’s treatment of processes is correct; in fact, I think it is nonsensical. However, it is this line of argument that I was using in my previous post.

David Sutherland rather takes me to task about whether realism does what I suggest.

I agree completely, but what realist principle says you need to give something the most detailed classification you can come up with?

— David Sutherland

It’s a good question, but I would turn it around. I don’t think that realism requires you to do this, although this quote from Barry Smith does rather distinguish between simplifications (i.e. not the most detailed classification you can come up with) and reality.

I am beginning to suspect that for you everything is a simplification (model) — for me, functions are part of reality; they are not simplifications; I am not interested in simplifications.

http://groups.google.com/group/bfo-discuss/msg/865e601864fbc2dc
— Barry Smith

The problem, though, is that realism elevates “reality” above all else. I think that this is wrong. Of course, in any scientific discipline, we should be aiming to model the experimental data that we have. But this is not all we need to do. As any statistician will tell you, models are compromises. It is very easy to build a model that perfectly represents the data that you have; you just build a model with as many variables as data points. The model will fit perfectly to the data, but ultimately the model is useless, since it lacks explanatory power. We need use cases, we need simplifications and sometimes we will need multiple representations of the same thing; there are examples galore in my paper (doi:10.1371/journal.pone.0012258). In fact, Chris Mungall gives a good example when he talks about dispositions and their status as being real:

In fact, I have a particular problem with dispositions being “real” – BFO asks me to believe there are an infinite number of real but unrealized and perhaps wildly improbable dispositions floating around me every second

— Chris Mungall

And later he gives the solution.

taking a hard-headed pragmatic approach – e.g. avoid weirdo classes that don’t correspond to a term a normal scientist would use; introduce distinctions that give you the desired results to queries and inferences)

— Chris Mungall

In other words, reality is important. But we also need use cases, we need community norms, and we need applications. If ontologies do not fit with these, then they can be as “real” as you like, but they are still wrong.
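
The statistician's point above, that a model with as many variables as data points fits perfectly but explains nothing, can be sketched in a few lines. This is only an illustration: the five data points are invented for the example, and `lagrange_predict` is a hypothetical helper name. It fits a degree-4 polynomial (five coefficients) through five points that roughly follow y = x.

```python
from fractions import Fraction


def lagrange_predict(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs) - 1
    that passes through every point (xs[i], ys[i])."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                # Lagrange basis factor: 1 at xi, 0 at every other xj.
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# Invented data: roughly y = x, with one noisy reading at x = 4.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 2, 3, 5]

# As many free coefficients as data points: the fit is exact...
fitted = [lagrange_predict(xs, ys, x) for x in xs]
print(fitted == ys)  # True

# ...but the prediction just outside the data is wildly off the trend.
extrapolated = lagrange_predict(xs, ys, 6)
print(extrapolated)  # 21, where the simple trend y = x would give 6
```

The straight line y = x misses one of the five points, yet it is the more useful model; the perfect fit has simply memorised the noise.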

Well, I am pleased to say that we have now released the new version of kcite. It’s been a while in coming; I had the difficult bit of the code working about 5 months ago, but then got caught up in teaching. Kcite is our bibliography manager, which enables citations such as this one (doi:10.1371/journal.pone.0012258), using DOIs or PubMed IDs.

Kcite now uses the marvellous citeproc.js to render the bibliography on the client. The main advantage of this for this release is that the bibliography formatting is slightly more regular than before. We’ve also switched to name-author style as the default. There is also a disadvantage, which is that the browser has to do lots of JavaScript execution client-side; I’ve made efforts to ensure that this is not too onerous. On my desktop, I have been rendering 200-300 item bibliographies, which is much more than most people will use in practice.

In future versions, however, I feel the use of citeproc-js will really come into its own. We should be able to enable the user to select their own citation style (currently this is the choice of the authors, which makes little sense). We can also add any semantics to the HTML that we choose; CiTO will come properly, for instance. I can also clean up the “unresolved” and “timed out” references. However, first thing on the list is to make the callback for the bibliographic data asynchronous. Client-side this should be easy, as we are already using jQuery. Server-side requires rewrite rules, which I haven’t done before, but which I think should not be too hard.

On a separate track, now that I have kcite on what I think is a stable technological footing, I can start to extend it in other ways, the most obvious being additional forms of identifiers, critically including WordPress posts with kcite enabled. I’m also pleased that CrossRef have recently added the ability to retrieve metadata in citeproc format (JSON), which means I can skip an integration step.
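
As a rough sketch of what asking for citeproc JSON looks like from a client, assuming the DOI content-negotiation service and its `application/vnd.citationstyles.csl+json` media type as I understand them: the function name `csl_json_request` is made up for this example, and this is not kcite's own code. The request is only built here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Media type for CSL/citeproc JSON (assumed from the DOI
# content-negotiation documentation, not from kcite itself).
CSL_JSON = "application/vnd.citationstyles.csl+json"


def csl_json_request(doi):
    """Build, but do not send, a request asking the DOI resolver
    for citeproc-style JSON metadata rather than the landing page."""
    url = "https://doi.org/" + quote(doi)
    return Request(url, headers={"Accept": CSL_JSON})


request = csl_json_request("10.1371/journal.pone.0012258")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; the response body could then be handed to a citeproc processor without an intermediate conversion step.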

However, before all of that, we need to restore kblog. We’ve taken the opportunity to move it to a better technological footing, and have started to prepare the new machine that it will be hosted on. This has taken a long time, due to a busy start to the (academic) year. Hopefully, getting hacked is not something we will repeat soon.

The current release of kcite is 1.4.1. This fixes two bugs: one reported by Carl Boettinger (so that now the JavaScript only loads when necessary), and another, which I found while writing this post, that made editors appear as authors.

I once had cause to refer, somewhat mischievously, to “a kind of pasta from Tuscany, which is almost identical to spaghetti, but slightly different”; this was on a mailing list that was used by many Italians. It provoked the expected response; an offended Tuscan responded “I don’t know what you are talking about; but if you mean pici”, which I did, “it’s nothing like spaghetti”.

Recently, on the OBI mailing list, there has been much discussion about labels, markers or tracers. Whatever you wish to call it, the basic idea is the same: a molecule which is easily detectable is used to trace something else. This can involve adding a small amount of a radioactive isotope (P32). This makes it possible to follow the molecule (which is otherwise hard) by tracing the radiation (which is generally easy).

So, how do we model this? As with many parts of ontology building, it turns out not to be straightforward; during this discussion, an email from Philippe Rocca-Serra left me asking the question: are we being too specific? I will work through an example to show what I mean. Feel free to skip to the punchline if you choose.

Consider, for example, the following models; these are not directly taken from OBI, as I want to reduce the complexity for this article; rather they are in the general spirit of the models which raised these questions.

A label, or something that has been labelled, is clearly part of an experimental design. It is not intrinsic to the entity; rather, it appears to be a role that the entity is playing in the experiment. So:

Class: Label
       SubClassOf:
          Role

There are, of course, labels of many sorts. The main types that I can think of are radioactive, fluorescent and what I call adherent. So, we might add the following, with a few subclasses of adherent as explanation.

Class: RadioactiveLabel
       SubClassOf:
          Label

Class: FluorescentLabel
       SubClassOf:
          Label

Class: AdherentLabel
       SubClassOf:
          Label

Class: BiotinylatedLabel
       SubClassOf:
           AdherentLabel

Class: AntigenicLabel
       SubClassOf:
           AdherentLabel

So far so good. However, for a label to be useful, it needs to be manufactured (often in a bespoke fashion, depending on the experiment being performed) and it needs to be detectable. So, we might add classes like so:

Class: LabellingProcess
       SubClassOf:
           Process,
           has_output some Label

Class: LabellingDetectionProcess
       SubClassOf:
           Process,
           has_input some
                  (Sample and (contains some Label))

Now we have three classes for every label type. We can deal with this by generating a cross-product, either at development time, or at the time of use if we are using OWL. However, we need something to tie together these classes. We need a concept to know that we need a RadioLabellingProcess to produce a RadioLabel which we detect in a RadioLabellingDetectionProcess. In short, we need a concept of Radiation, Radioactive or Radioactivity.

Class: RadioactiveEntity
    SubClassOf:
        IndependentContinuant,
        bears some Radioactivity

Class: RadioactiveLabel
    SubClassOf:
        Role,
        RadioactiveEntity

Class: RadiationDetector
    SubClassOf:
       detects some Radioactivity

Class: RadioactiveLabelProductionProcess
    SubClassOf:
       has_input some RadioactiveEntity
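
The cross-product described above (three classes for every label type) could be generated at development time with something like the following sketch. The naming scheme (`<Kind>Label`, `<Kind>LabellingProcess`, `<Kind>LabellingDetectionProcess`) and the parenthesised form of the `has_input` restriction are assumptions made for the illustration; none of this is taken from OBI.

```python
# Label kinds from the discussion above; extend the list as needed.
LABEL_KINDS = ["Radioactive", "Fluorescent", "Adherent"]


def frames_for(kind):
    """Return the three Manchester-syntax frames for one kind of label:
    the label itself, the process making it, and the process detecting it."""
    label = f"{kind}Label"
    return [
        f"Class: {label}\n    SubClassOf:\n        Label",
        (f"Class: {kind}LabellingProcess\n    SubClassOf:\n"
         f"        LabellingProcess,\n"
         f"        has_output some {label}"),
        (f"Class: {kind}LabellingDetectionProcess\n    SubClassOf:\n"
         f"        LabellingDetectionProcess,\n"
         f"        has_input some (Sample and (contains some {label}))"),
    ]


# One frame per (kind, role) pair: the full cross-product.
frames = [frame for kind in LABEL_KINDS for frame in frames_for(kind)]
print("\n\n".join(frames))
```

Generating the frames keeps the three hierarchies in step, but it does nothing to say why a radioactive label needs a radiation detector; that is the job of the shared concept introduced above.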

This is where the situation gets difficult. What kind of thing is Radioactivity? Taking the realist approach, we need to consider this carefully, determining what this universal is. So, starting from the top, it is fairly obvious that we have a Continuant. Next question: do we have a Dependent or an IndependentContinuant? Again, this is fairly clear: radioactivity cannot exist without something to be radioactive, hence Radioactivity is a DependentContinuant.

We have a set of DependentContinuants that Radioactivity could be. The concept Role does not fit well; this is usually ascribed by socially or, in this case, experimentally determined behaviour. Perhaps Disposition would be better. However, this does not really fit either, as a Disposition is realised “under specific circumstances”. Now this is not true of radioactivity. Either something is radioactive or it is not, and if it is, then it is, to the best of our knowledge, radioactive under all circumstances. It appears, then, that Radioactivity is a Quality, because “it is exhibited if it inheres in an entity at all”.

If we follow the same logic with our other label types, initially, we come to the same conclusions. However, Fluorescence is not exhibited under all circumstances. It only happens when the label is illuminated with the right kind of light. So, Fluorescence appears to be a Disposition. Following a similar logic, this is also true of Adherent. So the best we can say about the property of the substance that makes it usable in labelling is that it is a RealizableEntity.

Having Radioactivity stand out in this way is a little unsatisfying. Let’s consider the logic again. One classic experimental form is the pulse-decay experiment. I can, for example, feed a rat with, say, radioactive phosphorus briefly. After this, I can trace the course of the phosphorus. Now, during the course of this experiment, the rat becomes radioactive and then ceases to be radioactive again. But it is, notably, the same rat. So, perhaps, the statement that things are either radioactive or not is wrong. Perhaps it is not a Quality at all. The flaw in the logic is the assumption that because an atom is either radioactive or is not, anything made up from atoms must be so too. But an entity can have its atoms totally replaced and still be the same entity. In this case, what is true of a rat is also true of its DNA. We can replace the atoms in a sample of DNA with other ones and still have the same DNA. So, maybe, Radioactivity is a Quality at an atomic level of granularity, but is, after all, a Disposition at others.

Thinking further, however, maybe it is not a Quality at all. A mass of P32 is always radioactive, but a single atom? Perhaps not, since it only displays this when it decays. So, perhaps, it is a Disposition after all. However, this makes no sense, because dispositions are displayed under “specific circumstances”. Now, to the best of our knowledge, radioactive decay is stochastic; it is so random that radioactivity is often used to generate randomness. We cannot specify the circumstances under which it happens, it just does. Moreover, after it displays the radioactivity, what has happened to the atom? Using the same argument as before, we could say that, like the rat, the atom still exists, it’s just that (some of) the elementary particles that make it up have changed. But this way, surely, madness lies, as “being phosphorus” would become some sort of dependent continuant, which the atom displays during its decay, while it happens to have the right number of protons. So, probably it makes more sense to say that the decay process represents the end of the existence of the phosphorus atom and the beginning of a new atom (and a radioactive particle). In which case, even our original decision that Radioactivity is a DependentContinuant is wrong. It’s not a DependentContinuant at all; it’s only a process which is over as soon as it begins.

So, what have we achieved? Well, I would argue, not a great deal, except for a lot of discussion. Moreover, we have ended up discussing very detailed issues about the physical properties of matter, when we started discussing an ontology of biomedical investigations. This might be entertaining, or it might be very dull, depending on your point of view. But what we have failed to produce is a specific conclusion.

The problem here is realism. A realist ontology represents portions of reality, that is, classes of things that really have instances. We have to ask these questions to try and determine whether Radioactivity exists and what kind of thing it is. We can set realism against pragmatism. Previously, Robert Stevens has described the problems that this causes by preventing the ontologist from modelling “unicorns”, such as Newtonian mechanics, or canonical anatomies. The unicorn principle says that if it is useful to model a concept in an ontology, then often we should. Here, I introduce what I call the “Pici principle”: if it is not useful to model a concept, then we should not. To me, as a British native, pasta is pasta; it all tastes much the same. Generally, I do not need to be able to distinguish pici and spaghetti, unless I want to provoke a response from an over-excitable Tuscan. The sensible course is not to get involved in the discussion in the first place.

The same applies in this instance. There is a clear use case for the concept of Radioactivity; without it, we cannot say that a radio-label is radioactive, or that a fluorescence detector is not going to work detecting it. But to achieve this use case, we do not need to understand very deeply what Radioactivity is. Describing it as a DependentContinuant is enough, and it will fulfil the use cases. It will not enable us to ask questions about which kind of labels detect qualities and which detect dispositions. But in the absence of a use case, this is not an issue.

A chemist may care, and may want to classify radioactivity further. This is fine; as with pasta, we can safely leave these issues to someone else, in the knowledge that they are probably better qualified to give an answer anyway. So long as they decide that Radioactivity is a DependentContinuant, it does not matter to us what kind of DependentContinuant; we have said nothing incorrect. So, our ontology will integrate with theirs, without change to either. By being as vague as our use cases allow us, we have actually increased the ability of our ontology to integrate with others.
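
The integration argument above can be sketched as a toy check. It assumes two hypothetical chemist's ontologies, one deciding that Radioactivity is a Quality and one that it is a Disposition, with class hierarchies loosely in the BFO-like style used in this post; the dictionaries and helper names are invented for the example, and the check only follows asserted subclass links rather than doing real OWL reasoning.

```python
def superclasses(cls, axioms):
    """All superclasses of cls reachable through asserted SubClassOf links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in axioms.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entails(axioms, sub, sup):
    """True if the axioms already imply that sub is a subclass of sup."""
    return sup in superclasses(sub, axioms)


# Our deliberately vague ontology.
OURS = {"Radioactivity": {"DependentContinuant"}}

# Two hypothetical refinements a chemist might choose.
CHEMIST_QUALITY = {
    "Radioactivity": {"Quality"},
    "Quality": {"DependentContinuant"},
}
CHEMIST_DISPOSITION = {
    "Radioactivity": {"Disposition"},
    "Disposition": {"RealizableEntity"},
    "RealizableEntity": {"DependentContinuant"},
}

# Either way, the chemist's ontology already implies everything we said.
for theirs in (CHEMIST_QUALITY, CHEMIST_DISPOSITION):
    print(entails(theirs, "Radioactivity", "DependentContinuant"))
```

Because our single axiom is implied by either refinement, merging adds nothing that could conflict; had we committed to Quality ourselves, the second chemist's ontology would have disagreed with ours.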

In short, the pici principle encapsulates the idea that deciding what we should not model in an ontology is as important as what we should model. And this decision comes from use cases, not reality.

While I am currently spending a significant amount of my time promoting the idea that blog technology can be, and should be, used for serious scientific material, I thought I would make a post in a different and perhaps more traditional vein: that is, a light-weight idea, with no serious research behind it. Years ago now, I created an Energy Wiki full of daft ideas for making energy. I last revisited this in 2009, with an idea for storing energy at sea. I’d actually forgotten that part of the reason for this was to try out Inkscape, which is part of the reason for this post. I wanted to try a bit of multi-media, that is, a blog post with an image in it. High tech.

So, the idea. One form of renewable is the Solar Updraft Tower, also known as a solar chimney. This works straightforwardly enough: you build a large greenhouse in a desert, with a very large chimney in the middle. The top of the chimney is in cold air, the bottom in hot, and an updraft results; stick a turbine in or at the base of the chimney, and you get energy out.

The problem is that, to work at all efficiently, you need a big temperature differential, so a tall chimney. This in turn means a wide chimney, both to support a substantial updraft, and for mechanical reasons. Tall means 500m or more. The bottom line of this is that a pretty significant capital expenditure is required, followed by a relatively long pay-back period, which in turn means that the biggest single expense of the project is likely to be interest charges, rather than anything else.

So, my idea is to use an inflatable chimney instead. Initially, I thought about some kind of helium lifting scheme, but then I realised that this makes no sense; why not use hot air, which after all is what the whole system is designed to generate? Consider, for instance, the following organisation:

[Figure: Inflatable Chimney]

Essentially, it’s a traditional balloon with a hole in the middle. Obviously the whole system is stackable; a second balloon could be placed on top of the first and so on. The whole structure could be assembled or disassembled as desired. Unfortunately, though, this would probably take quite a bit of work.

My second thought came from the idea that, while most designs for solar chimneys have the chimney in the middle of the greenhouse, it doesn’t really need to be. A horizontal pipe to the middle would be enough. The chimney could be outside of the greenhouse. The advantage that this brings is that the tower could be raised or lowered in-situ, without the risk of it falling on, and damaging, the greenhouse. So my second idea was to build the chimney as two cylinders, with the gap between them serving as the inflatable, buoyant structure. By pleating the cylinders in opposite directions like so:

[Figure: Concertina Chimney]

the whole structure should concertina up and down. By inflating from the top and deflating from the bottom, it should be possible to raise or lower the entire system by opening and shutting vents at the bottom or top of each section to the inside of the chimney.

One advantage with this system is that as the chimney gets higher, the temperature differential between the inside and the outside gets greater, which should mean that the taller the tower, the more buoyant the sections get; this should help to keep the entire thing as upright as possible, as will the air travelling through the middle, like some gigantic party blower.

Another addition that comes to mind would be to add inflatable half-toroids around the chimney at regular intervals. With a curve on the top, and a flat bottom-side, the entire thing should operate like an aerofoil, lifting the tower up; so, the windier it gets, the greater the lift, which is just what is needed to keep it as upright as possible. This should mean that the chimney can operate in relatively high wind levels.

This kind of system could even work in concert with a fixed chimney, extending the height by 500m, say, and increasing its efficiency. It could also act as a supplement, operating only on very hot days when the greenhouse has excess capacity. Or, finally, it could operate while the main chimney was being built, meaning that a plant can start generating income earlier, which should reduce the cost of interest payments.

Of course, this all comes with drawbacks: the ongoing running costs are likely to be significant; wind will remain a significant factor regardless; and, finally, inflating the tower will use hot air, which will reduce the efficiency of the whole system. Are these flaws significant? Well, as I said, this post is light-weight with no serious research behind it. I have no idea, nor any really clear idea about how to work out these costs. Answers on a postcard please.

I have been pushing the idea of Kblogs (scientific publishing using commodity software) for a year or so now. Our main site, Knowledgeblog.org, has got around 100 articles now, has had about 50k page views (or about 4x the number of raw page hits) and has generated a certain presence on the internet. While this is generally good, the price of fame is that we have moved somewhat up the list of potential hack targets. Unfortunately, this has resulted in two compromises on the machine; they were probably not disconnected, although we have no evidence to link the two at the moment.

The first was through the timthumb zero day vulnerability. It involved a code injection into a WordPress installation using a thumbnail generator with a dodgy bit of PHP in it. We cleaned the system up as well as we were able and went from there. Sadly, a couple of days ago, we had a second break-in. This was a more serious and directed attack (the timthumb was scripted, and we were one of several thousands of sites to be hit). In this case, the machine has been root compromised, and the web server used to gather username/passwords in a phishing expedition. We do have backups and all of the content. There were a number of things that we could have done to secure the machine further, at least one of which may have prevented the hack, but there are only so many hours in the day.

So, where does this leave us? Is the whole idea of knowledgeblog broken? Personally, I do not think so. While I have been critical of the cost associated with academic publishing, I am aware that it cannot happen for free. Running and maintaining a web server takes money; it is something that we have been doing on a shoe-string for a while, especially since our JISC money ran out. In the couple of years that we have run knowledgeblog, I think that we have learned and shown a lot. As well as page views and content, we have shown that scientific publishing can be easy for the author; that we can generate attractive articles this way; that we can start to embed computationally accessible knowledge into these articles. We have shown that we can do peer-review, if we need. We have shown we can archive and preserve for the future. We have shown that knowledgeblog is good for grey literature. We have added DOIs. Multiple authors. Good looking maths. We even have some preliminary stats on how much publication costs from Word doc to website.

At the moment, though, we do not have a business model. It is clear that if we are to move this forward, it needs to be run as a service, managed, and looked after, something which is neither my expertise nor my desire. The analogy that I have made earlier with Wikipedia is, I think, a good one; it would be good to move this into a foundation status.

The path from here to there is a long one, however. For the moment, we will restore knowledgeblog, and it will re-emerge, although at this time of year, it will take a while. But we look to the future as well.

Although in some disciplines it is relatively uncontentious, the rise of open access publishing has produced a lot of comment in others. In one of my two disciplines, computing science, this form of publication is still the minority, and still raises comment. For instance, Michel Beaudouin-Lafon has commented, suggesting that scientists are highly naive about the costs of publishing. He argues that scientific publishing is intrinsically expensive, and that open access will have negative implications for science as a whole.

Over the years, commercial STM publishing has become a cutthroat business with cutthroat practices and we, the scientific and academic community, are the naive lambs, blinded by the ideals of science for the public good-or simply in need of more publications to advance our careers.

— Michel Beaudouin-Lafon

Personally, I think that “naive” is the wrong word; scientists are often not good at operating in a co-ordinated way. Although we work together in small groups, and sometimes in large groups, in general, we are still very much a cottage industry; at any one time the number of scientists working in a distinct discipline is not that large, even on a world-wide basis. Of course, this works pretty well for scientific advance; we are not a production industry, but researchers. No one knows the best way forward, and we need to experiment to find out. But it does mean that we often play second fiddle to those capable of more co-ordinated action; compare, for example, scientists to the medical community with its tightly controlled professional bodies. Or, of course, the STM publishing industry, particularly as it has become focused in fewer and fewer competing publishers.

For example, ACM spends several million dollars every year to support the reliable data center serving the Digital Library

— Michel Beaudouin-Lafon

Clearly, it is true that the cost of data centres and storage is not trivial. But the cost of servicing data has plummeted over recent years. Scientific papers largely consist of words and figures; these do not take up much space. The laptop I am working on has a copy of my email directory; it’s not complete but it carries most of my outgoing email since 1994 and a lot of the incoming; this is a lot of words! But the total size is now less than 5G, which will fit on a 3 pound pen drive, or my phone. Now if ACM were storing research data, then it would be a totally different issue; the costs here are significant, problematic and rising. But they are not.

The ACM might spend several million dollars a year, but the bottom line here is that this does not account for the cost of publishing. The Wikimedia foundation, which supports Wikipedia, spends around 10 million dollars a year, in total, on one of the top ten websites in the world. This is about the daily cost of the whole scientific publishing industry.

The quality of a journal is typically measured by its impact factor

— Michel Beaudouin-Lafon

And a very bad measurement of journal quality it is too. As someone who works in two disciplines at once, I constantly get hit by this: my best computing publications have laughable impact factors when compared to my bio publications; when judged against computer scientists, however, my bio publications have such high impact factors, that they have to be ignored as outliers.

At $5,000 per publication, my lab is broke.

— Michel Beaudouin-Lafon

It is not clear where the $5,000 figure comes from, as most open access is less than this. But, anyway, this argument makes no sense. Our labs are already paying a vast amount of money for publications; usually this is squirrelled away in overheads, taken from our budgets before we see the money. And, although it doesn’t happen so much in computing, many journals levy significant page charges.

They are the big pharmaceutical labs and the tech firms who publish very little but rely on the publication of scientific results for their businesses. With author-pay, research will pay so that industry can get their results for free. Is this moral?

— Michel Beaudouin-Lafon

Open access on its own is not enough. We also need public disclosure about the process. Perhaps the examples of the pharmaceutical industry funding journals directly are unusual. It is not so easy to tell at the moment. In this context, it could be argued that the last thing we need is the pharmaceutical industry paying for the results of science. Of course, conversely, the pharmaceutical industry could argue that they already do pay for the (publicly funded) research by way of taxation.

While they are interesting, all of these arguments really miss the point: the pharmaceutical industry already get their results for free, as their subscription fees do NOT pay for the research, just its publication. The publishing industry also get the results that they depend on for free or, with page charges, by charging the authors. And for every paper that researchers publish for free, they pay more to read someone else’s.

So, we are already in the situation that we are told is not moral.

It is important to understand that the scientific community is largely at fault

— Michel Beaudouin-Lafon

There is some truth in the idea that the scientific community has let itself walk into the situation, but ultimately I feel that this is like blaming the financial crisis on those receiving subprime mortgages. It is true that it is scientists who submit their best work to expensive closed publishers; but, especially in early and mid “career”, we do this to safeguard our futures.

The problem with the subscription model is not the model but the fees.

— Michel Beaudouin-Lafon

Quite the opposite. Ultimately, I don’t pay the fees, so how much do I really care? But the subscription model prevents re-purposing, it limits access, it prevents competition. I work at a university as a scientist because I value the ability to be able to swap and discuss my work. I want the general public to be able to access my research. Dissemination of knowledge should be part of my job; I think it is reasonable that I, or my employers, should pay for it.

Which is not to say that the level of fees is fine; it is not. They are far too expensive under any model.

The added value provided by publishers is twofold: reputation (the value of the imprimatur), and archiving (the guarantee that the work will be available forever).

— Michel Beaudouin-Lafon

And this is it? Is this all that we are getting, given the costs? Especially as the reputation comes from the work, not the journal, and the archiving should be a rapidly decreasing cost.

Actually, in practice, I think the current publishing industry brings more value; selection of reviewers, sometimes copy-editing and, critically, advertising of the content. But, again, times have changed, and publishing practice in these areas has not.

The only other area in publishing where authors pay to get published is called the vanity press. Do we really want to enter that model?

— Michel Beaudouin-Lafon

This is a low blow, nor is it true. Many people pay for their own publishing costs. The government pays to publish election results; health services pay to publish public health information; companies pay to publish product safety recalls. These are all circumstances where the value to the author of public awareness of their content far exceeds the income they would receive from charging. And the biggest example of this is the advertising industry.

Nor is the implication that this will necessarily result in low quality true. Consider the blogosphere; of course, there is much junk, but the standard of science journalism there is very high; frankly, when even respected sources like the BBC start talking about pixie dust, it’s probably of at least as high a standard as the mainstream media.

All this aside, what do I, as a scientist, actually care about? Some of these leap to mind:

  • Stable location and content
  • Archiving
  • Peer review
  • Discovery and selection

Open access was built on the basis of replicating the existing publication process. PLoS, for example, did this precisely so that it did not challenge both the business model and the publication procedure at the same time. How much of the costs stem from this? I think that we, as authors and readers, should know. How much of the millions the ACM spends on its data centre is involved in managing access controls, for example? How much on advertising? How much on booths at meetings?

Open access has opened the door, but now we need to challenge and change the process. Hosting data is not free, nor is archiving. And yet I can find my own website from 2002 and enjoy its gaudy colour scheme all over again. If this blog post is so exciting to the world that the load brings the server down, you will be able to read it on coral cache. Peer review is expensive and time-consuming; I know because I’ve organised enough of it for BioOntologies. But then I did not get paid for this, and how many of the real costs of peer review do publishers bear? And discovery and selection? Well, we have Google, and I follow my peers on Twitter.

Author fees are not a solution. […] Finally, nonprofit publishers should take advantage of their unique position to experiment with sustainable evolutions of their publishing models.

— Michel Beaudouin-Lafon

And on this, I could not agree more. Our experiment with Knowledgeblog suggests that we can get 90% (or 80% or 70% depending on who you ask) of the way there with commodity software. It’s only a small start, but then I was on the mailing list that saw the first email about the creation of Wikipedia, and that wasn’t long ago.