This is just a short introduction to Michael Bell, my PhD student. He’s now in the second year of his PhD, and has been looking at annotation in biological databases. More specifically, we are trying to define quality measures for textual annotation, based around the bulk properties of these databases. It’s related to, but distinct from, my early work on semantic similarity. The question is whether we can judge the quality of sentences, words or records based on how they have been used previously, and how far they have spread.
Michael has now started to blog about his work, following on from my own knowledgeblog work and our general commitment to open science. As part of his work, he is starting to build web-delivered tools, as they are a useful way of navigating the complex knowledge space of biological data. So, his website is also part of his work.
A good example is this recent blog post, which discusses the creation of word clouds for all historical versions of Swiss-Prot and TrEMBL. Because everyone loves a word cloud, it is well worth a look.