I’ve been thinking of writing a post about my experiences with Rust for a while, but haven’t found the time. The call for Rust#2019 posts seems like an opportune moment to contribute.

My Rust experience is extremely limited. I’ve written a single library in Rust, called Horned-OWL (http://github.com/phillord/horned-owl), for manipulating OWL (http://www.w3.org/TR/owl2-manchester-syntax/). I started it in August 2017 and it took over a year to fully implement the spec, complete with parser and renderer; a length of time that reflects the sporadic availability of time that I have to work on these things more than anything else.

And the experience has been positive. There is already a good and complete library for manipulating OWL called the OWL API (http://github.com/owlcs/owlapi), so I needed a strong motivation for writing another. That motivation is simple: the OWL API is in Java and it is slow. Rust has fulfilled its promise for me; Horned-OWL is an order of magnitude faster than the OWL API.

What have I learned from the experience though, and what could be improved?


Community

I thought I would start with my experience of the community. Rust is famous or infamous around the internet. In a modern day version of Godwin’s Law, any discussion of Rust results in accusations of SJWs in a remarkably short space of time. I should show my cards here; I’m all in favour of social justice, so I was not put off by this but curious.

And, actually, I have found the community remarkably helpful; I have received in-depth and clear replies to any questions that I have posted on the Rust forums. I’ve also learned something from the community. I have seen it answer questions that I considered foolish or of the “please do my homework for me” kind, while maintaining a high level of politeness. In the end, I think this is good; perhaps a more blunt response to foolish questions would educate the questioner, but always erring on the side of politeness makes the Rust forums a happy place to be. I wish it were true of the wider internet.

My only criticism would be that many of the community are really bought into the process of Rust; as a result many will offer solutions that say “in the future you will be able to do this…” or “if you use nightly, then…”. This can be a little confusing when I am trying to build software now, based around some form of stability.


Documentation

Overall, the standard of documentation for Rust is high. I read the Rust book and found it to be useful and informative.

I would raise one area for improvement. Rust is attempting to achieve something novel and many in the community are very bought into this; as a result, I think, discussions of where Rust’s model causes problems are minimized. I spent several fruitless days, for example, trying to lazily instantiate a data structure. The solution is short and simple and unsafe. Similarly, I struggled for a long time with building a circular data structure; in this case, I tried several approaches (first Rc, then numeric indexes for nodes, finally a tree structure with Rc identifiers to make it circular).
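To make the second of those approaches concrete, here is a minimal sketch of a circular structure built with numeric indexes for nodes. The `Graph` and `Node` types are hypothetical and purely illustrative; they are not Horned-OWL's actual data model.

```rust
// Hypothetical types for illustration only; not Horned-OWL's data model.
#[derive(Debug)]
struct Node {
    name: String,
    // Edges are stored as indexes into `Graph::nodes`, not as references,
    // so a cycle needs no shared ownership and no unsafe code.
    edges: Vec<usize>,
}

#[derive(Debug, Default)]
struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    /// Adds a node and returns its index.
    fn add_node(&mut self, name: &str) -> usize {
        self.nodes.push(Node {
            name: name.to_string(),
            edges: Vec::new(),
        });
        self.nodes.len() - 1
    }

    /// Adds a directed edge between two existing nodes.
    fn add_edge(&mut self, from: usize, to: usize) {
        self.nodes[from].edges.push(to);
    }
}

fn main() {
    let mut graph = Graph::default();
    let a = graph.add_node("a");
    let b = graph.add_node("b");
    graph.add_edge(a, b);
    graph.add_edge(b, a); // closes the cycle a -> b -> a

    // Follow the cycle: the neighbour of a's neighbour is a again.
    let next = graph.nodes[a].edges[0];
    let back = graph.nodes[next].edges[0];
    println!(
        "{} -> {} -> {}",
        graph.nodes[a].name, graph.nodes[next].name, graph.nodes[back].name
    );
}
```

The trade-off is that an index is only meaningful alongside the `Graph` that issued it, and nothing stops a stale index being used after nodes are removed.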

Documentation-wise, Rust needs to embrace its capabilities for writing unsafe and circular data structures. It can do this perfectly well, and new developers need to know how early.


Learnability

The theme of Rust 2018 was productivity. And it has achieved good things here. But for me, the biggest problem that Rust has remains all the users that it does not have. One significant reason for this must surely be its difficult learning curve.

Productivity is a form of usability, but it is not the same thing as learnability. Some of the changes in Rust 2018 do improve learnability: while learning the old module system, for example, I found myself lost and often ended up using one of the strongest techniques in any programmer’s toolkit; I changed things randomly until it worked. Other changes, such as the ? operator or impl Trait, improve productivity, but also introduce new syntax which hampers learnability.

Let me give an example. I am old school enough as a programmer that “print everything” is one of the ways I program (and something I teach!). If you use this technique in Rust, you hit three learnability problems. First, these two statements seem backward.

println!("{}", object);
println!("{:?}", object);

The one I want to use most frequently is the second form (i.e. print anything out). Yet, it’s longer than the first. At the same time, when I try to use the second form, it frequently doesn’t work. Not until I have added #[derive(Debug)] everywhere. Why? Is there really no way to allow Rust to produce some printable representation for any object, even if it’s just the name of the type and an identifier?
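As a small illustration of the point above, here is a sketch using two hypothetical types, `Point` and `Opaque`. The first derives Debug and so can be printed with `{:?}`; the second does not, and the nearest fallback I know of in the standard library is `std::any::type_name`, which gives only the type's name. The helper `type_label` is my own invention for this example.

```rust
use std::any::type_name;

// Hypothetical type for illustration; deriving Debug makes `{:?}` work.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

// Hypothetical type with no Debug implementation:
// `println!("{:?}", Opaque)` would fail to compile.
struct Opaque;

/// Illustrative helper (not a standard function): returns the name of the
/// value's type. The exact string is best-effort and may include the
/// module path, so it is for diagnostics only.
fn type_label<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{:?}", p); // prints: Point { x: 1, y: 2 }
    println!("{:#?}", p); // the same data, pretty-printed over several lines

    // Without Debug, the type name is all that is available.
    println!("{}", type_label(&Opaque));
}
```

This only partly answers the question: the type name is available for anything, but there is no general per-object identifier or field dump without an implementation of Debug.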

And, finally, to have this work with your test suite you have to do this:

cargo test -- --nocapture

Why the -- as well as --nocapture? And why does cargo help test not describe --nocapture as an option?
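For anyone who has not hit this, here is a minimal sketch with a hypothetical `add` function. By default the println! output from a passing test is captured and hidden; arguments after the bare -- are passed on to the test binary rather than to cargo, which is where --nocapture is understood.

```rust
// Hypothetical function under test, for illustration only.
fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    println!("{}", add(2, 2));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn add_prints_and_passes() {
        let result = add(2, 2);
        // Hidden by a plain `cargo test` because the test passes;
        // shown when run as `cargo test -- --nocapture`.
        println!("result = {:?}", result);
        assert_eq!(result, 4);
    }
}
```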

So, for 2019, I would focus on learnability, to consolidate the gains made by Rust 2018. What would this mean in practice? I don’t know all the specific steps this should involve, because I am not totally new to Rust any longer (I am now using a strange combination of the old and new module imports). I would suggest that each Rust developer mentor one or two Rust newcomers and use this time to understand what the issues are; perhaps this reflects my academic background, but I am a great believer in the principle that you don’t understand something till you have taught it.

Another day, another Ubuntu upgrade, another broken Marble Mouse. Have I written this before? Well, yes, several times before.

With the release of 17.10, everything broke because Wayland was brought in to replace X. With 18.04, Wayland is out again, something which I am pretty glad about, because in my experience it was pretty unstable, with the desktop crashing out to login fairly often.

The 17.10 solution appears to have been:

gsettings set org.gnome.desktop.peripherals.trackball scroll-wheel-emulation-button 8

This fails again on 18.04. Fortunately, the solution is quite simple: just return to the libinput configuration that we had before.

xinput --set-prop "Logitech USB Trackball" "libinput Scroll Method Enabled" 0 0 1
xinput --set-prop "Logitech USB Trackball" "libinput Button Scrolling Button" 8

I look back to the good old days of gpointing-device-settings, which used to work and looked nice. Why have we come so far from here?

Oh dear, if it seems that we have been here before, it’s because we have. Another Ubuntu upgrade, another broken Marble Mouse.

Took me a while to work out this one, but the answer is hidden in a bug report for RedHat. Actually, if I had read my last blog post I might have worked it out also.

What happens is Wayland, the new, er well whatever it is, for 17.10 looks at the marble mouse, says “it has no scroll wheel”, and so disables the input method. Which is unfortunate because then the emulation doesn’t work.

The solution is to turn it on again:

xinput --set-prop "Logitech USB Trackball" "libinput Scroll Method Enabled" 0 0 1
xinput --set-prop "Logitech USB Trackball" "libinput Button Scrolling Button" 8

Dearie me.

Update

Worked. Was happy. Now it’s stopped working. Less happy.


Abstract

The process of building ontologies is a difficult task that involves collaboration between ontology developers and domain experts and requires an ongoing interaction between them. This collaboration is made more difficult because they tend to use different tool sets, which can hamper this interaction. In this paper, we propose to decrease this distance between domain experts and ontology developers by creating more readable forms of ontologies, and further to enable editing in normal office environments. Building on a programmatic ontology development environment, such as Tawny-OWL, we are now able to generate these readable/editable forms from the raw ontological source and its embedded comments. We have implemented this translation to HTML for reading; this environment provides rich hyperlinking as well as active features such as hiding the source code in favour of comments. We are now working on translation to a Word document that also enables editing. Taken together this should provide a significant new route for collaboration between the ontologist and domain specialist.

  • Aisha Blfgeh
  • Phillip Lord


Plain English Summary

Ontologies are a mechanism for organising data, so that it can be generated, searched and retrieved accurately. They do this by building a computational model of an area of knowledge or domain.

But building ontologies is a challenge for a number of reasons. One of the main problems is that building an ontology requires two skill sets: the use and manipulation of a complex formalism, which tends to be the job of an ontologist; and the deep understanding of the area that is being modelled, which is the job of a domain specialist. It is fairly rare to find a person who understands both areas, so people have to collaborate.

In this paper, we describe a new mechanism to enable this collaboration; rather than trying to train domain specialists to build ontologies or use ontology tooling, we instead manipulate an ontology so that it can be viewed as an office document, which ultimately is the tool that most people are familiar with.

A chicken is an egg’s way of making another egg

One of the joys of ontology building is that you can end up in some fairly obscure arguments; the one I got into today is whether a sperm is a human being. Of course, this is silly, but mostly because of the limitation of our language. I would like to describe here why a sperm is a human individual and why it is important.

One of the long running discussions in the ontology community is how we define function. With respect to biological organisms and biological function this is particularly challenging; in fact, biology continually raises questions and exceptions which is part of the fun.

I added my contribution to definitions of function several years ago (1309.5984), built largely around evolution and, more importantly, homology.

One of the issues with other definitions available at the time, and specifically with BFO, is that it used a definition as follows:

A biological function is a function which inheres in an independent continuant that is i) part of an organism[…]

The point, here, is that by definition an organism cannot have a function because an organism cannot be part of an organism. This works well for people, but badly for some organisms, especially eusocial ones like ants which appear to have functions in their society which they have evolved to fulfil. My argument here is that it also means that a sperm cannot have a function, because, actually, a sperm is an organism. Of course, this seems daft; a sperm is, surely, part of an organism in the same way that a blood cell is. However, this is not true.

All organisms have a genome: their genetic material. Many organisms have a single copy of their genome; for single-celled organisms, this gets doubled before they divide, so actually they have two copies of their genome much of the time, but these two copies are identical. These organisms are called haploid.

However, as you might expect, sexual reproduction makes things more complex. This involves taking two previously independent organisms, merging them, then dividing again. The merged organism now has two different copies of its genome; such organisms are called diploid.

Once this happens, the life cycle of an organism gets more complex. Some organisms, such as the yeast Schizosaccharomyces pombe, really dislike being diploid. It’s possible to maintain them in the lab, but generally given the choice they sporulate and become haploid again. However, others such as brewer’s yeast (Saccharomyces cerevisiae) behave differently. It grows, develops and lives in the haploid form; but it also does this in the diploid form and is quite happy.

Many plants do this also and exist in both a multicellular haploid form (called the gametophyte) and a multicellular diploid form called the sporophyte. In the flowering plants, the gametophyte is very small, and exists entirely within the sporophyte stage; in other plants, the gametophyte is larger. But both forms can grow and develop, a process called alternation of generations.

As far as I know, no animals do this. However, there are quite a few organisms where both a haploid and a diploid form exist; the male ant that I referred to earlier will be haploid, while the females are diploid. This doesn’t disadvantage the male; it simply produces sperm which are genetic clones of itself.

In humans, like the flowering plants, the diploid form is dominant. There are two haploid forms, the egg and sperm, both single cells; the female form exists entirely within the diploid from which it arose; the sperm can travel a bit further but not much.

Of course, in most practical circumstances, the sperm would appear to be a part of the man that produced them; if I was building a medical ontology, I would make this statement, because it would fulfil everyone’s intuition, as well as common medical and legal practice.

But, there is no real justification for this. It exists, it is independent from that man, has a different genome from that man; it is an organism in the same way that a gametophyte or a male ant is an independent organism. For a biological ontology, working cross-species, we have no basis for making this distinction; if the sperm has a function of fertilizing an egg, then the man has the function of producing more sperm. Alternatively, if a man cannot have a function, neither can a sperm.

Does this mean that a sperm is a human being? Obviously this would be silly, and nor is a sperm a person; but it is an organism and it is human. We just lack a word to describe this.

This discussion came up at ICBO 2017, following a discussion with Barry Smith.


Abstract

As the quantity of data being deposited into biological databases continues to increase, it becomes ever more vital to develop methods that enable us to understand this data and ensure that the knowledge is correct. It is widely held that data percolates between different databases, which causes particular concerns for data correctness; if this percolation occurs, incorrect data in one database may eventually affect many others while, conversely, corrections in one database may fail to percolate to others. In this paper, we test this widely held belief by directly looking for sentence reuse both within and between databases. Further, we investigate patterns of how sentences are reused over time. Finally, we consider the limitations of this form of analysis and the implications that this may have for bioinformatics database design. We show that reuse of annotation is common within many different databases, and that there is also a detectable level of reuse between databases. In addition, we show that there are patterns of reuse that have previously been shown to be associated with percolation errors.

  • Michael J Bell
  • Phillip Lord


Plain English Summary

Bioinformaticians store large amounts of data about proteins in their databases, which we call annotation. This annotation is often repetitive; this happens because a database might store information about proteins from different organisms, and these organisms have very similar proteins. Additionally, there are many databases which store different but related information, and these often have repetitive information.

We have previously looked at this repetitiveness within one database, and shown that it can lead to problems where one copy will be updated but another will not. We can detect this by looking for certain patterns of reuse.

In this paper, we explicitly study the repetition between databases; in some cases, databases are extremely repetitive, containing less than 1% of original sentences. Moreover, we can detect text that is shared between databases and find the same patterns in these that we previously used to detect errors.

This paper opens up new possibilities for using bulk data analysis to help improve the quality of knowledge in these databases.