Further Experiments with Literate Programming

Literate programming comes in many forms and disguises but is essentially the notion that the documentation and programmatic code should be written together, so that the documentation supports the code and vice versa. In this post, I discuss some of the problems with literate programming, my early attempts to circumvent these with respect to ontology development. Finally, I finish up with a description of some new technology which, I think, offers a solution.


Literate Programming for Ontologies

The reality is, I think, that literate programming has never really take off; there are a large number of reasons for this of course. Code does not naturally have an linear narrative and is not necessarily read in this way: rather, when read by an experienced programmer, they often track the flow of execution through the code [1]. A secondary problem is apparently quite trivial but the editing environment for literate programmes tends to be poor. I cannot find any good research on this, but this is both my experience and that of others [2].

For ontology development, I think a literate approach seems to make more sense. Again, in my experience, ontologies do have a somewhat more narrative approach than code — at least in the sense that the lack loops and the like.


Initial Approaches

I have now been experimenting with literate techniques since 2009. The first version used a single latex file, and pulled these out into a Manchester syntax file [3]. This worked quite nicely but suffered from the poor editor problem: I was building ontologies embedded in LaTeX, so lacked even the basic features (such as syntax highlighting) that I got when editing Manchester syntax files directly. This was a problem even with the very limited feature set from tools like omn-mode.el. The disadvantage would have been worse if I had been used to a richer environment for Manchester syntax.

My second attempt was took the opposite approach; now I used two files — a Manchester syntax file and a LaTeX one with a method for referring between the two [4]. This worked okay but had a poor implementation which I later refined [5].

These approaches have their advantages but do both suffer from a poor editing environment; either in having two files to switch and link between, or favouring documentation over ontology or vice versa. They also suffered from a secondary issue, which is that they are based around Manchester syntax. While this is nice enough, since writing Tawny-OWL [6], this style of ontology development just feels not rich enough.


Marginalia

One of the declared advantages of using a real programming language as the basic for Tawny-OWL was the ability to use the tools from that language; I have used a number of these both within Tawny-OWL and with ontologies written with Tawny: mostly obviously, the test environment, but also serialisation, properties support and, of course, the entire editing environment.

This raises the question as to whether I could use literate programming tools from Clojure as well. To my knowledge, the only real option in town here is Marginalia. Marginalia uses markdown as the documentation format and builds a nice presentation with code on one side, and comments on the other.

However, it has problems. Firstly, it presents all comments as text — you cannot comment the comments as it were which is irritating for boilerplate such as licence text. Secondly, the side-by-side presentation breaks the flow of reading as you have to move your eyes around the screen all the time. And, finally, it’s Markdown. While Markdown is nice at what it does, it’s very limited, and I missed the extra power of something like LaTeX.

The main difficulty, though, remains the editing environment. Without special support, while editing the comments show up as just comments. I can never remember the order of brackets in Markdown links — I rely on syntax highlighting to tell me that I have it correct.

Is there a way that Clojure and LaTeX can be made to work together?


LaTeX experiments: line comments

My first thought, in experimenting with LaTeX was a remarkably cheap and cheerful one. Consider a document such as this:

;; \documentclass{article}
;; \begin{document}
;; \begin{code}
(println "hello world")
;; \end{code}
;; \end{document}

This is a valid Clojure file, and is nearly valid latex as well. The only illegal part is that ;; occurs before the documentclass macro, although, in practice having ;; appear randomly throughout the document would not be ideal either.

Now, LaTeX as an embedded markup language has a very plastic syntax, and I have used that in this case. It is actually very easy to just ignore the ; character entirely, through the use of Catcodes; we can put this into a driver file which then inputs our Clojure file like so:

\catcode`;=9
\input{file.clj}

This way we maintain the validity of our Clojure file (otherwise the first line would be illegal). This is a remarkably cheap and cheerful way of achieving our aims; albeit at the cost of losing the ability to use semi-colons in our writing.


Indirect-buffers

What, however, about the editing environment. My own preferred environment — Emacs — has nice modes which edit both LaTeX and Clojure code, and it is possible to switch between the two, when I want to move between editing code and editing documentation. This is quite clunky, but there is a second option which is “indirect-buffers”. This is a piece of Emacs arcana where two buffers share some of the same data structures but not all, which means that they can have different modes. Unfortunately, my experience is that the buffers share too much — as well as the text, they also share “text-properties” which unfortunately both LaTeX and Clojure mode use. In practice, this means syntax highlighting fails (or rather than two representations fight with each other). As a second problem, although the file is valid LaTeX it is not normal LaTeX; simply things like wrapping text in paragraphs fails because of the ;; comments at the beginning of each line.

So, this experiment fails the editing environment test.


LaTeX experiments: block comments

My next attempt was to use block comments. Consider this file which is valid lisp using #| and |# block comments.

#|
\documentclass{article}
\begin{document}
\begin{code}
|#
(println "hello world")
#|
\end{code}
\end{document}
|#

We can use a similar (but not identical) trick with catcodes to make this valid latex also:

\catcode`#=\active
\def#{\catcode`#=6}
\catcode`|=\active
\def|{\catcode`|=12}
\input{hello_world.lisp}

The first call makes the # character active — that is, definable as a macro. We then define # as a macro which will set the catcode of # to 6 (which is it’s default). Then, we do the same with |. The practical upshot of this is that the opening #| does nothing other than reset everything in the driver file; effectively it’s ignored.

This actually works quite nicely in the editing environment; the opening #| effectively makes no difference to Emacs, and the mode works well. The only real disadvantage is that every code block needs two delimiters — one to open the code block in latex, and end the comment in Lisp.

Now there are various multi-mode tools around for Emacs which should help solve the otherwise clunky editing environment, although even here I am not convinced that this is the right route. Multi-mode tools are complex and to some extent are not what I want — when editing code I want to suppress the documentation, give it a low visual immediacy, and when editing documentation, I want the reverse.

There is, however, a bigger problem — while the last example is valid Common Lisp, Clojure does not have block comments, nor does the programmer have the ability to extend the reader in this way. So, while this seems a nice solution, it depends on a specific language feature which Clojure lacks.


Emacs Experiments: formats

My next idea was to use formats. Emacs allows transformations to happen between the text that is visualised on screen, and how it is saved to file. The main reason for this is to support the many non-ascii text formats that exist. But it is (perhaps unsurprisingly) fully extensible within Emacs and could be used for any purpose. So, why not convert line-commented Clojure on file into block-comments on screen; this will give editable latex on screen and valid Clojure on file and a driver file to give valid Latex on file also.

Unfortunately, it fails. While Emacs latex support is file based, Clojure (and specifically cider) has a tighter integration; it can communicate the contents of a buffer without saving to file. This circumvents the formatting — the block comments are sent to Clojure process which complains.


Emacs Experiments: linked-buffer

I am now experimenting with another option. indirect-buffers place exactly the same text (and text-properties) into two buffers. Instead of sharing all the text, why not have two buffers with a function that can transform the text bi-directionally between the two. The practical result is two views over the same content. Surprisingly, this works pretty well, as you can see here even though my current implementation is very simple — the whole buffer is copied every keypress. We could achieve the same thing with indirect-buffers, but as well as simple copying, however, we can also transform the text on the fly so that both buffers are valid for their respective modes.

The broad idea is not that new — it’s similar to web/weave or SWeave for instance, except that it is embedded into the editor; this means we can take advantage of existing support for both languages from the editor and the author gets immediate feedback about the transformation — so messing up the syntax is pretty obvious.

It also provides a superset of functionality provided by other techniques: indirect-buffers as mentioned previously, shadowfile.el (which creates a second copy of a file somewhere else on every save), and it could also mimic shadow.el which generates a secondary file by a command invocation on every save (although an invocation of an external command every keypress would probably not be performant).

The first release of linked-buffer was a month ago. I am currently unhappy with the configuration and will change this so code is in flux at the moment, but I am using it in anger which is a good sign. Currently, it does a latex <-> clojure transformation, but I will add a few more as time goes on.


Discussion

It has taken me quite a while to get to this stage, and a number of experiments along the way, but my feeling is that I now have a workable literate environment. It also validates my decision to build Tawny [7]. Having a rich textual language for building ontologies is a bit of a game changer; providing programmatic extensions to the language has been helpful, but the access to other tools, git, travis, tests and a repl has really made the difference. Now adding a literate environment to this as well changes the way that I can use ontologies and is a paradigm shift in their development.

References

  1. M. Hansen, "Modeling How Programmers Read Code", synesthesiam, 2015. http://synesthesiam.com/posts/modeling-how-programmers-read-code.html
  2. M. Giuca, "Literate programming is a terrible idea", Unspecified Behaviour, 2010. http://unspecified.wordpress.com/2010/06/04/literate-programming-is-a-terrible-idea
  3. P. Lord, "Literate OMN", An Exercise in Irrelevance, 2009. http://www.russet.org.uk/blog/1213
  4. P. Lord, "Introducing omnsplit", An Exercise in Irrelevance, 2009. http://www.russet.org.uk/blog/1258
  5. P. Lord, "Introducing Omnencap", An Exercise in Irrelevance, 2009. http://www.russet.org.uk/blog/1269
  6. P. Lord, "The Semantic Web takes Wing: Programming Ontologies with Tawny-OWL", arXiv, 2013. http://arxiv.org/abs/1303.0213
  7. P. Lord, "Tawny 1.0", An Exercise in Irrelevance, 2013. http://www.russet.org.uk/blog/2962