Archive for the ‘All’ Category

Like many developers, I often edit both code and documentation describing that code. There are many systems for doing this; they can be split into code-centric systems like Javadoc, which allow commenting of code, and document-centric systems like Markdown, which allow interspersing code in documentation. Both have the fundamental problem that, by focusing on one task, they offer a poor environment for the other.

In this article, I introduce my solution, which I call lenticular text. In essence, this provides two syntactic views over the same piece of text: one view is code-centric, one document-centric, but neither has primacy over the other, and the author can edit either as they choose, even viewing both at the same time. I have now implemented a useful version of lenticular text for Emacs, but the idea is not specific to a particular editor and should work in general. Lenticular text provides a new approach to editing text and code simultaneously.


Quick Links

  • The lentic package which implements lenticular text
  • A screen cast showing the lenticular source of lentic.el
  • A screen cast showing the use of lentic.el in a literate workflow

Lentic.el is available on MELPA(-stable) and Marmalade.


The Need for Literate Programming

The idea of mixed documentation and code goes back a long way; probably the best-known proponent has been Donald Knuth, in the form of literate programming. The idea is that a single source document is written containing both the code and the documentation source; this is then tangled to produce the real source code that is compiled into a runnable application, and woven to produce readable documentation.

The key point of literate programming is the lack of primacy between the documentation and the code; the two are both important, and can be viewed one beside the other. Tools like Javadoc are sometimes seen as examples of literate programming, but this is not really true. Javadoc is really nothing more than a docstring from Lisp (or Perl or Python), but using comments instead of strings. While Javadoc is a lovely tool (and one of the nicest parts of Java, in my opinion), it is a code-centric view. It is possible to write long explanations in Javadoc, but people rarely do.

There are some good examples of projects using literate programming in the wild, including the mathematical software Axiom and even a book on graphics. And there are a few languages which explicitly support it: TeX and LaTeX perhaps unsurprisingly do, and Haskell has two syntaxes, one which explicitly marks comments (like most languages) and another which explicitly marks code.

These examples, though, are really the exception that proves the rule; in reality, literate programming has never really taken off. My belief is that one reason for this is the editing environment, and it is this that I consider next.


Editing in Modes

When non-programmers hear about a text editor, they normally think of limited tools like Notepad: an application which does essentially nothing other than record what you type. For the programmer, though, a text editor is a far cry from this. Whether the programmer prefers a small and sleek tool like Vim, the Swiss Army knife that is Emacs, or a specialized IDE like Eclipse, these tools do a lot for their users.

The key feature of all modern editors is that they are syntax-aware; while they offer some general functions for just changing text, mostly they understand at least some part of the structure of that text. At a minimum, this allows syntax highlighting (I guess there are still a few programmers who claim that colours just confuse them, but I haven't met one for years). In general, though, it also allows intelligent indentation, linting, error-checking, compiling, interaction with a REPL, debugging and so on.

Of course, this would result in massive complexity if all of these functions were available all of the time. So, to control this, most text editors are modal: only the tools relevant to the current syntax are available at any time. Tools for code are present when editing code, and tools for documentation when editing documentation.

Now, this presents a problem for literate programming, because it is not clear which syntax we should support when syntaxes are mixed. In general, most editors deal with mixed syntax by ignoring one syntax, or at most treating it conservatively. So, for example, AucTeX (the Emacs mode for LaTeX) supports embedded code snippets: the code is not syntax highlighted (i.e. the syntax is ignored) and the indentation function preserves any existing indentation (i.e. AucTeX does not indent code correctly, but at least does not break existing indentation). We could expand our editor to support both syntaxes at once, and there are examples of this. For instance, AucTeX allows both the code and the documentation to be edited with its DocTeX support — this is slightly cheating, though, as both the code and the documentation are in the same language. Or, alternatively, Emacs org-mode will syntax highlight code snippets, and can extract them from the document-centric view, edit them and then drop them back again.

The problem here is that supporting both syntaxes at once is difficult to do, particularly in a text editor which must also deal with partly written syntax, and may not always follow a grammar.


Literate Workflows

As well as the editor, most of the programmer's other tools are syntax-aware: something has to read the code, compile the documentation and display the results. For a literate document, there are two approaches to dealing with this. The first is the original approach: use a tool which splits the combined source into separate sources for the code and the documentation. This has the most flexibility, but it causes problems: if your code fails to compile, the line numbers in any errors come out in the wrong place, and source-level debuggers will generally work on the generated source code rather than the real source.

The second approach is to use a single source form that all the tools can interpret. This works for Haskell, for instance: code is explicitly marked up, and the Haskell compiler ignores the rest as comments. This is fine for Haskell, of course, but not all languages support this. In my case, I wanted a literate form of Clojure and, while I tried hard, it is just not possible to add one in any sensible way (http://www.russet.org.uk/blog/2979).

A final approach is to embed documentation in comments; Javadoc takes this approach. Or, for a freer form of documentation, there are tools like Marginalia. As a general approach it seems fine at first sight, although I had some specific issues with Marginalia (chiefly, that it is Markdown-specific and a relatively poor environment for writing documentation). But in use, it leads right back to the problem of modal editing: Emacs' Clojure mode does not support it, so, for example, refilling a marked-up list ignores the markup and the list items get forced into a single paragraph.


Lenticular Text

My new approach is to use lenticular text. This is named after lenticular printing, which produces images that change depending on the angle at which they are viewed. It combines approaches from all three of these literate workflows: we take a single piece of semantics and give it two different syntactic views. Consider this "Hello World" literate code snippet, written in LaTeX and Clojure (omitting the LaTeX preamble for simplicity).

This prints "hello world".
\begin{code}
(println "hello world")
\end{code}

Now, if this were Haskell, we could stop, as the code environment is one of those that the compiler recognises. But Clojure does not; we cannot validly load this file into Clojure. And as it is not real Clojure, any syntax-aware editor is unlikely to do sensible things with it. We could use a code-centric view instead.

;; This prints "hello world".
;; \begin{code}
(println "hello world")
;; \end{code}

This is now valid Clojure, but now the LaTeX breaks. Actually, with a little provocation, it can be made valid LaTeX as well (http://www.russet.org.uk/blog/2979), but my toolchain does not fully understand LaTeX (nothing does, other than LaTeX itself, I suspect), so again we are left with the editor not doing sensible things.

These two syntaxes, though, are very similar, and there is a defined relationship between them (comment or uncomment every line that is not inside a code environment). It is clearly possible to transform one into the other with a simple syntactic transformation. We could do this with a batch-processing tool, but that would defeat the purpose, because one form or the other would retain its primacy; as an author, I need to be able to interact with both. So, the conclusion is simple: we need to build the transformation into the editor, and we need it to be bi-directional, so that I can edit either form as I choose.
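To make this concrete, here is a minimal sketch in Clojure of the document-to-code direction, assuming the start-of-line \begin{code}/\end{code} delimiters used above. The function name and the line-by-line approach are purely illustrative — this is not lentic's actual implementation.

(require '[clojure.string :as str])

(defn doc->code
  "Comment out every line that is not inside a
  \\begin{code}...\\end{code} environment."
  [document]
  (->> (str/split-lines document)
       (reduce
        (fn [{:keys [in-code? lines]} line]
          (cond
            ;; the delimiters themselves are commented in the code view
            (str/starts-with? line "\\begin{code}")
            {:in-code? true :lines (conj lines (str ";; " line))}
            (str/starts-with? line "\\end{code}")
            {:in-code? false :lines (conj lines (str ";; " line))}
            ;; code lines pass through untouched
            in-code?
            {:in-code? true :lines (conj lines line)}
            ;; documentation lines gain a comment prefix
            :else
            {:in-code? false :lines (conj lines (str ";; " line))}))
        {:in-code? false :lines []})
       :lines
       (str/join "\n")))

The code-to-document direction is simply the inverse: strip the ";; " prefix from every line outside a code environment.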

This would give a totally different experience. I could edit the code-centric form when editing code blocks, and the document-centric form when editing comments. In code blocks, with the code-centric view, the editor should work in a code-centric mode: auto-indentation, code evaluation, completion and so on. In comment blocks, with the document-centric view, I should be able to reflow paragraphs, add sections, or cross-reference.

But does it work in practice? The proof of the pudding is in the programming.


Lenticular Text for Emacs

I now have a complete implementation of lenticular text for Emacs, available as the lentic package (http://github.com/phillord/lentic). The implementation took some thinking about and was a little tricky, but it is not enormously complex. In total, the core of the library is about 1000 lines long, with another 1000 lines of auxiliary code for user interaction.

My first thought was to use a model-view-controller architecture, which is the classic mechanism for maintaining two or more views over the same data. Unfortunately, the core Emacs data structure for representing text (the buffer) is defined in the C core of Emacs and is not easy to replace; indirect buffers, for instance, an Emacs feature which lentic can mimic, are implemented directly in the core. Secondly, MVC does not really make sense here: with MVC, the views can only be as complex as the model, and I did not want lentic to be (heavily) syntax-aware, so that it can work with any syntax. So, we do not have much in the way of a data model beyond a set of characters.

Lentic instead uses a change-percolation model. Emacs provides hooks before and after every change; lentic listens to these in one buffer and percolates the change to the other. My first implementation (then called linked-buffers) simply copied the entire contents of a buffer and then applied a transform (adding or removing comments, for instance). This was easy to implement but inefficient, scaling only to 300-400 lines of code before becoming laggy. The current implementation is incremental and percolates just the parts which have changed. Adding a new transformation requires implementing two functions — one which "clones" changes from one buffer to the other, and one which "converts" a location in one buffer to the corresponding location in the other.
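To illustrate the "convert" half, here is a sketch — in Clojure rather than lentic's Emacs Lisp, and with many edge cases ignored — of mapping a character offset through the comment transform sketched earlier, where every line outside a code environment gains a 3-character ";; " prefix.

(defn doc->code-offset
  "Map an offset in the document view to the corresponding offset in
  the code view, adding 3 for every commented line before that point."
  [document offset]
  (->> (str/split-lines (subs document 0 offset))
       (reduce
        (fn [{:keys [in-code? off]} line]
          (cond
            (str/starts-with? line "\\begin{code}")
            {:in-code? true :off (+ off 3)}
            (str/starts-with? line "\\end{code}")
            {:in-code? false :off (+ off 3)}
            in-code?
            {:in-code? true :off off}
            :else
            {:in-code? false :off (+ off 3)}))
        {:in-code? false :off offset})
       :off))

The "clone" function is then the incremental analogue of the whole-buffer copy: it re-runs the transform over just the changed region, using a location mapping like this one to find where that region lives in the other buffer.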

Re-implementing this for other editors would not be too hard. Many of the core algorithms could be shared — while they are not complex in the abstract, incremental updates involve quite a few (syntax-dependent) edge cases. Similarly, I went through several iterations of the configuration options, which specify how a particular file should be transformed. This started off being declarative but ended up as it should in a Lisp: with a function.

I now have transformations between Lisp (Clojure and Emacs Lisp) and LaTeX, org-mode and asciidoc. These share much of the same logic, as they all demarcate code blocks and comment blocks with a start-of-line comment; this is essentially the same syntax that Haskell's literate programming mode provides. The most complex transformation I have attempted is between org-mode and Emacs Lisp; the complexity here comes because Emacs Lisp comments have a syntax which somewhat overlaps with org-mode, and I wanted the two to work together rather than duplicate each other. For all of these implementations, my old netbook can cope with files 2,000-3,000 lines long before any lag becomes noticeable. Lentic is fast enough for normal, everyday usage.


Experience

The experience of implementing lentic has been an interesting one. In use, the system works pretty much as I hoped it would. As an author I can switch freely between document- and code-centric views, and all of the commands work as I would expect. When I have a big screen, I can view both views at the same time, side by side, "switching" only my eyes. I can now evaluate and unit-test the code in my Tawny-OWL (http://www.russet.org.uk/blog/3030) manual.

By building the transform into the editor, I get immediate feedback on both parts of my literate code. And my lenticular text is saved to file in both forms; as a result, no other tools have to change. Clojure gets a genuine Clojure file. LaTeX gets a genuine LaTeX file. And for the author, the two forms are (nearly) equivalent. We only care which is the "real" file at two points: when starting the two lentic views, and when versioning, as we only want to commit one form. The decision as to which form is "primary" becomes a minor issue. The lentic source code, for example, is stored as an Emacs Lisp file, which can transform into an org-mode file, so that it works with MELPA (as well as for bootstrap reasons). The manual I am working on is versioned in the LaTeX form but transforms into Clojure, so that people cloning the repository get the document first; I use Emacs in batch mode to generate the Clojure during unit testing.

There are some issues remaining. Implementation-wise, undo does not behave quite as I want, since it is sensitive to switching views. And, from a usability point of view, I find that I sometimes try capabilities from the code-centric view in the document-centric one, or vice versa.

Despite these issues, in practice the experience is pretty much what I hoped: I generally work in one form or the other, but often switch rapidly backwards and forwards between the two while checking. I find it a rich and natural form of interaction. I think others will too.


Conclusions

I doubt that tooling is the only difficulty with literate programming, but the lack of a rich editing environment is certainly one of them. Lenticular text addresses this problem. So far I have used lenticular text to document the source code of lentic.el itself, and to edit the source of a manual. For myself, I will now turn to what it was intended for in the first place: literate ontologies, allowing me to produce a rich description of a domain in both a human and a computational language.

The notion of lenticular text is more general, though, and the implementation is not that hard. It can be used in any system where a mixed syntax is required. This is a common problem for programmers; lenticular text offers a usable solution.


Less than one month after the release of Tawny-OWL 1.2.0 (http://www.russet.org.uk/blog/3018), I am pleased to announce the 1.3.0 release. This is a much smaller release than 1.2.0, but provides two useful changes.

First, I have added support for axiom annotations, as documented more extensively in my past post (http://www.russet.org.uk/blog/3028). While I expect these are still a minority requirement, they are used heavily by some people, and so they need supporting.

Second, I have reworked the support for patterns, through the addition of three functions. p allows writing classes with optionality. So, for instance, consider this code:

(p o/owl-class
   o partition-name
   :comment comment
   :super super)

comment and super are optional here; they can be nil. In this case, the p function removes the nil values from the owl-class call. Moreover, if, as in this case, an entire frame has only nil values, it is removed altogether. p returns the result of this call as a Clojure record containing both the entity itself and the name that was used to create that entity. In a nice piece of serendipity, the support that I have added for annotations (http://www.russet.org.uk/blog/3028) also allows direct use of this record later in the pattern. So, for example, this form uses partition, which is the return value from above.

(map
 #(p o/owl-class o
     %
     :comment comment
     :super partition)
 values)
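The nil-handling that p performs can be illustrated in isolation. This strip-nil-frames helper is hypothetical — it is not tawny.pattern's implementation, and it handles only the simple case of one value per frame — but it shows the idea:

(defn strip-nil-frames
  "Remove from a flat seq of keyword/value frames any frame whose
  value is nil."
  [frames]
  (->> (partition 2 frames)
       (remove (fn [[_ value]] (nil? value)))
       (apply concat)))

;; (strip-nil-frames [:comment nil :super B]) => (:super B)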

The reason for this record, though, is that it makes it relatively easy to build patterns in both function and macro form. Macros are never trivial, but all this one does is turn a set of symbols into their string equivalents.

(defmacro defpartition
  "As value-partition but accepts symbols instead of string and
takes the ontology as a frame rather than first argument."
  [partition-name partition-values & options]
  (tawny.pattern/pattern-generator
   'tawny.pattern/value-partition
   (list* (name partition-name)
          `(tawny.util/quote-word ~@partition-values)
          options)))

The practical upshot of this is that I can define a value partition using a macro like this:

(defpartition Hydrophobicity
  [Hydrophobic Hydrophillic]
  :comment "Part of the Hydrophobicity value partition"
  :super PhysicoChemicalProperty
  :domain AminoAcid)

As well as generating the OWL API objects, this also binds the relevant vars, both those visible here (Hydrophobicity) and those generated (hasHydrophobicity).

Taken together, these are two important additions to Tawny-OWL. We can provide better provenance with axiom annotations, and with patterns we can lift the level of abstraction at which we build our ontologies, which was one of the original motivations for Tawny-OWL in the first place.

Tawny-OWL 1.3.0 is now available on Clojars.


Since the early development of Tawny-OWL, an easy-to-use syntax has been a specific objective (http://www.russet.org.uk/blog/2214), as well as hiding some of the complexity of the OWL API. The intention has always been for Tawny-OWL to be an ontology developer tool first and a programmatic library second, and keeping this in mind is part of the reason that, I believe, it fulfils these objectives.

Unfortunately, the other part of the reason is that Tawny-OWL does hide functionality that is available in the OWL API. Or, more strictly, does not uncover it. Tawny-OWL is implemented in Clojure, so what is possible in Java is also possible in Clojure.

One of the key decisions was to hide the support that the OWL API provides for certain forms of annotation, in particular annotations on axioms. Of course, Tawny-OWL allows you to add annotations to entities; this is used to enable labels and comments on any entity. But axiom annotations allow the description of the relationships between entities. So, for example, as well as attaching comments to two classes, it is also possible to attach a comment to the sub/superclass relationship between the two.

The main reason that Tawny-OWL did not support these natively is that it takes an entity-centric view of OWL. So, if we consider this statement:

(defclass A
   :super B
   :label "A")

We are describing the entity A primarily. In fact, this statement translates into two axioms, which we can see in the OWL/XML representation:

<SubClassOf>
    <Class IRI="#A"/>
    <Class IRI="#B"/>
</SubClassOf>
<AnnotationAssertion>
    <AnnotationProperty abbreviatedIRI="rdfs:label"/>
    <IRI>#A</IRI>
    <Literal xml:lang="en"
       datatypeIRI="http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral">A</Literal>
</AnnotationAssertion>

The defclass statement above returns the entity (actually the var, as it is a def form, but the var contains the entity), rather than the axioms. If I wished to return the axioms, as there are several, I would need a list or, more probably, a data structure, so that I could extract the axiom I wanted. This would, however, complicate life considerably. For instance, B would now refer to this data structure, which would need unpicking before its use here. Worse, the OWL API works by mutation, so the axioms in B might reflect only some of the axioms referring to B.

Of course, there is a way around this, which is to dip down into the OWL API and build the axioms that way. As far as I can tell, annotations need to be added at the time the axiom is created (though it is probably possible to do it later as well). This example comes from my recasting of the OWL Primer ontology.

(add-axiom
 (.getOWLSubClassOfAxiom (owl-data-factory)
  Man Person #{(owl-comment "States that every man is a person")}))

This works well, but the syntax is not nice; we need to make a direct call to the OWL API, and we are not even using the add-subclass function. This did not bother me overly, as it was not something that I thought would be needed often.

Unfortunately, it is something that the Gene Ontology people do often, including, for instance, annotating labels with the source of knowledge for these labels. If I am to support them, I need an attractive syntax that fits with current Tawny-OWL syntax. After a couple of attempts, I decided on this:

(defclass A
  :super (annotate B
           (owl-comment "A is a kind of B")))

The axioms in Tawny-OWL are syntactically implicit — here, the :super relationship between A and B — so I cannot address them directly. But attaching an annotation to B in this way is unambiguous. Compare the two statements that it might otherwise be mistaken for: in one case, we annotate A with a comment (the most common thing to do); in the other, we inline an annotation of B (which would probably be better not inlined!).

(defclass A
  :super B
  :annotation (owl-comment "A is an interesting entity"))

(defclass A
  :super (owl-class B
            :annotation
               (owl-comment "B is an interesting entity")))

This also extends naturally to other axioms, including annotation labels.

(defclass A
  :annotation (annotate (label "A")
                 (owl-comment "According to me")))

The implementation of this took me several attempts, including some fairly painful and ultimately unsuccessful macros. In the end, I found a much simpler solution. annotate returns a Clojure record which contains both the entity — (label "A") or B in these examples — and the annotations — always an owl-comment here, but potentially anything. This record is passed through the Tawny-OWL function call stack in place of the raw entity, until the appropriate axiom is created. I then unpick this object with two calls to protocol methods — as-entity and as-annotations — like so.

(.getOWLAnnotationAssertionAxiom
    (owl-data-factory)
    (as-iri named-entity)
    ^OWLAnnotation (as-entity annotation)
    (as-annotations annotation))

The protocol implementations are trivial.

(defrecord Annotated [entity annotations]
  Entityable
  (as-entity [this] entity)
  Annotatable
  (as-annotations [this] annotations))

These allow me to avoid checking for an Annotated object when creating my axiom. In most cases, I will not have one of these, but a normal OWLObject — or a Long, a String, or even a Keyword for property characteristics. So I extend the protocols to cover these cases also, with even more trivial implementations.

(extend-type
    Object
  Entityable
  (as-entity [entity] entity))

(extend-type
    Object
  Annotatable
  (as-annotations [entity]
    #{}))

Finally, the annotate function broadcasts, as do many other functions in Tawny-OWL, so it is possible to annotate several axioms at once. So, for example, here we explicitly use a list.

(defclass A
   :super (annotate [B C D]
             (owl-comment "All of these are supers")))

Or, implicitly, with an existential restriction that itself uses broadcasting.

(defclass A
   :super (annotate (owl-some r B C D)
             (owl-comment "All of these are existentials")))

While the syntax is slightly more complex than most of Tawny-OWL, it is a considerable improvement on dropping down to the OWL API layer beneath; and, ultimately, this form of annotation is a more complex usage of OWL.


In this post, I describe the different technologies that I’ve tried for writing a new book, from markdown to LaTeX.

One of the questions that I have been asked about Tawny-OWL (http://www.russet.org.uk/blog/2366) is whether there is a manual or not. Actually, the answer is yes: there is, and there has been since well before the 1.0 release.

However, it’s not rich enough. It requires both a reasonably good background in Clojure and in ontology development. Given that the overlap between these two areas of knowledge is probably limited to myself and Jennifer Warrendar, this is rather less than ideal. So, I’ve started a new manual in the form of a book called Take-Wing. Getting a nice environment has been an interesting journey.


Markdown

The original version of the documentation was written in Markdown. There was not really a good reason for this, other than that I wanted to try it and it is well supported. It’s not a bad format and is easy to type, although it lacks extensibility (which is, of course, why it has been extended in so many different ways!).

However, one significant problem was that I had no good way of checking that the code samples in it worked. I had been hit by this when I changed function names, for example from owlclass to owl-class, or, more perniciously, when I moved one entity from a variable to a function. My general solution to this is linked-buffer, which translates documentation into code and vice versa. The lack of an explicit delimiter for the start and end of code blocks makes this harder to implement for Markdown.


Asciidoc

My next thought was to move to asciidoc. I like asciidoc and have used it for years; I helped to add slidy support to it, so that I can use it for slides. Most of my teaching material uses asciidoc now, because I can get slides and lecture notes from the same source, with easy-to-integrate code snippets that I can run — albeit maintained in independent source files, which can make things a little painful.

The first version of Take-Wing used asciidoc. Because I wanted to use linked-buffer to generate Clojure source that I could test, I needed to use multiple files with a master file — the Clojure cookbook uses much the same system, albeit for a slightly different reason.

However, writing a talk or even a lecture series is a very different thing from writing a book. The main problem was not asciidoc itself, but the support from Emacs. There is an asciidoc mode but, aside from needing patching, it is heavily focused on syntax highlighting rather than document structure. So inserting cross-references and the like was painful.


Org-Mode

My next idea was to try org-mode. So, I used pandoc to do the translation; this wasn’t a glowing success, but it achieved the basics anyway.

An Emacs-specific solution doesn’t seem ideal, but then, for Take-Wing, I am likely to be the main author. And I have been using org for a while, particularly for building plans, both for Tawny-OWL and also, ironically, for the outline of Take-Wing.

Org has clear code demarcation blocks, so adding support for linked-buffers was straightforward enough. One significant problem is that the org-mode export framework has recently been replaced, so I had to install my own version. Installing a new version is not problematic, as this can be done with package.el; but because org-mode is in core Emacs, deleting the old version doesn’t happen automatically, and I found several version conflicts resulting from loading old files. Still, I managed to get org working, producing both a PDF and a nice web page.

Org-mode was pleasant enough to use. The folding features were nice and the syntax is reasonable to type. However, I found the same problems as with asciidoc. While org has advanced features for linking to the rest of the world, linking within documents is not so good, especially when using a master document. Basically, I couldn’t get over my envy of my PhD students, both of whom have written their theses with AucTeX.


LaTeX

LaTeX itself is an old piece of software, but it does the job of document preparation like nothing else. And the support in Emacs is fantastic. However, its Achilles heel has been that it produces PDFs. I don’t like PDFs for a variety of reasons, but the main one is that, essentially, a PDF is a screenshot of a piece of paper. I long ago went natively digital: I write my papers, blog posts, emails and everything else on the computer, and I am used to reading that way as well. And I want the web for reading documentation.

In my time, I’ve tried lots of different technologies for webifying LaTeX. The best I had found was Plastex; my post on LaTeXtowordpress (http://www.russet.org.uk/blog/1740) is still one of my best read, and my colleague even used it to post her PhD thesis (http://themindwobbles.wordpress.com/2012/06/14/converting-a-latex-thesis-to-multiple-wordpress-posts/). It’s a good tool, but it’s not using TeX underneath, and so it doesn’t always do the right thing.

For some reason, though, I managed to miss tex4ht, even though it has been around for quite a while. My initial impression is that it is probably the best tool I have seen, even if it has been blessed with some of the worst documentation and a very strange command line. I’m hopeful that make4ht may help with this; its author has been fantastic on StackExchange in answering my questions. So I decided to give it a go: I exported my org to LaTeX, threw away my org and started again.

The first problem was that I didn’t like its handling of the listings package. I fixed this by dropping code straight through to HTML. Initially, I tried highlight for syntax highlighting, but then dropped it for prism, as the former cannot do inline highlighting. The default footnote handling is also very strange, but this too is easy to fix with a standard package. So now, after a lot of fiddling, it all seems to be working.


Conclusions

It’s been quite a lot of effort and rather more fiddling than I would have hoped, but I now have an editing environment that I like, and this is important. Writing a document is hard, and the environment needs to support this and not get in the way.

I am hopeful that tex4ht will fulfil its initial promise. I have been using LaTeX less and less over the years, simply because of its lack of HTML support, despite the fact that it’s the best environment for writing in: so when I have been most happy as an author, I have been least happy as a reader.

All I have to do now is write the rest of the book; so far, I think it’s going well, and this is something that I shall cover in later blog posts.


About six months after the release of Tawny-OWL 1.1.0 (http://www.russet.org.uk/blog/2989), I am pleased to announce the 1.2 release. There have been a number of changes in this release, some planned, some not.

The planned changes have been an extension of the rendering capabilities of Tawny-OWL; it is now possible to render an OWLEntity entirely into Clojure data structures. The main reason for this was to allow the use of core.logic for building an effective syntactic query language over OWL. This is now complete, although more work needs to be done on tawny.query if this is to be exploited to its fullest.

The main unplanned change was caused by a change in the OWL API which had dramatic (and negative) consequences for Tawny-OWL performance. The details of this change have been documented elsewhere (http://www.russet.org.uk/blog/3007). Fixing this actually resulted in small increases in rendering performance. It also led me to some aggressive micro-optimisations of the Tawny-OWL code base, mostly meaning I can avoid variadic function calls where possible (these box their arguments into a list, which can be quite slow). On small ontologies, Tawny-OWL will now load several times faster.
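For anyone curious what that micro-optimisation looks like, here is a generic, hypothetical illustration (not Tawny-OWL’s actual code): unrolling the common arities of a variadic function so that most call sites never allocate a rest seq.

(defn conjoin
  "Hypothetical example of arity unrolling: one- and two-argument
  calls hit the fixed arities and pay no boxing cost; only calls
  with three or more arguments build a rest seq."
  ([a] [:and a])
  ([a b] [:and a b])
  ([a b & more] (into [:and a b] more)))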

The pace of development on Tawny-OWL has slowed somewhat in recent times; partly, this is because it is now reaching a point of maturity. However, I have also been diverted by two other projects. I have put a reasonable amount of effort into linked-buffer, which provides a rich literate environment, originally intended for Tawny-OWL although not specific to it. And, secondly, I have started a manual for Tawny-OWL called Take Wing, which is developed using linked-buffer. Not the first example of a literate program as a manual, I know, but probably the first example of a literate ontology as a manual.

Despite the slower pace, I have a roadmap for Tawny 1.3. I wish to support what I am calling transactional restrictions: a set of axioms which are added either all together or not at all. The use case for this comes from the Gene Ontology, who use annotations on their annotations to describe the evidence for particular statements. The meta-annotations should not be added to the ontology until the annotations are.

And, secondly, I am reworking tawny.pattern — the existing value partition support has already been changed, and there are some new functions which make it easy to turn a functional pattern into a macro one; the latter produces the more attractive syntax.

No doubt there will be a few unplanned additions also!


Yesterday, I came across a very strange bug in my Tawny-OWL library (1303.0213). The bug was my fault, but fixing it was complicated by what has to go into my list of Clojure gotchas (http://www.russet.org.uk/blog/2991).

Consider this very simple piece of Clojure code. Is this a recursive function, i.e. one that calls itself? Intuitively the answer is yes, but I would argue that, actually, you cannot tell from the information given. It could be, but it is not necessarily so.

(defn countdown [n]
  (if (pos? n)
    (cons n
          (countdown (- n 1)))
    '(LiftOff)))

Let’s run this code:

test-project.core> (countdown 10)
(10 9 8 7 6 5 4 3 2 1 LiftOff)

All is working as it should, and the function is recursing, so there appears to be no problem. However, countdown is a bit long to type, so let’s create an alias called cdown and test that it works as expected.

test-project.core> (def cdown countdown)
#'test-project.core/cdown
test-project.core> (cdown 10)
(10 9 8 7 6 5 4 3 2 1 LiftOff)

All is good: cdown is doing just what we expect, and it’s the same as countdown. However, as well as being lazy, I am easily bored, so now we make the countdown go faster by redefining the countdown function.

(defn countdown [n]
  (if (pos? n)
    (cons n
          (countdown (- n 2)))
    '(LiftOff)))

test-project.core> (countdown 10)
(10 8 6 4 2 LiftOff)

Again, everything is working. But what happens when we call cdown?

(cdown 10)

If you are new to Clojure, you might assume that, as we have changed the definition of countdown, the definition of cdown would change as well, so we should get:

test-project.core> (cdown 10)
(10 8 6 4 2 LiftOff)

If you are more experienced with Clojure (as I would claim myself now to be, although others may disagree), then you would note that the alias was to the function and not the var, so cdown still contains a pointer to the old definition, and it will behave as it did before:

test-project.core> (cdown 10)
(10 9 8 7 6 5 4 3 2 1 LiftOff)

As it happens, this is not true either. What we actually get is this:

test-project.core> (cdown 10)
(10 9 7 5 3 1 LiftOff)

A weird combination of the two functions. What is happening is this: the initial call to cdown invokes our first function definition; 10 is positive, so we subtract 1 to get 9. And now, here is the strange bit. The function is no longer recursive, because it calls countdown through the indirection of the Clojure var; we could be more explicit and replace the “recursive” call with this, which is what actually happens in the compiled Java class.

((var-get (var countdown)) 10)

So, in short, in Clojure a recursive function is not necessarily recursive, even if it is when defined. If the var changes, then the function no longer calls itself.

There are solutions to this. I should have defined my cdown alias like so:

(def cdown #'countdown)

Now, when we redefine countdown, the original function gets lost and GCed, rather than hanging around as a pale shadow of its former recursive self. This works, but is rather counterintuitive; with a Lisp-1 you get used to referring to functions directly rather than via a quoted symbol, but here we have to use both a quote and a #.

A second solution is to use recur, which does not use var indirection. Consider this example: the first countdown2 function is recursive and will always be, since it directly refers to itself, and this remains true after redefinition.

(defn countdown2 [i]
  (when (pos? i)
    (println i)
    (recur (dec i))))

(def cdown2 countdown2)

(defn countdown2 [i]
  (when (pos? i)
    (println i)
    (recur (- i 2))))

;; prints 10, 9, 8...
(cdown2 10)

Of course, we cannot use this in the first example, because it is not tail recursive. And, if we want to be really picky, we could argue that countdown2 is not recursive either, because it is tail-call optimised, so it doesn’t actually call itself. This is not an implementation detail, either, because it’s the whole point of recur.

Another solution would be for Clojure to provide a recur* form (or something equivalent) which would work away from the tail position and recurse directly, rather than using var indirection. It’s unlikely that Clojure will do this, however, because this form of recursion consumes stack and so is not a good thing to encourage.

It would also be possible to change the semantics of Clojure symbols so that they automatically referred to vars rather than values as appropriate; in all these cases, we would then have vars. As things stand, the same syntax y refers to two different things below — the var itself in y’s own recursive call, and the value of the var y in the two def forms.

(defn y []
  (y))

(def x y)
(def z {:x y})

To replicate the current behaviour, we would just use var-get.

(def x (var-get y))

Without thinking about this too deeply, I cannot actually see a downside to this, except that it would involve more var indirection, so would probably be slower.

Finally, Clojure could just remove var indirection. Actually, this is likely to happen in the form of build profiles, because it also has the advantage of making Clojure faster. As my example shows, however, it would also introduce some subtle changes to the semantics of the language.

I still like Clojure. The more you use a language, I guess, the more oddities you find.
