Archive for the ‘Emacs’ Category

I have just released m-buffer version 0.10. A pretty minor point release, but I thought I would post about it, as I have not done so before.

Interacting with the text in an Emacs buffer can be a little painful at times. The main issue is that Emacs makes heavy use of global state in its functions, something which has rather gone out of fashion since Emacs was designed, and probably for good reason. I know that I am not the only one to find this painful (http://www.wilfred.me.uk/blog/2013/03/31/essential-elisp-libraries). Perhaps most obvious is that many functions work on the current buffer, or at point. As a result, Emacs code tends to have lots of save-excursion and with-current-buffer calls sprinkled throughout.
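
As a minimal sketch of what this means in practice (the buffer name here is illustrative), operating on another buffer without disturbing global state means wrapping everything in state-saving macros:

(with-current-buffer (get-buffer-create "*example*")
  (save-excursion
    ;; point and the current buffer are restored on exit
    (goto-char (point-min))
    (insert ";; preamble\n")))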

We see the same thing in the regexp matching code. So for instance, the Emacs Lisp manual suggests the following:

(while (re-search-forward "foo[ \t]+bar" nil t)
  (replace-match "foobar"))

to replace all matches in a buffer. As you should be able to see, the call to replace-match contains no reference to which match is to be replaced. This is passed through the use of the global match data. Again, this is made manageable through the use of macros to save and restore the global state, in this case save-match-data. So, you end up with:

(save-match-data
  (save-excursion
    (while (re-search-forward "foo[ \t]+bar" nil t)
      (replace-match "foobar"))))

If you want to see this taken to the extreme, have a look at the definition of m-buffer-match, which is a good demonstration of the problem to which it is a solution.

The use of while is also less than ideal. It may not be obvious, but the code above contains a potential non-termination. For instance, the following should print out the position of every end-of-line in the buffer. Try it in an Emacs you don’t want.

(while (re-search-forward "$" nil t)
  (message "point at: %s" (point)))

It fails (and hangs) because the regexp $ matches a zero-width string. So, re-search-forward moves point to just in front of the first newline, reports point, and then never moves again. This problem is a real one, and happens in real code.
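
The standard fix is to move point on by hand after each zero-width match; a minimal corrected version of the loop:

(while (re-search-forward "$" nil t)
  (message "point at: %s" (point))
  ;; step over the zero-width match, guarding against end of buffer
  (unless (eobp)
    (forward-char)))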

m-buffer.el is my solution. Inspired partially by dash.el, it provides (nearly) stateless, list-oriented operations on buffer contents. So:

(m-buffer-match (current-buffer) "buffer")

returns a list of match data for every occurrence of “buffer” in the current buffer. There are also many, many convenience functions. So to match the end of the line, you could do:

(m-buffer-match
  (current-buffer)
  "$" :post-match 'm-buffer-post-match-forward-char)

which avoids the zero-width regexp problem, but it’s even easier to do:

(m-buffer-match-line-end (current-buffer))

Or, to print to the minibuffer as before, you can instead combine this with dash.el:

(--map
 (message "point at: %s" it)
 (m-buffer-match-line-end (current-buffer)))

m-buffer has been written for ease of use and not performance. So, by default, it uses markers everywhere, which can potentially cause slowdowns in Emacs. I have started to add some functions for dealing with markers which need explicit clearing; a traditional use for macros in Lisp is cleaning up resources like this. So

(m-buffer-with-markers
    ((end (m-buffer-match-line-end (current-buffer))))
  (--map
   (message "point at: %s" it)
   end))

leaves the buffer free of stray markers. The m-buffer-with-markers macro behaves like let* with cleanup.
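
For illustration only (this is not m-buffer’s actual definition, and my-clear-markers is a hypothetical helper that nils each marker in a list), such a macro can be sketched as:

(defmacro my-with-markers (bindings &rest body)
  "Bind BINDINGS as in `let*', clearing markers on exit.
`my-clear-markers' (hypothetical) sets each marker in a list to nil."
  (declare (indent 1))
  `(let* ,bindings
     (unwind-protect
         (progn ,@body)
       ;; cleanup runs even on non-local exit
       ,@(mapcar (lambda (binding)
                   `(my-clear-markers ,(car binding)))
                 bindings))))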

I am now working on stateless functions for point — so m-buffer-at-point does the same as point but takes a buffer as an argument. Most of this code is trivial; it just needs writing.
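
The shape of these functions is obvious enough; a sketch of what I mean (illustrative, not necessarily the definition that will ship):

(defun m-buffer-at-point (buffer)
  "Return the value of `point' in BUFFER."
  (with-current-buffer buffer
    (point)))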

I am rather biased, but I like m-buffer very much. It massively simplified the development of lentic (http://www.russet.org.uk/blog/3047). Writing that with embedded while loops would have been painful in the extreme. I think I saved more time by using m-buffer than it took to write it. I think it could simplify many parts of Emacs development significantly. I hope others think so too.

m-buffer is available on MELPA and GitHub.


With a previous release of lentic (http://www.russet.org.uk/blog/3035), I got a couple of suggestions. One of these was a complaint that it was hard to get going, because lentic lacks documentation. This is a bit unfortunate. Lentic actually did have documentation, but it was hidden away as comments in the source code; I wanted lentic to enable literate programming, and it uses itself to document itself.

Now, this makes perfect sense, but there is a problem. The end-user form of the documentation needs generating from the source code. This is true in general in Emacs land, although the standard form is texinfo. The usual solution to this problem is to generate the documentation during the packaging process.

This should work, but it doesn’t. Or, rather, it does not work in all cases. For an archive such as Marmalade, it is entirely possible. But for MELPA it fails. The problem is that MELPA works directly from a git checkout. My documentation, of course, is not source but generated. Now, MELPA has support for the generation of info from texinfo. But my source is Emacs Lisp, and I need to use lentic to generate a readable form.

One solution would, of course, be not to use MELPA. Nic Ferrier recently argued on the emacs-devel mailing list that the idea was fundamentally broken — a package is something that the developers should generate and publish, as with Java or Clojure. He makes a good point. Moving to Marmalade would solve this problem; after Nic’s work it is largely stable, so this was definitely an option.

However, I like MELPA (although I have only used it since stable came out). It is nice that it uses what I am doing anyway (tagging, pushing, and so forth). And I like the download stats. So I talked with the MELPA folks but, entirely reasonably, they did not want to add specific support to MELPA for this; nor support for, for example, downloading the source from somewhere else.

Other possibilities did present themselves. I could just check in my documentation, but my documentation depends on my source, so pretty much every commit would also require a documentation commit. Not nice. I thought about adding the documentation as an independent package; then my documentation commits would be in a repo with nothing else, but this hassles the user, even if it auto-installs. And I’d need different packages for MELPA and Marmalade. So, I was left with no good solution.

At the same time as all of this, I was working on the documentation, generating Org files from my Lisp documentation, then converting these to info. This sort of worked, but not nicely. A significant problem was that something in the toolchain did not like having multiple sections with the same names, and I have a lot of these (“Commentary”, “Header”, “Code”). I have not yet tracked down whether this is a problem with Org’s texinfo export, texinfo itself or info, nor am I sure it would be worth the effort.

Instead, I decided to try HTML output. This worked quite nicely; I use a single Org driver file (called lenticular.org) and import all the generated Org files from there. I also found org-info, which I had not seen before — this is JavaScript which gives an Info-like experience — next, previous, occur, search and so on. It’s imperfect, but pretty usable, and gives quite a nice documentation experience. It’s also possible to view in EWW, although there is no Info-like paging here.

Dropping info has one other big advantage — my toolchain for generating the documentation is now entirely in Emacs. So, my source code is now enough, because lentic can generate its own documentation on demand after installation. The first time the user requests the documentation, either in EWW or a browser, lentic generates Org files from its own source and then the HTML (http://homepages.cs.ncl.ac.uk/phillip.lord/lentic/lenticular.html). The only limitation is that this forces the requirement for a recent Emacs version, since the Org mode exporter framework has just been updated; unfortunate, but acceptable for a 0.x release.
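
The final step of that toolchain looks roughly like this (a hedged sketch using the post-8.0 Org exporter on the driver file mentioned above):

(require 'ox-html)

;; export the Org driver file to HTML, in the same directory
(with-current-buffer (find-file-noselect "lenticular.org")
  (org-html-export-to-html))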

Not all problems disappear. Because my documentation fits into the Emacs Lisp commenting standards, it is not structured ideally for info. For instance, the headers of all the lentic Emacs Lisp files are included. I would also like to extend org-info so I can switch the “Code” sections on and off (embedded literate sources are useful, but not for everyone). This will need some work on lentic, and probably also the Org mode HTML exporter.

But, then, neither is it that far off. It is good enough, and an advance on the previous situation. Perhaps it also demonstrates a future for Emacs documentation in general, with Info replaced by HTML.

The new release (http://www.russet.org.uk/blog/3047) of lentic is now available on Marmalade and MELPA, complete with documentation available from the menu. Please feel free to try both lentic and its documentation system out now.


Lentic is an Emacs mode which supports multiple views over the same text. This can be used for a form of literate programming. It has specific support for Clojure which it can combine with either LaTeX, Asciidoc or Org-Mode.

By default, two lentic buffers share content but are otherwise independent. Therefore, you can have two buffers open, each showing the content in different modes; to switch modes, you simply switch buffers. The content, location of point, and view are shared.

However, lentic also allows a bi-directional transformation between lentic buffers — the buffers can have different but related text. This allows, for example, one buffer to contain an Emacs Lisp file, while the other contains the same text but with “;;” comment characters removed, leaving the content in org-mode, enabling a form of literate Emacs Lisp programming with no change to either org-mode or Emacs Lisp. Ready-made transformations are also available for Clojure, LaTeX and Asciidoc.
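
Configuration is typically per-file; as a hedged sketch from memory (check lentic’s documentation for the exact variable and function names), an Emacs Lisp file selects its transform with a file-local variable:

;; at the bottom of an Emacs Lisp file
;; Local Variables:
;; lentic-init: lentic-orgel-org-init
;; End: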

Lentic is both configurable and extensible, using the EIEIO object system.

Lentic was previously known as Linked-Buffers.

The 0.7 release adds an integrated documentation system, support for Haskell and LaTeX literate programming and, best of all, a ROT-13 transformation.

Available on MELPA-stable, MELPA and Marmalade: https://github.com/phillord/lentic

Like many developers I often edit both code and documentation describing that code. There are many systems for doing this; these can be split into code-centric systems like Javadoc which allow commenting of code, or document-centric systems like Markdown which allow interspersing code in documentation. Both of these have the fundamental problem that by focusing on one task, they offer a poor environment for the other.

In this article, I introduce my solution which I call lenticular text. In essence, this provides two syntactic views over the same piece of text. One view is code-centric, one document-centric but neither has primacy over the other, and the author can edit either as they choose, even viewing both at the same time. I have now implemented a useful version of lenticular text for Emacs, but the idea is not specific to a particular editor and should work in general. Lenticular text provides a new approach to editing text and code simultaneously.


Quick Links

  • The lentic package which implements lenticular text
  • A screen cast showing the lenticular source of lentic.el
  • A screen cast showing the use of lentic.el in a literate workflow

Lentic.el is available on MELPA(-stable) and Marmalade.


The need for Literate Programming

The idea of mixed documentation and code goes back a long way; probably the best known proponent has been Donald Knuth in the form of literate programming. The idea is that a single source document is written containing both the code and the documentation source; these two forms are then tangled to produce the real source code that is then compiled into a runnable application or readable documentation.

The key point of literate programming is the lack of primacy between either the documentation or the code; the two are both important, and can be viewed one beside the other. Tools like Javadoc are sometimes seen as examples of literate programming, but this is not really true. Javadoc is really nothing more than a docstring from Lisp (or Perl or Python), but using comments instead of strings. While Javadoc is a lovely tool (and one of the nicest parts of Java in my opinion) it is a code-centric view. It is possible to write long explanations in Javadoc, but people rarely do.

There are some good examples of projects using literate programming in the wild, including the mathematical software Axiom, or even a book on graphics. And there are a few languages which explicitly support it: TeX and LaTeX perhaps unsurprisingly do, and Haskell has two syntaxes, one which explicitly marks comments (like most languages) and the other which explicitly marks code.

These examples though are really the exception that proves the rule. In reality, literate programming has never really taken off. My belief is that one reason for this is the editing environments; it is this that I consider next.


Editing in Modes

When non-programmers hear about a text-editor they normally think of limited tools like Notepad; an application which does essentially nothing other than record what you type. For the programmer, though, a text-editor is a far cry from this. Whether the programmer prefers a small and sleek tool like Vim, the Swiss-Army knife that is Emacs, or a specialized IDE like Eclipse, these tools do a lot for their users.

The key feature of all modern editors is that they are syntax-aware; while they offer some general functions for just changing text, mostly they understand at least some part of the structure of that text. At a minimum, this allows syntax highlighting (I guess there are still a few programmers who claim that colours just confuse them, but I haven’t met one for years). In general, though, it also allows intelligent indentation, linting, error-checking, compiling, interaction with a REPL, debugging and so on.

Of course, this would result in massive complexity if all of these functions were available all of the time. So, to control this, most text editors are modal. Only the tools relevant to the current syntax are available at any time; so, tools for code are present when editing code, tools for documentation when editing documentation.

Now, this presents a problem for literate programming, because it is not clear which syntax we should support when these syntaxes are mixed. In general, most editors deal with mixed syntax by ignoring one syntax, or at most treating it conservatively. So, for example, AucTeX (the Emacs mode for LaTeX) supports embedded code snippets: in this case, the code is not syntax highlighted (i.e. the syntax is ignored) and the indentation function preserves any existing indentation (i.e. AucTeX does not indent code correctly, but at least does not break existing indentation). We could expand our editor to support both syntaxes at once, and there are examples of this. For instance, AucTeX allows both the code and the documentation to be edited with its DocTeX support — this is slightly cheating though, as both the code and the documentation are in the same language. Or, alternatively, Emacs org-mode will syntax highlight code snippets, and can extract them from the document-centric view, to edit them and then drop them back again.

The problem here is that supporting both syntaxes at once is difficult to do, particularly in a text editor which must also deal with partly written syntax, and may not always be able to follow a grammar.


Literate Workflows

As well as the editor, most of the tools of the programmer are syntax aware also. Something has to read the code, compile the documentation and display the results. For a literate document, there are two approaches to dealing with this. The first is the original approach, which is to use a tool which splits the combined source form into the source for the code and the documentation. This has the most flexibility but it causes problems: if your code fails to compile, the line numbers in any errors come out in the wrong place. And source-level debuggers will generally work on the generated source code, rather than the real source.

The second approach is to use a single source form that all the tools can interpret. This works with Haskell, for instance: code is explicitly marked up, and the Haskell compiler ignores the rest as comments. This is fine for Haskell, of course, but not all languages support this. In my case, I wanted a literate form of Clojure and, while I tried hard, it is just not possible to add in any sensible way (http://www.russet.org.uk/blog/2979).

A final approach is to embed documentation in comments; Javadoc takes this approach. Or, for a freer form of documentation, there are tools like Marginalia. As a general approach it seems fine at first sight, although I had some specific issues with Marginalia (chiefly, that it is Markdown-specific and a relatively poor environment for writing documentation). But in use, it leads right back to the problem of modal editing: Emacs’ Clojure mode does not support it so, for example, refilling a marked-up list ignores the markup and the list items get forced into a single paragraph.


Lenticular Text

My new approach is to use lenticular text. This is named after lenticular printing, which produces images that change depending on the angle at which they are viewed. It combines approaches from all three of these literate workflows. We take a single piece of semantics and give it two different syntactic views. Consider the “Hello World” literate code snippet written in LaTeX and Clojure (missing the LaTeX preamble for simplicity).

This prints "hello world".
\begin{code}
(println "hello world")
\end{code}

Now, if this were Haskell, we could stop, as the code environment is one that the compiler recognises. But Clojure does not; we cannot load this file into Clojure validly. And as it is not real Clojure, any syntax-aware editor is unlikely to do sensible things. We could use a code-centric view instead.

;; This prints "hello world".
;; \begin{code}
(println "hello world")
;; \end{code}

This is now valid Clojure, but now the LaTeX breaks. Actually, with a little provocation, it can be made valid LaTeX as well (http://www.russet.org.uk/blog/2979), but my toolchain does not fully understand LaTeX (nothing does other than LaTeX itself, I suspect), so again we are left with the editor not doing sensible things.

These two syntaxes, though, are very similar, and there is a defined relationship between the two of them (comment/uncomment every line that is not inside a code environment). It is clearly possible to transform one into the other with a simple syntactic transformation. We could do this with a batch processing tool, but this would defeat the purpose, because one form or the other would maintain its primacy; as an author, I need to be able to interact with both. So, the conclusion is simple; we need to build the transformation tool into the editor, and we need this to be bi-directional, so I can edit either form as I choose.
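
To make the transformation concrete, here is a sketch of the document-to-code direction as a pure function over lines; this is my own illustration, not lentic’s implementation:

(defun my-doc-to-code (lines)
  "Transform LINES from the document-centric to the code-centric form.
Lines inside \\begin{code}...\\end{code} pass through; all others,
including the delimiters themselves, gain a \";; \" comment prefix."
  (let (in-code)
    (mapcar
     (lambda (line)
       (cond
        ((string-match-p "\\\\begin{code}" line)
         (setq in-code t)
         (concat ";; " line))
        ((string-match-p "\\\\end{code}" line)
         (setq in-code nil)
         (concat ";; " line))
        (in-code line)
        (t (concat ";; " line))))
     lines)))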

This would give a totally different experience. I could edit the code-centric form when editing code blocks, and the document-centric form when editing comments. When in code blocks, the editor should work in a code-centric mode: auto-indentation, code evaluation, completion and so on. In the comment blocks, with a document-centric view, I should be able to reflow paragraphs, add sections, or cross-reference.

But does it work in practice? The proof of the pudding is in the programming.


Lenticular Text For Emacs

I now have a complete implementation of lenticular text for Emacs, available as the lentic package (http://github.com/phillord/lentic). The implementation took some thinking about and was a little tricky, but is not enormously complex. In total, the core of the library is about 1000 lines long, with another 1000 lines of auxiliary code for user interaction.

My first thought was to use a model-view-controller architecture, which is the classic mechanism for maintaining two or more views over the same data. Unfortunately, the core Emacs data structure for representing text (the buffer) is defined in the C core of Emacs and is not easy to replace. Indirect buffers, an Emacs feature which lentic can mimic, for instance, are implemented directly in the core. Secondly, MVC does not really make sense here: with MVC, the views can only be as complex as the model, and I did not want lentic to be (heavily) syntax-aware, so that it can work with any syntax. So, we do not have much in the way of a data model beyond a set of characters.

Lentic instead uses a change percolation model. Emacs provides hooks before and after change; lentic listens to these in one buffer and percolates the changes to the other. My first implementation of lentic (then called linked-buffers) simply copied the entire contents of a buffer and then applied a transform (adding or removing comments, for instance). This is easy to implement, but inefficient, so it scaled only to 300-400 lines of code before it became laggy. The current implementation is incremental and just percolates the parts which have changed. To add a new transformation requires implementing two functions — one which “clones” changes and one which “converts” a location in one buffer to the corresponding location in the other.
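
A hedged sketch of that first, whole-buffer approach gives the flavour; the other-buffer variable and the transform function are hypothetical names:

(defun my-percolate (&rest _change)
  "Copy this buffer's content, transformed, into the other buffer."
  (let ((text (buffer-substring-no-properties (point-min) (point-max))))
    (with-current-buffer my-other-buffer    ; hypothetical
      ;; suppress change hooks here, or the two buffers
      ;; would percolate to each other forever
      (let ((inhibit-modification-hooks t))
        (erase-buffer)
        (insert (my-transform text))))))    ; hypothetical transform

;; listen to changes locally, in this buffer only
(add-hook 'after-change-functions #'my-percolate nil t)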

Re-implementing this for other editors would not be too hard. Many of the core algorithms could be shared — while they are not complex in the abstract, for incremental updates there are quite a few (syntax-dependent) edge cases. Similarly, I went through several iterations of the configuration options: specifying how a particular file should be transformed. This started off being declarative but ended up as it should do in a lisp: with a function.

I now have transformations between Lisp (Clojure and Emacs) and LaTeX, org-mode and Asciidoc. These use much of the same logic, as they all demarcate code blocks and comment blocks with a start-of-line comment. This is essentially the same syntax that Haskell’s literate programming mode provides. The most complex transformation that I have attempted is between org-mode and Emacs Lisp; the complexity here comes because Emacs Lisp comments have a syntax which somewhat overlaps with org-mode, and I wanted the two to work together rather than duplicate each other. For all of these implementations, my old netbook can cope with files 2000-3000 lines long before any lag starts to become noticeable. Lentic is fast enough for normal, everyday usage.


Experience

The experience of implementing lentic has been an interesting one. In use, the system works pretty much as I hoped it would. As an author, I can switch freely between document- and code-centric views, and all of the commands work as I would expect. When I have a big screen, I can view both document- and code-centric views at the same time, side-by-side, “switching” only my eyes. I can now evaluate and unit-test the code in my Tawny-OWL (http://www.russet.org.uk/blog/3030) manual.

By building the transform into the editor, I get immediate feedback on both parts of my literate code. And my lenticular text is saved to file in both forms. As a result, no other tools have to change. Clojure gets a genuine Clojure file. LaTeX gets a genuine LaTeX file. And for the author, the two forms are (nearly) equivalent. We only care which is the “real” file at two points: when starting the two lentic views, and when versioning, as we only want to commit one form. The decision as to which form is “primary” becomes a minor issue. The lentic source code, for example, is stored as an Emacs Lisp file which can transform into an Org-mode file, so that it can work with MELPA (as well as for bootstrap reasons). The manual I am working on is versioned in the LaTeX form but transforms into Clojure, so that people cloning the repository will get the document first; I use Emacs in batch mode to generate the Clojure during unit testing.

There are some issues remaining. Implementation-wise, undo does not behave quite as I want, since it is sensitive to switching views. And, from a usability point of view, I find that I sometimes try capabilities from the code-centric view in the document-centric one, or vice versa.

Despite these issues, in practice the experience is pretty much what I hoped; I generally work in one form or the other, but often switch rapidly backward and forward between the two while checking. I find it to be a rich and natural form of interaction. I think others will too.


Conclusions

I doubt that tooling is the only difficulty with literate programming, but the lack of a rich editing environment is certainly one of them. Lenticular text addresses this problem. So far I have used lenticular text to document the source code of lentic.el itself, and to edit the source of a manual. For myself, I will now turn to what it was intended for in the first place: literate ontologies allowing me to produce a rich description of a domain both in a human and computational language.

The notion of lenticular text is more general though, and the implementation is not that hard. It can be used in any system where a mixed syntax is required. This is a common problem for programmers; lenticular text offers a usable solution.


Literate programming comes in many forms and disguises, but is essentially the notion that the documentation and programmatic code should be written together, so that the documentation supports the code and vice versa. In this post, I discuss some of the problems with literate programming and my early attempts to circumvent these in ontology development. Finally, I finish up with a description of some new technology which, I think, offers a solution.


Literate Programming for Ontologies

The reality is, I think, that literate programming has never really taken off; there are a large number of reasons for this, of course. Code does not naturally have a linear narrative and is not necessarily read in this way: rather, experienced programmers often track the flow of execution through the code (http://synesthesiam.com/posts/modeling-how-programmers-read-code.html). A secondary problem appears quite trivial, but the editing environment for literate programs tends to be poor. I cannot find any good research on this, but it is both my experience and that of others (http://unspecified.wordpress.com/2010/06/04/literate-programming-is-a-terrible-idea).

For ontology development, I think a literate approach makes more sense. Again, in my experience, ontologies have a somewhat more narrative structure than code — at least in the sense that they lack loops and the like.


Initial Approaches

I have now been experimenting with literate techniques since 2009. The first version used a single LaTeX file, and pulled the ontology out into a Manchester syntax file (http://www.russet.org.uk/blog/1213). This worked quite nicely but suffered from the poor-editor problem: I was building ontologies embedded in LaTeX, so I lacked even the basic features (such as syntax highlighting) that I got when editing Manchester syntax files directly. This was a problem even with the very limited feature set of tools like omn-mode.el. The disadvantage would have been worse if I had been used to a richer environment for Manchester syntax.

My second attempt took the opposite approach; now I used two files — a Manchester syntax file and a LaTeX one — with a method for referring between the two (http://www.russet.org.uk/blog/1258). This worked okay, but had a poor implementation which I later refined (http://www.russet.org.uk/blog/1269).

These approaches have their advantages, but both suffer from a poor editing environment; either in having two files to switch and link between, or in favouring documentation over ontology or vice versa. They also suffered from a secondary issue, which is that they are based around Manchester syntax. While this is nice enough, since writing Tawny-OWL (1303.0213) this style of ontology development no longer feels rich enough.


Marginalia

One of the declared advantages of using a real programming language as the basis for Tawny-OWL was the ability to use the tools from that language; I have used a number of these, both within Tawny-OWL and with ontologies written with Tawny: most obviously the test environment, but also serialisation, properties support and, of course, the entire editing environment.

This raises the question as to whether I could use literate programming tools from Clojure as well. To my knowledge, the only real option in town here is Marginalia. Marginalia uses markdown as the documentation format and builds a nice presentation with code on one side, and comments on the other.

However, it has problems. Firstly, it presents all comments as text — you cannot comment the comments, as it were, which is irritating for boilerplate such as licence text. Secondly, the side-by-side presentation breaks the flow of reading, as you have to move your eyes around the screen all the time. And, finally, it’s Markdown. While Markdown is nice at what it does, it’s very limited, and I missed the extra power of something like LaTeX.

The main difficulty, though, remains the editing environment. Without special support, the comments show up while editing as just comments. I can never remember the order of brackets in Markdown links — I rely on syntax highlighting to tell me that I have it correct.

Is there a way that Clojure and LaTeX can be made to work together?


LaTeX experiments: line comments

My first thought, in experimenting with LaTeX, was a remarkably cheap and cheerful one. Consider a document such as this:

;; \documentclass{article}
;; \begin{document}
;; \begin{code}
(println "hello world")
;; \end{code}
;; \end{document}

This is a valid Clojure file, and is nearly valid LaTeX as well. The only illegal part is that ;; occurs before the documentclass macro; although, in practice, having ;; appear randomly throughout the document would not be ideal either.

Now, LaTeX as an embedded markup language has a very plastic syntax, and I have used that in this case. It is actually very easy to just ignore the ; character entirely, through the use of catcodes (catcode 9 marks a character to be ignored); we can put this into a driver file which then inputs our Clojure file like so:

\catcode`;=9
\input{file.clj}

This way we maintain the validity of our Clojure file (otherwise the first line would be illegal). This is a remarkably cheap and cheerful way of achieving our aims, albeit at the cost of losing the ability to use semi-colons in our writing.


Indirect-buffers

What, however, about the editing environment? My own preferred environment — Emacs — has nice modes for editing both LaTeX and Clojure code, and it is possible to switch between the two when I want to move between editing code and editing documentation. This is quite clunky, but there is a second option, which is “indirect buffers”. This is a piece of Emacs arcana where two buffers share some of the same data structures but not all, which means that they can have different modes. Unfortunately, my experience is that the buffers share too much — as well as the text, they also share “text properties”, which unfortunately both the LaTeX and Clojure modes use. In practice, this means syntax highlighting fails (or rather, the two representations fight with each other). As a second problem, although the file is valid LaTeX it is not normal LaTeX; simple things like wrapping text into paragraphs fail because of the ;; comments at the beginning of each line.
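
For reference, creating an indirect buffer is a one-liner (the buffer name here is illustrative):

;; the optional third argument clones the base buffer's state,
;; including its major mode
(make-indirect-buffer (current-buffer) "hello.clj<latex>" t)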

So, this experiment fails the editing environment test.


LaTeX experiments: block comments

My next attempt was to use block comments. Consider this file, which is valid Common Lisp, using #| and |# block comments.

#|
\documentclass{article}
\begin{document}
\begin{code}
|#
(println "hello world")
#|
\end{code}
\end{document}
|#

We can use a similar (but not identical) trick with catcodes to make this valid LaTeX also:

\catcode`#=\active
\def#{\catcode`#=6}
\catcode`|=\active
\def|{\catcode`|=12}
\input{hello_world.lisp}

The first call makes the # character active — that is, definable as a macro. We then define # as a macro which will set the catcode of # to 6 (which is its default). Then, we do the same with |. The practical upshot of this is that the opening #| does nothing other than reset everything in the driver file; effectively, it is ignored.

This actually works quite nicely in the editing environment; the opening #| effectively makes no difference to Emacs, and the mode works well. The only real disadvantage is that every code block needs two delimiters — one to open the code block in LaTeX, and one to end the comment in Lisp.

Now there are various multi-mode tools around for Emacs which should help solve the otherwise clunky editing environment, although even here I am not convinced that this is the right route. Multi-mode tools are complex and to some extent are not what I want — when editing code I want to suppress the documentation, give it a low visual immediacy, and when editing documentation, I want the reverse.

There is, however, a bigger problem — while the last example is valid Common Lisp, Clojure does not have block comments, nor does the programmer have the ability to extend the reader in this way. So, while this seems a nice solution, it depends on a specific language feature which Clojure lacks.


Emacs Experiments: formats

My next idea was to use formats. Emacs allows transformations to happen between the text that is visualised on screen and how it is saved to file. The main reason for this is to support the many non-ASCII text encodings that exist. But it is (perhaps unsurprisingly) fully extensible within Emacs and could be used for any purpose. So, why not convert line-commented Clojure on file into block comments on screen; this would give editable LaTeX on screen and valid Clojure on file, with a driver file to give valid LaTeX on file also.
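
Such transformations are registered on Emacs’ format-alist; a hedged sketch, in which the two conversion functions are hypothetical:

;; element form: (NAME DOC REGEXP FROM-FN TO-FN MODIFY MODE-FN)
(add-to-list 'format-alist
             (list 'clj-latex
                   "Line-commented Clojure, shown with block comments."
                   nil                  ; no auto-detection regexp
                   #'my-decode-to-block ; run after reading (hypothetical)
                   #'my-encode-to-line  ; run before writing (hypothetical)
                   t                    ; encoding modifies the buffer
                   nil))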

Unfortunately, it fails. While Emacs’ LaTeX support is file-based, Clojure (and specifically CIDER) has a tighter integration; it can communicate the contents of a buffer without saving to file. This circumvents the formatting — the block comments are sent to the Clojure process, which complains.


Emacs Experiments: linked-buffer

I am now experimenting with another option. Indirect buffers place exactly the same text (and text properties) into two buffers. Instead of sharing all the text, why not have two buffers with a function that transforms the text bi-directionally between the two? The practical result is two views over the same content. Surprisingly, this works pretty well, as you can see here, even though my current implementation is very simple — the whole buffer is copied on every keypress. Simple copying we could achieve with indirect buffers; but we can also transform the text on the fly, so that both buffers are valid for their respective modes.

The broad idea is not that new — it’s similar to WEB/weave or Sweave, for instance, except that it is embedded into the editor; this means we can take advantage of the editor’s existing support for both languages, and the author gets immediate feedback about the transformation — so messing up the syntax is pretty obvious.

It also provides a superset of the functionality of other techniques: indirect buffers, as mentioned previously; shadowfile.el, which creates a second copy of a file somewhere else on every save; and it could also mimic shadow.el, which generates a secondary file by a command invocation on every save (although invoking an external command on every keypress would probably not be performant).

The first release of linked-buffer was a month ago. I am currently unhappy with the configuration and will change this, so the code is in flux at the moment, but I am using it in anger, which is a good sign. Currently, it does a LaTeX <-> Clojure transformation, but I will add a few more as time goes on.


Discussion

It has taken me quite a while to get to this stage, and a number of experiments along the way, but my feeling is that I now have a workable literate environment. It also validates my decision to build Tawny (http://www.russet.org.uk/blog/2962). Having a rich textual language for building ontologies is a bit of a game changer; providing programmatic extensions to the language has been helpful, but the access to other tools (git, Travis, tests and a REPL) has really made the difference. Now adding a literate environment to this as well changes the way that I can use ontologies, and is a paradigm shift in their development.


I’ve been writing a lot of Lisp recently, both to extend my Emacs environment for OWL (http://www.russet.org.uk/blog/2161), and with my Clojure OWL library (http://www.russet.org.uk/blog/2254). I have been trying out two new modes to support this. The first is paredit.el, which I have managed to miss despite knocking out Lisp for years; it’s a work of insane genius; fantastic when it does the right thing, but sometimes I find myself stuck in a rut. This will probably improve over time, but it is only going to work when I am writing a lot of Lisp.

My initial solution to this problem was to use the paredit cheat sheet. This is good, but unfortunately does not scale, as it is only available as an image. I was a bit surprised to find that this cheat sheet is actually built on information that is embedded in the paredit source. Paredit uses this to generate help strings. This also makes it possible to generate a menu with examples as tooltips, which I have now done with paredit-menu.el. Code is available on http://code.google.com/p/phillord-emacs-packages/

My second problem was with show-paren-mode. This is very useful for Lisp, but I find it irritating in other modes. This is particularly the case because of my own pabbrev.el, which offers abbreviation expansions using sq[uare] brackets; even though these are transitory, show-paren highlights them, which produces a rather annoying flickering on screen. Unfortunately, there is no way of blocking this — adding an overlay which blocks show-paren would be the ideal solution. Worse, show-paren, even though it is a minor mode, is global; once it is on, it is on in every buffer. Really, it needs rewriting so that it can be switched on and off on a per-buffer basis.

My solution to this is to switch the minor mode on and off depending on the current mode. This isn’t worth turning into a package (since it’s a hack), but I put it up here in case anyone finds it useful.

;; show-paren-mode is a minor mode, but it's automatically global. Annoying.
(defvar phil-paren-mode-active
  '(clojure-mode emacs-lisp-mode))

(defun phil-paren-mode-should-be-active-p ()
  (memq major-mode phil-paren-mode-active))

(defun phil-show-paren-mode-check ()
  ;; it isn't and it should be
  (when (and (phil-paren-mode-should-be-active-p)
             (not show-paren-mode))
    (show-paren-mode 1))
  ;; it is and it shouldn't be
  (when (and (not (phil-paren-mode-should-be-active-p))
             show-paren-mode)
    (show-paren-mode 0)))

(add-hook 'post-command-hook
          'phil-show-paren-mode-check)
