Adventures in Text Land

2014-10-16

In this post, I describe the different technologies that I've tried for writing a new book, from markdown to LaTeX.

One of the questions that I have been asked about Tawny-OWL [@url:www.russet.org.uk/blog/2366] was whether there was a manual or not. Actually, the answer is, yes, there is and has been since well before the 1.0 release.

However, it's not rich enough. It makes requires both a reasonably good background in Clojure and in Ontology development. Given that the overlaps between these two areas of knowledge is probably limited to myself and Jennifer Warrendar this is rather less than ideal. So, I've started a new manual in the form of a book called Take-Wing. Getting a nice environment has been an interesting journey.

Markdown

The orginal version of the documentation was written in Markdown. There was not really a good reason for this, other than I wanted to try it and it is well supported. It's not a bad format and is easy to type, although it lacks in extensibility (which is, of course, why it has been extended in so many different ways!).

However, one significant problem was that I have no good way of checking that the code samples in it work. I had been hit by this when I changed function names, for example, from owlclass to owl-class, or more perniciously when I moved one entity from a variable to a function. My general solution to this linked-buffer which translates documentation into code and vice-versa. The lack of an explicit delimiter for the start and end of code blocks makes this harder to implement for markdown.

Asciidoc

My next thought was to move to asciidoc. I like asciidoc and have used it for years. I helped to add slidy support to it, so that I can use it for slides. Most of my teaching material uses asciidoc now, because I can get slides and lecture notes from the same source, with easy to integrate code snippets that I can run, albeit maintained in independent source files which can make things a little painful.

The first version of Take-Wing used asciidoc. Because I wanted to use linked-buffer to generate Clojure source that I could test, I needed to use multiple files with a master file --- the Clojure cookbook uses much the same system, albeit for a slightly different reason.

However, writing a talk or even a lecture series is a very different thing from writing a book. The main problem was not asciidoc itself, but the support from Emacs. There is an asciidoc mode but, aside from needing patching, it is heavily focused on syntax highlighting rather than document structure. So, inserting cross-references and the link were painful.

Org-Mode

My next idea was to try org-mode. So, I used pandoc to do the translation; this wasn't a glowing success, but it achieved the basics anyway.

And Emacs specific solution doesn't seem ideal, but then for Take-Wing I am likely to be the main author. And I have been using org for a while, particularly for building plans both for Tawny-OWL and also, ironically, for the outline for Take-Wing.

It has clear code demarcation blocks, so adding support for linked-buffers was straight-forward enough. One significant problem is that the org-mode export framework has recently been replaced, so I have to install my own version; actually, installing a new version is not problematic as this can be done with package.el, but because org-mode is in core emacs, deleting the old version is doesn't happen automatically. I found several version conflicts resulting from loading old files. Still, I managed to get org working, producing both a PDF and nice web page.

Org-mode was pleasant enough to use. The folding features were nice and the syntax is reasonable for typing. However, I found the same problems as with asciidoc. While, org has advanced features for linking to the rest of the world, linking within documents is not so good, especially when using a master document. Basically, I couldn't get over the envy of my PhD students both of who have written their thesis with AucTex

LaTeX

LaTeX itself is an old piece of software, but it does the job of document preparation like nothing else. And, the support in Emacs is fantastic. However, it's achilles heel has been that is produces PDFs. I don't like PDFs for a variety of reasons, but the main one is that, essentially, it's a screenshot of a piece of paper. I've long gone natively digital: I write my papers, blog posts, emails and everything else on computer, and I am used to reading in this way as well. And I want the web for reading documentation.

In my time, I've tried lots of different technology for webifying LaTeX. The best I had found was Plastex; my post on LaTeXtowordpress [@url:www.russet.org.uk/blog/1740] is still one of my best read, and my colleague even used it to post her PhD thesis [@url:themindwobbles.wordpress.com/2012/06/14/converting-a-latex-thesis-to-multiple-wordpress-posts/] It's a good tool, but it's not using TeX underneath and so it doesn't always do the right thing as a result.

For some reason, I managed to have missed tex4ht, though, even though it has been around for quite a while. My initial impressions are, though, that it is probably the best tool that I have seen even if it has been blessed with some of the worst documentation and a very strange command line. I'm hopefully that make4ht may help this. It's author has been fantastic on StackExchange in answering my questions. So, I decided to give it a go, so I exported by org to latex, threw away my org and started again.

The first problem that I didn't like it's handling of the listings package. I have fixed this by dropping code straight through to HTML. Initially, I tried highlight to do syntax highlighting, but then dropped this for prism, as the former cannot do inline highlighting also. The default footnote handling is also very strange, but this is also easy to fix with a standard package. So, now, after a lot of fiddling, it all seems to be working.

Conclusions

It's been quite a lot of effort and rather more fiddling than I would have hoped, but I now have an editing environment that I like, and this is important. Writing a document is hard, and the environment needs to support this and not get in the way.

I am hopeful that tex4ht will fulfil it's initial promise. I have been using LaTeX less and less over the years, simply because of its lack of HTML support, despite the fact that it's the best environment for writing in: so when I have been most happy as an author, I have been least happy as a reader.

All I have to do now is write the rest of the book; so far, I think it's going well, and this is something that I shall cover in later blog posts.