In this post, I describe the different technologies that I’ve tried for writing a new book, from markdown to LaTeX.

One question that I have been asked about Tawny-OWL (http://www.russet.org.uk/blog/2366) is whether there is a manual. The answer is yes: there is, and there has been since well before the 1.0 release.

However, it’s not rich enough. It requires a reasonably good background in both Clojure and ontology development. Given that the overlap between these two areas of knowledge is probably limited to myself and Jennifer Warrender, this is rather less than ideal. So, I’ve started a new manual in the form of a book called Take-Wing. Getting a nice environment for writing it has been an interesting journey.


Markdown

The original version of the documentation was written in Markdown. There was not really a good reason for this, other than that I wanted to try it and it is well supported. It’s not a bad format and is easy to type, although it lacks extensibility (which is, of course, why it has been extended in so many different ways!).

However, one significant problem was that I had no good way of checking that the code samples in it worked. I had been hit by this when I changed function names, for example from owlclass to owl-class, or, more perniciously, when I moved one entity from a variable to a function. My general solution to this is linked-buffer, which translates documentation into code and vice versa. The lack of an explicit delimiter for the start and end of code blocks makes this harder to implement for Markdown.


Asciidoc

My next thought was to move to asciidoc. I like asciidoc and have used it for years. I helped to add Slidy support to it, so that I can use it for slides. Most of my teaching material uses asciidoc now, because I can get slides and lecture notes from the same source, with easy-to-integrate code snippets that I can run, albeit maintained in independent source files, which can make things a little painful.

The first version of Take-Wing used asciidoc. Because I wanted to use linked-buffer to generate Clojure source that I could test, I needed to use multiple files with a master file; the Clojure Cookbook uses much the same system, albeit for a slightly different reason.

However, writing a talk or even a lecture series is a very different thing from writing a book. The main problem was not asciidoc itself, but the support from Emacs. There is an asciidoc mode but, aside from needing patching, it is heavily focused on syntax highlighting rather than document structure. So inserting cross-references, links and the like was painful.


Org-Mode

My next idea was to try org-mode. So, I used pandoc to do the translation; this wasn’t a glowing success, but it achieved the basics anyway.

An Emacs-specific solution doesn’t seem ideal, but then for Take-Wing I am likely to be the main author. And I have been using org for a while, particularly for building plans both for Tawny-OWL and also, ironically, for the outline of Take-Wing.

It has clear code demarcation blocks, so adding support for linked-buffer was straightforward enough. One significant problem is that the org-mode export framework has recently been replaced, so I had to install my own version. Installing a new version is not itself problematic, as this can be done with package.el, but because org-mode is part of core Emacs, the old version is not removed automatically, and I found several version conflicts resulting from old files being loaded. Still, I managed to get org working, producing both a PDF and a nice web page.
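To illustrate why explicit delimiters help, here is a minimal one-way sketch that pulls the bodies of org-mode source blocks out of a document so that they could be written to a file and tested. It is not linked-buffer's actual implementation; the function name `extract_src_blocks` is hypothetical, and Python is used purely for illustration. Because every block starts with `#+begin_src` and ends with `#+end_src`, a single regular expression is enough to find them.

```python
import re

# Matches an org-mode source block: a "#+begin_src LANG ..." header line,
# the body, and a "#+end_src" line. Org accepts the keywords in either case.
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+begin_src[ \t]+(\S+)[^\n]*\n(.*?)^[ \t]*#\+end_src",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def extract_src_blocks(org_text: str, language: str = "clojure") -> str:
    """Concatenate the bodies of all source blocks written in `language`."""
    chunks = []
    for match in SRC_BLOCK.finditer(org_text):
        lang, body = match.group(1), match.group(2)
        # Keep only blocks in the requested language (e.g. skip emacs-lisp).
        if lang.lower() == language.lower():
            chunks.append(body)
    return "".join(chunks)


if __name__ == "__main__":
    doc = (
        "* Defining classes\n"
        "Some prose about owl-class.\n"
        "#+begin_src clojure\n"
        "(defclass A)\n"
        "#+end_src\n"
        "More prose.\n"
        "#+BEGIN_SRC clojure :tangle yes\n"
        "(defclass B)\n"
        "#+END_SRC\n"
    )
    print(extract_src_blocks(doc))
```

A real tool of this kind also has to map edits made in the extracted code back into the document, which this sketch does not attempt; with Markdown's indented code blocks, even the extraction step is harder, because there is no closing marker to search for.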

Org-mode was pleasant enough to use. The folding features were nice and the syntax is reasonable for typing. However, I found the same problems as with asciidoc. While org has advanced features for linking to the rest of the world, linking within documents is not so good, especially when using a master document. Basically, I couldn’t get over my envy of my PhD students, both of whom have written their theses with AUCTeX.


LaTeX

LaTeX itself is an old piece of software, but it does the job of document preparation like nothing else. And the support in Emacs is fantastic. However, its Achilles heel has been that it produces PDFs. I don’t like PDFs for a variety of reasons, but the main one is that, essentially, a PDF is a screenshot of a piece of paper. I’ve long since gone natively digital: I write my papers, blog posts, emails and everything else on a computer, and I am used to reading in this way as well. And I want the web for reading documentation.

In my time, I’ve tried lots of different technologies for webifying LaTeX. The best I had found was Plastex; my post on LaTeXtowordpress (http://www.russet.org.uk/blog/1740) is still one of my most read, and my colleague even used it to post her PhD thesis (http://themindwobbles.wordpress.com/2012/06/14/converting-a-latex-thesis-to-multiple-wordpress-posts/). It’s a good tool, but it doesn’t use TeX underneath, and so it doesn’t always do the right thing.

For some reason, I had managed to miss tex4ht, even though it has been around for quite a while. My initial impression, though, is that it is probably the best tool that I have seen, even if it has been blessed with some of the worst documentation and a very strange command line. I’m hopeful that make4ht may help with this. Its author has been fantastic in answering my questions on StackExchange. So I decided to give it a go: I exported my org to LaTeX, threw away the org source and started again.

The first problem was that I didn’t like its handling of the listings package. I have fixed this by passing code straight through to HTML. Initially, I tried highlight for syntax highlighting, but then dropped it in favour of prism, as the former cannot also do inline highlighting. The default footnote handling is also very strange, but this is also easy to fix with a standard package. So now, after a lot of fiddling, it all seems to be working.


Conclusions

It’s been quite a lot of effort and rather more fiddling than I would have hoped, but I now have an editing environment that I like, and this is important. Writing a document is hard, and the environment needs to support this and not get in the way.

I am hopeful that tex4ht will fulfil its initial promise. I have been using LaTeX less and less over the years, simply because of its lack of HTML support, despite the fact that it’s the best environment for writing in: so when I have been most happy as an author, I have been least happy as a reader.

All I have to do now is write the rest of the book; so far, I think it’s going well, and this is something that I shall cover in later blog posts.


8 Comments

  1. Skottk Klebe says:

    I wonder if you ever came across Pollen? [https://github.com/mbutterick/pollen]

    Book authoring/typesetting in Racket, by the author of http://practicaltypography.com

  2. Phillip Lord says:

    I haven’t, to be honest, but it isn’t really what I am after. The reason that LaTeX is so good is not its programming language (anyone who has written any serious TeX would not argue that it’s great for programming) but that it is mature and comes with a fantastic environment for writing; actually, a number of them.

  3. Michael Bradley says:

    Did you experiment with Bruce Miller’s LaTeXML?

    https://github.com/brucemiller/LaTeXML

    http://dlmf.nist.gov/LaTeXML/

    The xml output of the “latexml” command can be run through “latexmlpost” to generate HTML5 or XHTML, which can then be further transformed, e.g. with Christophe Grand’s Enlive library for Clojure. I’ve been employing such a workflow myself, with great success!

  4. Phillip Lord says:

    I didn’t, and it might be worth a go, but for the moment I am going to concentrate on tex4ht.

    This nut really needs to be cracked. What I would *really* like to do is put all the freely licensed tex in arxiv.org into an HTML converter. There needs to be a generic solution for this. Even with take-wing, I’ve had to do a fair bit of configuration.

  5. Michael Bradley says:

    “LaTeXML was used to convert 90% (60% without errors) of 530,000 documents from the arXiv to XML.”
    http://en.wikipedia.org/wiki/LaTeXML

    Referencing: http://old.kwarc.info/kohlhase/papers/mcs10.pdf

    That was in 2010. LaTeXML has been under continuous, active development since then, so I imagine its current release might do an even better job.

  6. Michael Bradley says:

    I forgot to mention in my previous comments that LaTeXML sports a TeX package (lxRDFa) which can be used to markup TeX documents with metadata that will pass into the XML output of “latexml” and the X/HTML output of “latexmlpost”:

    http://dlmf.nist.gov/LaTeXML/manual/metadata/RDFa.html

    Given your work with ontologies, I thought you might find that capability interesting. In my case, I use it to provide hints in the “latexmlpost” output with respect to transformations I intend to make with Enlive and/or with JavaScript in a web browser, but which don’t have counterpart-transformations in the PDF/s generated from the same LaTeX sources.

    I have not spent much time with tex4ht, so it might have similar or superior facilities for generating metadata. What has been your experience?

  7. Phillip Lord says:

    Yeah, I saw that in the documentation. It looks like a pretty attractive feature, and one that would be well worth using.

  8. Phillip Lord says:

    They have stats on their website which suggest it’s still 60%. The problem is the ones with errors may not be usable. It is a shame that the results of the transformation are not public. I’d like to see my own papers:-)
