An Exercise in Irrelevance - Pandoc and Zola

In my last post, I noted that Zola was not a powerful enough environment for me to use. It requires markdown in a very specific format, and that format was just not rich enough.

So I have taken the route of producing the markdown that Zola builds from elsewhere, using other tools. This has given me two advantages: first, I have managed to generate most of my Zola site from existing source, even where that source was not in markdown; and second, I have been able to add new functionality.

The tooling

I thought a lot about how to achieve this, but in the end I went for a combination of some makefiles and pandoc. Pandoc is a fantastic tool, able to perform transformations between many different formats, and it also has some pretty advanced functionality. It is not a perfect tool, though. I have found the output a bit erratic at times, but with a bit of care it can work. It also seems very version-dependent, and even more so when used with panflute (more on which later). So I had to install it with cabal, compiling from source. That works, but it is really, really slow: on my laptop, the install takes hours, which is far from great.

The big feature of pandoc that I depended on is its support for filters, which let me modify the abstract syntax tree while the markdown is being produced. I chose to write mine with panflute because its API seems nicer; I wanted Python rather than Lua (although Lua filters would probably run quicker) because it's easier, and life is too short.
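Under the hood, a pandoc filter is a program that reads the document's AST as JSON on standard input and writes a (possibly modified) AST back to standard output; panflute wraps that JSON in Python classes. As a dependency-free sketch of the same mechanism, the hypothetical filter below walks the raw JSON and demotes every heading by one level, the kind of tweak a static site might want when its template already supplies the page title. The names `walk`, `demote_headers` and `apply_filter` are illustrative only and are not panflute's API.

```python
import json
import sys


def walk(node, action):
    """Recursively apply `action` to every pandoc element (a dict with a "t" key)."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        # Transform children first, so the action sees already-processed content.
        node = {key: walk(value, action) for key, value in node.items()}
        if "t" in node:
            replacement = action(node)
            if replacement is not None:
                return replacement
        return node
    return node  # strings, numbers and booleans are leaves


def demote_headers(elem):
    """Hypothetical example action: push every heading down one level (max 6)."""
    if elem["t"] == "Header":
        level, attr, inlines = elem["c"]
        return {"t": "Header", "c": [min(level + 1, 6), attr, inlines]}
    return None  # None means "leave this element unchanged"


def apply_filter(doc):
    # Only the body blocks are transformed; metadata passes through untouched.
    return {**doc, "blocks": walk(doc["blocks"], demote_headers)}


if __name__ == "__main__":
    json.dump(apply_filter(json.load(sys.stdin)), sys.stdout)
```

Saved as, say, `demote.py`, it could be run in a pipeline such as `pandoc -t json post.md | python demote.py | pandoc -f json -t markdown -o out.md`.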

Recovering old source

I have written two main filters. The first is called “recover”. It is very specific to my use case, so I have not published it. The blog now has two legacy source formats. My very old material was written in muse-mode; although I still have the source, I had made too many modifications to muse for generating HTML. Getting muse working again, or even converting with pandoc, did not seem worth the effort. So I scraped the body HTML from my own WordPress installation; this is then added verbatim to a markdown file which otherwise contains only header information.
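A minimal sketch of that last step, assuming Zola's TOML front matter (delimited by `+++`) and that the title, date and scraped body HTML are already in hand. The helpers `make_page` and `toml_string` are hypothetical and are not taken from the unpublished recover filter. Since markdown permits raw HTML, the scraped body can be pasted in largely as-is.

```python
import json
from datetime import date


def toml_string(value):
    # A JSON string literal is also a valid TOML basic string,
    # so json.dumps takes care of quoting and escaping.
    return json.dumps(value, ensure_ascii=False)


def make_page(title, published, body_html):
    """Hypothetical helper: wrap scraped HTML in a Zola markdown page."""
    front_matter = "\n".join([
        "+++",
        f"title = {toml_string(title)}",
        f"date = {published.isoformat()}",  # unquoted TOML local date
        "+++",
    ])
    # The body is added verbatim; only surrounding whitespace is trimmed.
    return f"{front_matter}\n\n{body_html.strip()}\n"


if __name__ == "__main__":
    print(make_page("An old post", date(2009, 5, 1), "<p>Hello</p>"))
```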

The second source format is AsciiDoc. I recovered this using asciidoctor (to produce XML) and then pandoc to produce markdown. The metadata here required more tweaking: I actually scraped it from some Python pickle files produced by a tool called “blogpost” that I used to use for publishing. Hacky, but working.
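The AsciiDoc route can be sketched as two commands, AsciiDoc to DocBook XML with asciidoctor's `docbook` backend and then DocBook to markdown with pandoc, plus a loader for the pickled metadata. The layout of the “blogpost” pickles is not described here, so the code assumes each file unpickles to a plain dict with hypothetical `title` and `date` keys; if the pickles hold instances of the tool's own classes, those classes would need to be importable for unpickling to succeed. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.

```python
import pickle
import subprocess
from pathlib import Path


def asciidoc_to_markdown_cmds(source):
    """Build the two-step conversion: AsciiDoc -> DocBook XML -> markdown."""
    xml = source.with_suffix(".xml")
    md = source.with_suffix(".md")
    return [
        ["asciidoctor", "-b", "docbook", "-o", str(xml), str(source)],
        ["pandoc", "-f", "docbook", "-t", "markdown", "-o", str(md), str(xml)],
    ]


def front_matter_from_record(record):
    # Hypothetical key names: the real pickle layout is an assumption.
    return {"title": record["title"], "date": record["date"]}


def load_metadata(pickle_path):
    # Only safe for trusted, self-generated pickle files.
    with open(pickle_path, "rb") as handle:
        return front_matter_from_record(pickle.load(handle))


def convert(source, pickle_path):
    """Run both conversion steps, then return the recovered metadata."""
    for cmd in asciidoc_to_markdown_cmds(source):
        subprocess.run(cmd, check=True)  # stop on the first failing step
    return load_metadata(pickle_path)
```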

Adding functionality

The second filter is called multi-cite. It adds some of the functionality of my own kcite tool, allowing me to add references and generate a reference list. It's similar to a tool called doi2cite but should, at least when finished, support more identifiers (including arXiv, PubMed IDs and simple URLs).
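As a rough sketch of how a multi-cite-style filter could work (not its actual implementation), the code below collects pandoc citation keys such as `[@doi:10.1000/182]` from the JSON AST, maps a few hypothetical prefixes (`doi:`, `arxiv:`, `pmid:`) to URLs, treats any other key as a plain URL, and appends a linked reference list. A real tool would also fetch bibliographic metadata for each identifier rather than only linking to it.

```python
# Hypothetical prefix scheme for citation keys; not multi-cite's real one.
PREFIXES = {
    "doi": "https://doi.org/{}",
    "arxiv": "https://arxiv.org/abs/{}",
    "pmid": "https://pubmed.ncbi.nlm.nih.gov/{}/",
}


def collect_citation_ids(node, found=None):
    """Return citation ids from a pandoc JSON AST, in first-seen order."""
    if found is None:
        found = []
    if isinstance(node, list):
        for item in node:
            collect_citation_ids(item, found)
    elif isinstance(node, dict):
        if node.get("t") == "Cite":
            citations, _inlines = node["c"]
            for citation in citations:
                if citation["citationId"] not in found:
                    found.append(citation["citationId"])
        for value in node.values():
            collect_citation_ids(value, found)
    return found


def resolve(citation_id):
    """Map a prefixed id to a URL; assume anything else already is a URL."""
    scheme, sep, rest = citation_id.partition(":")
    template = PREFIXES.get(scheme.lower())
    if sep and template:
        return template.format(rest)
    return citation_id


def reference_list(ids):
    """Build a "References" header plus a bullet list of links."""
    if not ids:
        return []
    items = [
        [{"t": "Plain", "c": [{"t": "Link", "c": [
            ["", [], []],                    # empty Attr
            [{"t": "Str", "c": cid}],        # link text
            [resolve(cid), ""],              # [url, title]
        ]}]}]
        for cid in ids
    ]
    header = {"t": "Header",
              "c": [2, ["references", [], []], [{"t": "Str", "c": "References"}]]}
    return [header, {"t": "BulletList", "c": items}]


def add_references(doc):
    ids = collect_citation_ids(doc["blocks"])
    return {**doc, "blocks": doc["blocks"] + reference_list(ids)}
```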

Conclusions

Overall, the process has been rather hacky, but it has meant that I can keep my original source, and it gives me adaptability for the future. None of it is working perfectly yet: there are bugs and stray markup in the presentation. But I wanted to get the website up and running, as perfection is the enemy of the good. In time I may come to fix these, and if not, well, the content was always the main point of this blog rather than the presentation!
