Yesterday, I came across a very strange bug in my Tawny-OWL library (n.d.a) The bug was my fault, but fixing it was exacerbated by what has to go into my list of Clojure gotchas (n.d.b)

Consider this very simple piece of Clojure code. Is this a recursive function, i.e. one that calls itself? Intuitively the answer is yes, but I would argue that actually, you cannot tell from the information given. It could be, but is not necessarily so.

(defn countdown [n]
  (if (pos? n)
    (cons n
          (countdown (- n 1)))
    '(LiftOff)))

Let’s run this code:

test-project.core> (countdown 10)
(10 9 8 7 6 5 4 3 2 1 LiftOff)

All is working as it should and the function is recursing, so there appears to be no problem. However, countdown is a big long to type, so lets create an alias called cdown and test that this works as expected.

test-project.core> (def cdown countdown)
#'test-project.core/cdown
test-project.core> (cdown 10)
(10 9 8 7 6 5 4 3 2 1 LiftOff)

All is good, cdown is doing just what we expect also, and it’s the same as countdown. However, as we as being lazy, I am easily bored, so now we make the countdown go faster by redefining the countdown function.

(defn countdown [n]
  (if (pos? n)
    (cons n
          (countdown (- n 2)))
    '(LiftOff)))

test-project.core> (countdown 10)
(10 8 6 4 2 LiftOff)

Again, everything is working. But what happens when we call cdown

(cdown 10)

If you are new to clojure, you might assume that as we have changed the definition of countdown the definition of cdown would change as well, so we should get

test-project.core> (cdown 10)
(10 8 6 4 2 LiftOff)

If you are more experienced with Clojure (as I would claim myself to now be, although others may disagree), then you would not that the alias was to the function and not the var, so cdown now contains a pointer to the old definition, so it will behave as it did before.

test-project.core> (cdown 10)
(10 9 8 7 6 5 4 3 2 1 LiftOff)

As it happens, this is not true either. What we actually get it this:

test-project.core> (cdown 10)
(10 9 7 5 3 1 LiftOff)

A wierd combination of the two functions. What is happening is this: the initial function call to cdown calls our first function definition; 10 is postive, so we subtract 1 to get 9. And, now, here is the strange bit. The function is no longer recursive, because it calls countdown through the indirection of the clojure var; we could be more explicit and replace the “recursive” call with this instead which is what is actually happening in the compiled Java class.

((var-get (var countdown)) 10)

So, in short, in Clojure a recursive function is not necessarily recursive, even if it is when defined. If the var changes, then the function no longer calls itself.

There are solutions to this. I should have defined my cdown alias like so:

(def cdown #'countdown)

Now, when we defined countdown the original function gets lost and GCd rather than hanging around, a pale shadow of its former recursive self. This works but is rather counterintuitive; with a Lisp-1 you get used to refering to functions directly rather than with a quoted symbol, but here we have to use both a quote and a #.

A second solution is to use recur, which does not use var indirection. So consider this example. The first countdown2 function is recursive and will always be since it directly refers to itself, and this remains after redefinition.

(defn countdown2 [i]
  (when (pos? i)
    (println i)
    (recur (dec i))))

(def cdown2 countdown2)

(defn countdown2 [i]
  (when (pos? i)
    (println i)
    (recur (- i 2))))

;; prints 10, 9, 8...
(cdown2 10)

Of course, we cannot use this in the first example, because it is not tail recursive. And, if we want to be really picky, we could argue that countdown2 is not recursive either because it is tail-call optimised, so it doesn’t call itself. And this is not an implementation detail either because it’s the whole point of recur.

Another solution would be for Clojure to provide a recur* form (or something equivalent) which would work away from the tail position and which directly recursed, rather than using var indirection. It’s unlikely that Clojure will do this, however, because this form of recursion consumes stack and so is not a good thing to do.

It would also be possible to change the semantics of Clojure symbols so they automatically refered to vars rather than values as appropriate. So in all these cases, we would have vars. As far as I can tell at the moment, the same syntax y is refering to two different things — the value of the var y in the first two forms and the var itself in the third.

(def x y)
(def z {:x y})

(defn y[]
  (y))

To replicate the current behaviour, we would just use var-get.

(def x (var-get y))

Without thinking about this too deeply, I cannot actually see a downside to this, except that it would involve more var indirection, so would probably be slower.

Finally, clojure could just remove var indirection. Actually, this is likely to happen in the form of build profiles, because it also has the advantage of making Clojure faster. As my example shows, however, it also introduces some subtle changes to the semantics of the language.

I still like Clojure. The more you use a language, I guess, the more oddities you find.