Yesterday, I came across a very strange bug in my Tawny-OWL library (1303.0213). The bug was my fault, but fixing it was exacerbated by what has to go into my list of Clojure gotchas (http://www.russet.org.uk/blog/2991).

Consider this very simple piece of Clojure code. Is this a recursive function, i.e. one that calls itself? Intuitively the answer is yes, but I would argue that actually, you cannot tell from the information given. It could be, but is not necessarily so.

(defn countdown [n]
  (if (pos? n)
    (cons n
          (countdown (- n 1)))
    '(LiftOff)))

Let’s run this code:

test-project.core> (countdown 10)
(10 9 8 7 6 5 4 3 2 1 LiftOff)

All is working as it should and the function is recursing, so there appears to be no problem. However, countdown is a big long to type, so lets create an alias called cdown and test that this works as expected.

test-project.core> (def cdown countdown)
#'test-project.core/cdown
test-project.core> (cdown 10)
(10 9 8 7 6 5 4 3 2 1 LiftOff)

All is good, cdown is doing just what we expect also, and it’s the same as countdown. However, as we as being lazy, I am easily bored, so now we make the countdown go faster by redefining the countdown function.

(defn countdown [n]
  (if (pos? n)
    (cons n
          (countdown (- n 2)))
    '(LiftOff)))

test-project.core> (countdown 10)
(10 8 6 4 2 LiftOff)

Again, everything is working. But what happens when we call cdown

(cdown 10)

If you are new to clojure, you might assume that as we have changed the definition of countdown the definition of cdown would change as well, so we should get

test-project.core> (cdown 10)
(10 8 6 4 2 LiftOff)

If you are more experienced with Clojure (as I would claim myself to now be, although others may disagree), then you would not that the alias was to the function and not the var, so cdown now contains a pointer to the old definition, so it will behave as it did before.

test-project.core> (cdown 10)
(10 9 8 7 6 5 4 3 2 1 LiftOff)

As it happens, this is not true either. What we actually get it this:

test-project.core> (cdown 10)
(10 9 7 5 3 1 LiftOff)

A wierd combination of the two functions. What is happening is this: the initial function call to cdown calls our first function definition; 10 is postive, so we subtract 1 to get 9. And, now, here is the strange bit. The function is no longer recursive, because it calls countdown through the indirection of the clojure var; we could be more explicit and replace the “recursive” call with this instead which is what is actually happening in the compiled Java class.

((var-get (var countdown)) 10)

So, in short, in Clojure a recursive function is not necessarily recursive, even if it is when defined. If the var changes, then the function no longer calls itself.

There are solutions to this. I should have defined my cdown alias like so:

(def cdown #'countdown)

Now, when we defined countdown the original function gets lost and GCd rather than hanging around, a pale shadow of its former recursive self. This works but is rather counterintuitive; with a Lisp-1 you get used to refering to functions directly rather than with a quoted symbol, but here we have to use both a quote and a #.

A second solution is to use recur, which does not use var indirection. So consider this example. The first countdown2 function is recursive and will always be since it directly refers to itself, and this remains after redefinition.

(defn countdown2 [i]
  (when (pos? i)
    (println i)
    (recur (dec i))))

(def cdown2 countdown2)

(defn countdown2 [i]
  (when (pos? i)
    (println i)
    (recur (- i 2))))

;; prints 10, 9, 8...
(cdown2 10)

Of course, we cannot use this in the first example, because it is not tail recursive. And, if we want to be really picky, we could argue that countdown2 is not recursive either because it is tail-call optimised, so it doesn’t call itself. And this is not an implementation detail either because it’s the whole point of recur.

Another solution would be for Clojure to provide a recur* form (or something equivalent) which would work away from the tail position and which directly recursed, rather than using var indirection. It’s unlikely that Clojure will do this, however, because this form of recursion consumes stack and so is not a good thing to do.

It would also be possible to change the semantics of Clojure symbols so they automatically refered to vars rather than values as appropriate. So in all these cases, we would have vars. As far as I can tell at the moment, the same syntax y is refering to two different things — the value of the var y in the first two forms and the var itself in the third.

(def x y)
(def z {:x y})

(defn y[]
  (y))

To replicate the current behaviour, we would just use var-get.

(def x (var-get y))

Without thinking about this too deeply, I cannot actually see a downside to this, except that it would involve more var indirection, so would probably be slower.

Finally, clojure could just remove var indirection. Actually, this is likely to happen in the form of build profiles, because it also has the advantage of making Clojure faster. As my example shows, however, it also introduces some subtle changes to the semantics of the language.

I still like Clojure. The more you use a language, I guess, the more oddities you find.

Bibliography

11 Comments

  1. Eric Normand says:

    Good writeup!

    I think your point is right-on, but I have a different perspective. You can also see this as a difficulty with dynamic programming, where you are redefining functions.

    Why is it about dynamic programming?

    If after you redefined your function, you recompiled everything, the shortcut name would refer to the correct function and everything would work fine. It is only because you recompiled one function that a problem occurred.

    I think dynamic programming will always have such problems. I remember similar (but different) problems with Common Lisp. I would get to a point where I couldn’t reason about the state of the system and just had to recompile everything again. It no longer corresponded to what I had in my file.

    Again, this is unfortunate but I would not want to change my coding style just to support these edge cases while developing. From my perspective, using dynamic programming is just a powerful way to write a program that does the right thing when loaded fresh.

    Awesome job documenting these gotchas!

    Eric

  2. Phillip Lord says:

    Actually, there is a simple solution to the problem as I laid it out. Which is a “defalias” macro where I go

    (defalias cdown countdown)

    which actually does

    (def cdown #’countdown)

    Sadly, in my real use case, rather than my example, I didn’t have this. Actually I had

    (def handler {:h1 f1 :h2 f2})

    So, defalias wouldn’t work for this circumstance. Basically, I have to remember to use the
    vars when I want. Yes, you are right, evalling all the code would solve the problem (unless
    of course there is a defonce hidden somewhere!). I’ve had this problem several times, though. So
    I know that in practice, sometimes this gets forgotten.

  3. DannyB says:

    > It’s unlikely that Clojure will do this, however,
    > because this form of recursion consumes stack and
    > so is not a good thing to do.

    Recursion that consumes stack is a valid thing to do if you simply want to represent part of a data structure on the stack.

    For example, depth first walking of a tree can be recursive in this manner and is a reasonable approach for any reasonable tree depth. Weren’t we doing this kind of recursing in, say, Pascal in the 1980’s or 1970’s?

    But I’ll agree about this. Recursion that consumes stack simply to decrement a number is not a good thing to do.

  4. recur says:

    “And, if we want to be really picky, we could argue that countdown2 is not recursive either because it is tail-call optimised, so it doesn’t call itself. And this is not an implementation detail either because it’s the whole point of recur.”

    It sure sounds like implementation to me, even if it’s required by the language spec (does one even exist?). Are you arguing that the “recur” feature is not “recursion” because the optimizer generates code that re-uses the stack frame before the JMP instruction?

    I don’t think the computer science definition of “recursion” ever required it to actually consume stack space. As wikipedia says, for example, such a function is said to be “tail-recursive, which is a special case of recursion.”

  5. Ceasar Bautista says:

    For the curious, the same effect will happen in Python via::

    “`
    def countdown(n):
    return (n,) + countdown(n – 1) if n > 0 else (0,)

    print countdown(10)

    cdown = countdown

    print cdown(10)

    def countdown(n):
    return (n,) + countdown(n – 2) if n > 0 else (0,)

    print countdown(10)
    print cdown(10)
    “`

  6. Ceasar Bautista says:

    Also, @Eric, be careful with your use of “Dynamic programming”– you mean “Dynamically-typed programing languages”, not “a method for solving complex problems by breaking them into simpler subproblems”.

  7. Francis Avila says:

    In recursive functions, you can make the current function name a local var to avoid the surprise of late-evaluation of the recursive call:

    (def countdown
    (fn countdown [n] ;; countdown is now a local, not a var.
    (if (pos? n)
    (cons n
    (countdown (dec n))) ;; countdown is not redefinable.
    ‘(LiftOff))))

    Now when you redef, you get the less-surprising behavior of cdown referring to the *old* countdown instead of *a mix of different countdowns*. You still need your defalias for cdown to always point to the new countdown.

    This pattern is used by the core during bootstrapping before defn is defined. I don’t understand why defn doesn’t expand to this (binding the function name as a local)–seems like it should!

    The downside of this pattern is that defn takes care of some metadata issues for you that you now have to handle yourself (e.g. set :arglist on the var).

  8. Norman Richards says:

    Ok – let’s look through this. First, every def in clojure comes down to an expression of the form

    (def some-var some-expression)

    where some-expression is evaluated and put in the named var. Everything for of defanything including defn comes down to this.

    Now, it’s impossible for some-expression to refer to some-var without using a trick. Think about it for a second. You can’t say (def x (+ x 1)) where x refers to the value that is currently being defined. Similarly, without some trick, you can’t say (defn x [] (x)). What is x? It’s not defined here. If it’s not clear, think of that defn loosely as:

    (let [x (fn [] (x)]
    x)

    Now obviously, you can do this in Clojure. Why? It’s because of mutable vars. You can create a space called X that is empty, and then place in a function that can look up the value of the var X. Then you can assign that function to the var X.

    Using atoms instead of vars, consider this definition of countdown:

    (def countdown
    (let [the-fn (atom nil)]
    (reset! the-fn
    (fn [n] (when (pos? n)
    (cons n (@the-fn (dec n))))))))

    Now, the recursion isn’t tied to a mutable piece of state exposed in a way that you can change it. So, if you

    (def cdown countdown)
    (defn countdown [] :donkey)

    Then cdown till works in the way you anticipate and so will countdown.

    When doesn’t Clojure work this way by default? Should it? I vote no. The only time you should be redefining vars is at the REPL. And, I’m personally grateful that if I have a function y that calls a function x and I interactively redefine x that y immediately starts using the new definition. This is exactly what I want when working interactively.

    Now, you do have problems in the case you mention where you pass the function as a value. If you do pass the function as a value and you want it to be updated, use a #’the-fn rather than the-fn. And, if you want to alias a function, alias it. It may not be immediately obvious to the casual observer, but once you learn it, it makes a lot of sense.

  9. Norman Richards says:

    One additional note. You don’t have to use mutability to have a function that recurses. Here’s a Y-insired countdown that has the behavior you want.

    (def countdown
    (let [y-count
    (fn [x]
    (fn [n]
    (when (> n 0)
    (cons n ((x x) (dec n))))))]
    (y-count y-count)))

  10. Phillip Lord says:

    @recur I did say “if we want to be picky”. The compiled code is not recursive, as can be easily seen by decompiling.

  11. Phillip Lord says:

    @norman Sure, the behaviour is useful in general, but it produces an obtuse result here. One solution (recur*) solves the problem
    specifically here, without affecting the general case. Alternatively, the compiler could special case the behaviour of a function
    call to NOT use var indirection when recursing. Personally, I quite like the recur* solution because it enables recursion without a name
    so, for example, you could design anonymous recursive functions.

Leave a Reply