Clojure Gotchas

I’ve been programming in Clojure for well over a year now; originally, I heard about it courtesy of Sam Aaron, an old PhD student of ours who gave a fun lunchtime talk; I rarely go to these (although I probably should). Indirectly, Tawny-OWL came out of this one, so it is good that I went.

During the time that I have used Clojure, I have come to know it fairly well, and to appreciate many of its finer points. These are not the same points that many other people appreciate, I realise; for me, the Java integration is simple, effective and very important, as Tawny-OWL is essentially a wrapper over a Java library. Meanwhile, a lot of the nice concurrency features are a matter of indifference to me, again for the same reason.

But like any language there are some problems, or at least things that don’t work for me. On the off-chance it is useful to anyone else, here is my list of Clojure gotchas.

Lazy Lists

This is quite a common one, of course, which hits most Clojure beginners. We write something like:

 (def x (map (fn [x] (println "hello") x) (range 1 100)))

and then wonder why nothing prints out. Or, the alternative problem: write something like (range) and find the REPL hangs. The latter is, I think, a poorly performing REPL; infinity might be a more principled point at which to stop than an arbitrarily chosen value, but it’s not useful.
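Here is a minimal sketch of both problems. The names calls, lazy-xs, realized-xs and printed are mine, and a counter stands in for println so that the effect can be checked rather than eyeballed. It assumes nothing beyond clojure.core.

```clojure
;; A counter stands in for the println side effect so the effect is observable.
(def calls (atom 0))

;; map only builds a lazy sequence; the function has not been called yet.
(def lazy-xs (map (fn [x] (swap! calls inc) x) (range 1 100)))
(def calls-before @calls)

;; doall walks the whole sequence and forces every call.
(def realized-xs (doall lazy-xs))
(def calls-after @calls)

;; *print-length* caps how many elements get printed, so even an
;; infinite sequence such as (range) prints and returns.
(def printed (binding [*print-length* 5] (pr-str (range))))
```

Here calls-before is 0 and calls-after is 99, one call per element. When you want the side effects and not the values, dorun does the same walk without holding on to the results.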

Of course, once you have got past this point, it’s not so bad, but laziness can still catch you unawares, especially if, like me, you are using Clojure just to drive a JVM library. This subtle bug from tawny.render, which is essentially one big recursive descent renderer, demonstrates the problem. Consider this code:

 (concat [:fact] (form [:fact fact]) (form [:factnot factnot]))

Looks fine, but I need to pass options and a lookup cache around and had done this with a number of dynamic vars. The cache, it turns out, would not have been working for this form (although it was for others), but I never noticed this. However, the options broke the code more cleanly. concat is, of course, lazy, and was happening outside the binding form which defines the dynamic vars.

Now, I know dynamic vars and laziness don’t mix. In the end, I have added an additional parameter to all the functions in my renderer using the awesome power of lisp (i.e. I wrote a dodgy search and replace function in Emacs). And the cache now invalidates itself using a better technique than before. But I didn’t want laziness, I just got it by chance. In Clojure, it’s always there, wanted or not. Or, rather, it’s always sometimes there, because Clojure is only partly lazy.
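The interaction between binding and laziness can be reproduced in a few lines. This is only a sketch: *options* and render are hypothetical stand-ins, not the real tawny.render code.

```clojure
;; *options* and render are hypothetical stand-ins for the renderer's state.
(def ^:dynamic *options* :default)

(defn render [x] [*options* x])

;; map is lazy: render runs only when the result is consumed, by which
;; time the binding form has exited and *options* is back to :default.
(def lazy-result
  (binding [*options* :custom]
    (map render [:fact :factnot])))

;; mapv is eager, so render runs while the binding is still in place.
(def eager-result
  (binding [*options* :custom]
    (mapv render [:fact :factnot])))
```

Consuming lazy-result yields pairs tagged :default, while eager-result carries :custom as intended. Passing the options as an ordinary parameter, as described above, sidesteps the question altogether.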

Lisp-1 vs Lisp-2

Well, this argument is as old as the hills. Clojure is a lisp-1, so it has a single namespace for variables and functions, while Common Lisp and Emacs-Lisp are lisp-2s, so have one namespace for each.

I’ve had fun with single namespaces before: I used to teach JavaScript to new programmers and it produces weird and wonderful bugs that can be hard to track down. Still, I am too old and wise for that. If only!

During Tawny-OWL, I found accidental capture of functions produced some strange artifacts. Consider, for example, this code.

 (defn my-get[x map] (get x map))

Everything works fine here, of course, right up till the point that you get bored of typing map and change it to m:

 (defn my-get[x m] (get x map))

Now things break in strange ways. map is now the (global) function and not the parameter. There are many ways around this, of course. I could not have done (use 'clojure.core) earlier and just imported the functions I use; except that I did use map elsewhere. I could namespace everything (try and find some examples of Clojure code with namespace qualified or aliased clojure.core functions).
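A small sketch of the failure mode, using the two definitions above side by side; my-get-renamed is just my name for the half-edited version.

```clojure
;; Original: the parameter map shadows clojure.core/map inside the body.
(defn my-get [x map] (get x map))

;; Half-finished rename: this still compiles, because map now resolves
;; to the clojure.core function instead of raising an unbound-symbol error.
(defn my-get-renamed [x m] (get x map))

(def before (my-get {:a 1} :a))
(def after  (my-get-renamed {:a 1} :a))
```

before is 1, while after is nil: the second version quietly looks up the function object clojure.core/map as a key, finds nothing, and carries on.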

In my case, exactly this problem hit me when I renamed parameters called ontology to o. I thought the compiler would pick up my errors but no, because I had an ontology function. This situation is made worse by my next gotcha which is:

Everything is a function

Consider this entirely pointless piece of code which makes lisp post-fix.

 (defmacro do-do [x afn]
   `(do ~(afn x)))

We can use this macro like so:

 (do-do 1 inc)

Now, if you know only a little about lisp, you might expect this to return 2. If you are more experienced, then you might think that this is a strange thing to do, because the call to (inc 1) happens at macro-expansion time, and why would you want to do that? If you are more experienced still, you will think, well actually inc is not evaluated so it is actually a symbol, and the whole thing is going to crash.

Actually, it returns nil. The reason for this is that lots of things in Clojure are functions that you wouldn’t expect, and symbol is one of these. So, actually, ('inc 1) returns nil, because symbols are functions which look up the occurrence of the symbol in the collection that follows.

Now this has advantages, of course, namely that you can use a symbol to look up a key in a collection. So, for example:

 ('bob {'bob 1})

Returns 1. Of course, this is nice, but how many times do you actually want to do this? And when you do, would (get {'bob 1} 'bob) really be so hard? I can see the justification for (:bob {:bob 1}) but for symbols I am really not convinced, unless I am missing some other critical advantage.
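To make the chain of events concrete, here is a sketch that expands the macro and calls a symbol directly. The def names are mine; everything else is clojure.core.

```clojure
(defmacro do-do [x afn]
  `(do ~(afn x)))

;; Inside the macro, afn is the symbol inc, so (afn x) is ('inc 1),
;; which is evaluated at macro-expansion time.
(def expansion (macroexpand-1 '(do-do 1 inc)))

;; A symbol called as a function looks itself up in its argument.
(def symbol-on-number ('inc 1))             ; 1 is not a collection
(def symbol-on-map    ('bob {'bob 1}))
(def with-default     ('bob {} :not-found)) ; optional second argument

(def result (do-do 1 inc))
```

The expansion is (do nil), so result is nil. symbol-on-map is 1, and the two-argument form returns :not-found when the symbol is absent.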

Future, what’s a Future

 (def x 1)
 (def y (ref 2))
 (+ @x y)

Now, in this small example, the error is easy to find; we should have derefed y and not x. And what is the error that we get from this?

 ClassCastException java.lang.Long cannot be cast to java.util.concurrent.Future clojure.core/deref-future (core.clj:2108).

But I have not used a future. I have never used a future. I do not even know what a future is (although I may, of course, do so in the future). The reason for this strange error message can be seen from the code for deref (which the @ reader macro uses). Since integers do not implement IDeref, we treat them as a Future, which then causes the cast exception.

 (defn deref
   {:added "1.0"
    :static true}
   ([ref]
    (if (instance? clojure.lang.IDeref ref)
      (.deref ^clojure.lang.IDeref ref)
      (deref-future ref)))
   ([ref timeout-ms timeout-val]
    (if (instance? clojure.lang.IBlockingDeref ref)
      (.deref ^clojure.lang.IBlockingDeref ref timeout-ms timeout-val)
      (deref-future ref timeout-ms timeout-val))))

This one is easy to solve. Deref should check instance? Future on the value if IDeref fails, and crash with a better error message. One instance? check is well worth the effort.
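As a rough sketch of what that might look like for the one-argument case: checked-deref is a hypothetical name and this is not a patch to clojure.core, just an illustration of the extra instance? check.

```clojure
;; checked-deref is a hypothetical stand-in for a friendlier deref.
(defn checked-deref [ref]
  (cond
    (instance? clojure.lang.IDeref ref)
    (.deref ^clojure.lang.IDeref ref)

    (instance? java.util.concurrent.Future ref)
    (.get ^java.util.concurrent.Future ref)

    :else
    (throw (IllegalArgumentException.
            (str "Cannot deref a value of type "
                 (if (nil? ref) "nil" (.getName (class ref))))))))

(def x 1)
(def y (ref 2))
(def sum (+ x (checked-deref y)))
```

With this version, (checked-deref x) fails with a message naming java.lang.Long rather than mentioning a Future that the caller never created.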

Backtick really is for macros only

The backtick notation is found in many lisps, and this includes Clojure. Its primary use is in macros because it lets you build up forms programmatically, but have them look like normal typed in forms. Compare these two:

 (defmacro inc2 [x]
   `(+ ~x 2))

 (defmacro inc2 [x]
   (list + x 2))

In many lisps, though, the backtick is just a list creation macro, that happens to be mostly used for macros. In Clojure, it’s been hard coded for macros. Consider:

 (let [x 'john] `(~x paul george ringo))

You might expect this to just return a list of four symbols (which it does), but the symbols are not what you might expect.

 (john user/paul user/george user/ringo)

The symbols paul, george and ringo get namespace qualified in the return value even though they are not in the original form. Now, of course, there is a good reason for this; it helps to prevent us from accidental capture of symbols. All symbols should be qualified or gensym’d.
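For comparison, here is a sketch of three ways to build that list. The def names are mine; the qualifying namespace is whichever one the code is read in, so user is only what you would see at a default REPL.

```clojure
(def x 'john)

;; Syntax-quote qualifies bare symbols with the current namespace.
(def qualified `(~x paul george ringo))

;; ~' unquotes a quoted symbol, which leaves it unqualified.
(def unqualified `(~x ~'paul ~'george ~'ringo))

;; Outside a macro, a plain list of quoted symbols does the same job.
(def by-hand (list x 'paul 'george 'ringo))
```

unqualified and by-hand are both (john paul george ringo); only qualified picks up the namespace prefixes.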

But consider this:

 (deftype bob []
   java.lang.Runnable
   (run [this] (println "Hello")))

Now, I know this is a silly example, because bob is just implementing Runnable, and any function would do this, but Runnable is nice and simple. This is still quite a lot of typing, so, perhaps we should macro this.

 (defmacro defrunnable [name body]
   `(deftype ~name []
      java.lang.Runnable
      (run [this] ~body)))

Unfortunately, this is actually wrong because the symbols run and this get namespace qualified, so we end up with user/run and user/this. The correct way to achieve this is this:

 (defmacro defrunnable [name body]
   `(deftype ~name []
      java.lang.Runnable
      (~'run [~'this] ~body)))

Now, this version is anaphoric and introduces this, so perhaps it is not ideal, but run, although it looks like a function, is not one; it’s a lexical symbol that Clojure translates to the method name.
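The difference between the two macros is easiest to see by expanding them without evaluating the result. In this sketch, defrunnable-broken is my name for the first attempt, and the def names are hypothetical too.

```clojure
(defmacro defrunnable-broken [name body]
  `(deftype ~name []
     java.lang.Runnable
     (run [this] ~body)))

(defmacro defrunnable [name body]
  `(deftype ~name []
     java.lang.Runnable
     (~'run [~'this] ~body)))

;; macroexpand-1 returns the form the macro produces, without compiling it.
(def broken-form (macroexpand-1 '(defrunnable-broken Bob (println "Hello"))))
(def fixed-form  (macroexpand-1 '(defrunnable Bob (println "Hello"))))

;; The method spec is the fifth element of the deftype form.
(def broken-method (nth broken-form 4))
(def fixed-method  (nth fixed-form 4))
```

fixed-method is (run [this] (println "Hello")), whereas in broken-method both run and this carry a namespace prefix, which deftype then rejects.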

Whitespace

In Clojure, the comma (,) is whitespace. Effectively, it is used to make code pretty but has no meaning other than that. Those coming from other Lisps will sooner or later do something like this:

 (defmacro defrunnable [name body]
   `(deftype ,name []
      java.lang.Runnable
      (,'run [,'this] ,body)))

This nearly always results in a strange error message somewhere down the line which is not easy to debug. The point is that other lisps use , to mean “unquote” for which Clojure uses ~. Not really Clojure’s fault this one, I guess. But irritating none the less.
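A quick sketch of how completely the reader ignores commas; the def names are mine.

```clojure
;; Commas make no difference to the data that is read.
(def with-commas    [1, 2, 3])
(def without-commas [1 2 3])

;; The reader simply drops the comma, so ,bar reads as the plain symbol
;; bar rather than as an unquote.
(def read-form (read-string "(foo ,bar)"))
```

read-form is the two-element list (foo bar), which is why a Common Lisp style ,name inside a backtick ends up as a qualified symbol instead of the macro argument.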

Running in Java

One of the most unfortunate things about Clojure is that it’s hosted on the JVM. Of course, this is also the reason that I am using it, so I guess it makes no sense to complain, except when writing an article of “gotchas”. But being hosted on the JVM means Clojure inherits some of the strangeness of the JVM.

While writing Protege-NREPL, I had to struggle with OSGi and Clojure’s dynamic ClassLoader, both of which do sort of the same thing, but sort of differently. It’s while getting this to work that I found that Clojure uses the context class loader.

In the end, I found that I needed this code to get anything working:

 private final ClassLoader cl = new DynamicClassLoader(this.getClass().getClassLoader());
 Thread.currentThread().setContextClassLoader(cl);

No one understands what the context class loader is, nor what it is for. There again, no one understands class loaders, so this is perhaps not a surprise.
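For anyone doing the same thing from the Clojure side, a rough equivalent might look like the sketch below. call-with-context-classloader is a hypothetical helper, not something Clojure or Protege-NREPL provides; unlike the Java snippet it puts the previous loader back afterwards.

```clojure
;; call-with-context-classloader is a hypothetical helper, not part of Clojure.
(defn call-with-context-classloader
  [^ClassLoader cl f]
  (let [thread   (Thread/currentThread)
        previous (.getContextClassLoader thread)]
    (.setContextClassLoader thread cl)
    (try
      (f)
      (finally
        ;; Restore whatever loader was there before, even if f throws.
        (.setContextClassLoader thread previous)))))

;; Usage: wrap Clojure's DynamicClassLoader around the current loader
;; and check that code running inside f sees it.
(def saw-new-loader
  (let [cl (clojure.lang.DynamicClassLoader.
            (.getContextClassLoader (Thread/currentThread)))]
    (call-with-context-classloader
     cl
     (fn [] (identical? cl (.getContextClassLoader (Thread/currentThread)))))))
```

Whether restoring the old loader is what you want depends on the host framework; in an OSGi bundle you may need it to stay set for the lifetime of the thread.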

Two times

Clojure uses what is effectively a two-pass compilation step. I say effectively, because apparently it doesn’t but the practical upshot is that you have to declare things before you use them. This is just a pain.
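The standard workaround is declare, which creates the var ahead of its definition. A minimal sketch with two mutually recursive functions, whose names are mine:

```clojure
;; Without this line, compiling my-even? fails because my-odd? cannot
;; be resolved yet.
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n) true (my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false (my-even? (dec n))))

(def ten-is-even (my-even? 10))
```

It works, but it means the order of a file is dictated by the compiler rather than by what reads best.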

A related problem is that Clojure dislikes circular namespace dependencies. With Tawny-OWL, this means that the main namespace is not really in the order that I want it. And it was a big problem for the reasoner namespace. The problem is that the reasoner namespace has to know about the owl.clj namespace; but, also, the reasoner namespace has to know when an ontology is removed (so that any reasoners can be dropped). The obvious solution which is to have the owl.clj call reasoner.clj doesn’t work because we now have a circular dependency.

In the end, I solved this by implementing a hook system like Emacs. Now owl.clj just runs a hook. Probably, I should reimplement this directly with watches, but they were alpha at the time.
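In outline, a hook system of this kind needs very little. The sketch below uses hypothetical names throughout and is not Tawny-OWL’s actual API; it only shows how the dependency ends up pointing one way.

```clojure
;; All names here are hypothetical; this is not Tawny-OWL's actual API.
(defn make-hook [] (atom []))

(defn add-hook [hook f]
  (swap! hook conj f)
  hook)

(defn run-hook [hook & args]
  (doseq [f @hook]
    (apply f args)))

;; In the core namespace: define the hook and run it on removal.
;; Nothing here needs to know that reasoners exist.
(def remove-ontology-hook (make-hook))

(defn remove-ontology [o]
  (run-hook remove-ontology-hook o)
  o)

;; In the reasoner namespace, which requires the core one:
(def dropped (atom []))
(add-hook remove-ontology-hook (fn [o] (swap! dropped conj o)))

(remove-ontology :pizza)
```

After the last call, @dropped contains :pizza: the reasoner side was notified even though the core side never refers to it.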

Goodbye Cons

One of the big wins for Clojure is that it is built over abstractions, so the cons cell, which is the core of most lisps, is gone. Instead of this, we have ISeq which is an interface and looks like this:

 Object first();
 ISeq next();
 ISeq more();
 ISeq cons(Object o);

The problem is that it really does look like this; I mean, this is a cut-and-paste from the code. Aside from these method declarations, that’s it. Nothing at all in the way of documentation.

Worse, the entire API for Clojure consists of two classes, with the rest being considered “implementation detail”.

Strictly, therefore, Clojure is built over abstractions, but users of Clojure have no access to extend these abstractions themselves, unless they use implementation detail. Which, of course, they do; to access the heart of the language you have to. Given this reality, some documentation would be nice!
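To show what reaching into that implementation detail looks like in practice, here is a small sketch. Countdown is a hypothetical type; Seqable and Counted are real interfaces in clojure.lang, which is exactly the package described as implementation detail.

```clojure
;; Countdown is a hypothetical example type.
(deftype Countdown [n]
  clojure.lang.Seqable
  (seq [_] (seq (range n 0 -1)))   ; n, n-1, ..., 1
  clojure.lang.Counted
  (count [_] n))

(def c (->Countdown 3))

;; The core sequence functions now accept the new type directly.
(def incremented (vec (map inc c)))
(def size (count c))
```

incremented is [4 3 2] and size is 3. A full ISeq implementation needs rather more methods than this, and working out which ones is where the missing documentation hurts.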

Conclusions

Clojure is a nice language, but in some parts it is still a little immature; some of these gotchas will disappear in time. The error message about Futures is trivial to fix, for instance. Some of them can already be avoided with libraries: for example, the backtick issue can be avoided using an alternative implementation. Others will, I think, stick. Symbols will remain functions, I suspect. The last issue, that of a public API, must be fixed if Clojure is to mature.

One gotcha I don’t mention is the lack of a type-system. There are many times when programming Clojure when I have created a bug that a type-system would have picked up instantly. This must, however, be set against those times when you stare at the screen in depression trying to work out why a perfectly innocuous piece of code will not compile. In the end, it’s often easier to debug running code, than it is to fix a broken type error. Both forms of problem are something you learn to live with, depending on your choice of language.

1. Ryan says:

You can prevent the infinite sequences from printing out by setting the dynamic variable *print-length*, as discussed in this post: http://blog.n01se.net/blog-n01se-net-p-85.html

If you use leiningen I recommend setting this permanently in your ~/.lein/profiles.clj file as mentioned in this SO post: http://stackoverflow.com/questions/14664486/whats-the-correct-way-of-providing-a-default-value-for-print-length-in-profil

2. Phillip Lord says:

I’ve been meaning to do this for ages but never got around to it; I rarely dump infinite sequences to the REPL these days. For the life of me, though, I don’t know why this isn’t the default.

3. Alex Miller says:

A few points of info…

– Lazy lists – I think many Clojure programmers go through a period of adjustment to the laziness of sequences until the mental model is internalized (I did for sure). I’d actually point at the use of dynamic vars (which are effectively global state), rather than laziness as the more important source of the problem.

And I’d echo *print-length* as other posters did – the reason it’s not the default in Clojure is that you’d be mighty annoyed when you tried to print a longer sequence and didn’t get all of your results. But I think it should be set by default on repls.

– Lisp-1 vs Lisp-2 – this is just a design decision, so not sure I can offer much. You can of course exclude core or rename core Clojure symbols. See http://www.paradiso.cc/2014/05/10/clojure-namespaces/ for some examples near the end.

– Everything is a function – for the specific get case, there is a ticket to throw an error in the case where a lookup is being attempted against a non-associative object. Maybe a 1.7 thing… http://dev.clojure.org/jira/browse/CLJ-1107

– Future, what’s a Future – sounds worth a ticket – please file one to make this error case better.

– Backtick, whitespace – again, this just seems like Clojure is different than what you’re used to?

– Running in Java – the thread context classloader is simply the implicit classloader to use for classloading if no explicit classloader is provided. It’s common to need to set this when using frameworks that employ custom classloader strategies (like OSGi).

– Two times – I’d say Clojure uses one pass which is why you have to declare things before you use them. In general, I’d say that circular namespaces are a bad idea and a sign that the code should be refactored. There are usually a number of ways to avoid it.

– Cons – the “API” you mention is only the API for calling Clojure from Java. If you are in Clojure, then there are many abstractions (at varying levels of power) that you can use. For example, you can create a custom type that implements the proper Clojure interfaces using deftype and the entire collection and sequence library will treat them just like the core collections. You have way more access to the underlying abstractions than most libraries. I will agree that the docs are … sparse. :)

4. Phillip Lord says:

“Lazy lists – I’d actually point at the use of dynamic vars (which are
effectively global state), rather than laziness as the more important source
of the problem….But I think it should be set by default on repls.”

In a sense you are right. But dynamic vars are useful in many languages and
laziness makes them less so. My use case, incidentally, also involves using
Java with side-effects, so it’s not just dynamic vars. Agreed on
*print-length* (that would be a dynamic var right?).

“Lisp-1 vs Lisp-2 – this is just a design decision”

Yes, and not having to do (funcall f x y) is nice. The gotcha here was an
unexpected one, though, which is don’t use a parameter with the same name as a
global, even if you haven’t created the function yet!

“Future, what’s a Future – sounds worth a ticket – please file one to make
this error case better.”

Will do!

“Backtick, whitespace – again, this just seems like Clojure is different than
what you’re used to?”

To use Clojure speak, backtick is complected. Not a bad design decision, but
it is nice to have an unwound option. Comma whitespace is, I think, a mistake.
I would hazard a guess that the only place people use it is in a map literal,
to separate key-val pairs. It would serve this purpose better if it were
(optional) syntax and only valid at the right point.

“Running in Java – the thread context classloader is simply the implicit
classloader to use for classloading if no explicit classloader is provided.”

Nothing to do with classloaders is simple. If it is implicit, why did I have
to set it explicitly? Still, classloaders are outside Clojure’s control. The
JVM is overwhelmingly a plus point for Clojure. Classloaders are one of the
prices to pay.

“In general, I’d say that circular namespaces are a bad idea and a sign that
the code should be refactored.”

Let me answer this with reference to another language. java.lang and java.io
reference each other — String implements Serializable for instance. Do you
think this is bad, and how would you refactor it?

“For example, you can create a custom type that implements the proper Clojure
interfaces using deftype and the entire collection and sequence library will
treat them just like the core collections…”

Yes, I have been trying that, just for fun. For instance, here is my
implementation of IPersistentSet.

https://github.com/phillord/small/blob/master/src/small/set.clj

But remember this from the API documentation.

“The clojure.lang package holds the implementation for Clojure. The only class
considered part of the public API is IFn. All other classes should be
considered implementation details.”

Is there a way to redo my set implementation *without* referencing
IPersistentSet which is implementation detail?

“I will agree that the docs are … sparse. :)”

I should have put docs as a separate one. The .clj docs are not fantastic, but
the interface docs are unforgivable.

5. Alex Miller says:

“If it is implicit, why did I have to set it explicitly?” – because you used a framework (OSGi) with a custom classloader scheme.

“Let me answer this with reference to another language. java.lang and java.io reference each other — String implements Serializable for instance. Do you think this is bad, and how would you refactor it?” – I don’t think this is a fair or answerable question given the example you picked (String is not your average class in Java). I do know that I’ve managed to write Java and Clojure for a long time without ever using a circular namespace/package, usually by introducing interfaces or protocols as a point of indirection.

Re the API documentation – the comments you refer to are in the context of calling Clojure from Java. When you are “inside” Clojure, there are layers of API that have different levels of stability and power. The implementation details are intentionally left open for advanced or power users to take advantage of them (with the corresponding caveats that if you tap in too low, you may have to deal with changes later).

Different levels:
1) documented clojure.core functions – these are considered public and official.
2) undocumented or private clojure.core functions – considered to be implementation details of clojure.core. They may change without warning.
3) Clojure’s Java interfaces (like IPersistentSet, Countable, etc) – generally in Clojure you detect these via functions like set? and counted?. Clojure itself and the clojure.core library are built on top of these abstractions. We expect you to use them to build extensions with deftype or custom Java impls. Many examples exist.
4) The RT class is used by the Clojure runtime and calls to it are compiled into AOT classes. Generally, we try to avoid breaking compatibility at this interface as that breaks older AOT classes.
5) Clojure’s Java impls – generally you should expect the internals of these to be up for arbitrary change.

Above, #1,3,4 are very stable and we carefully consider breaking changes in any of these. #2,5 are considered open to arbitrary changes (although if a change there broke lots of stuff we would certainly reconsider it).

6. Phillip Lord says:

“I do know that I’ve managed to write Java and Clojure for a long time without
ever using a circular namespace/package, usually by introducing interfaces or
protocols as a point of indirection.”

This is what I have done in Clojure as well. But I dislike being forced to
introduce a layer of indirection when I only have one place where I am using
it. Now in the specific example that I gave above, I have ended up using it
elsewhere, so it worked out okay. But I dislike being forced to by the
language semantics. So this does qualify as a “gotcha”: I had to deal with
issues earlier than I should have.

“We expect you to use them to build extensions with deftype or custom Java
impls. Many examples exist.”

Yes, I know. But the documentation says

“The clojure.lang package holds the implementation for Clojure. The only class
considered part of the public API is IFn. All other classes should be
considered implementation details.”

We already agree that the documentation is sparse. As you say, many examples
exist (of people using ISeq or so on), which suggests to me that the
documentation is also wrong in practice. ISeq is, effectively, public as well
as IFn because, without it, Clojure loses a lot.