Phillippe Siclait

How does this lazy sequence work?

7 min read

I explore a succinct, corecursive definition for the sequence of prime numbers in Clojure—and how it works.

In the comments of a recent Stack Overflow question, one user proposed a succinct expression for generating prime numbers in Clojure using trial division. There exist more efficient algorithms for generating prime numbers, but this one’s brevity makes it interesting.

(def primes (remove
              (fn [x]
                (some #(zero? (mod x %)) primes))
              (iterate inc 2)))

This definition uses lazy evaluation to create an infinite sequence that only calculates prime numbers when they are needed. For example, we can use this sequence to find the first ten primes,

user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)

or we can use it to find the 10,000th prime,

user=> (nth primes (dec 10000))
104729

and in each case, only the primes that we need have been calculated.

When I first saw this definition, I was surprised that it worked. The predicate that we pass to the remove function iterates over every element in primes to determine if x is prime. But if that’s the case, then how does this expression terminate? It must be that the inner reference to primes only contains the primes we’ve seen so far. But why?

My first thought was to trace through the code using the CIDER debugger. Unfortunately, when I tried stepping through the code with the debugger, the process froze. I’m not sure why this happened, but I’d like to explore it in the future.

Given that the debugger failed, I decided to use my mental debugger and trace the code by reading it. This led me into the code for various core library functions—remove, some, filter, lazy-seq—as well as the Java code for the clojure.lang.LazySeq class. Reading the code for the lazy-seq macro and the LazySeq class gave me a better understanding of lazy sequences in Clojure and this corecursive definition for primes. Now, I’ll lead us through this code to explain why it works.

The lazy-seq macro is simple.

(defmacro lazy-seq
  "Takes a body of expressions that returns an ISeq or nil, and yields
  a Seqable object that will invoke the body only the first time seq
  is called, and will cache the result and return it on all subsequent
  seq calls. See also - realized?"
  {:added "1.0"}
  [& body]
  (list 'new 'clojure.lang.LazySeq (list* '^{:once true} fn* [] body)))

Given a definition for a lazy sequence such as

(defn integers
  ([]
    (integers 1))
  ([n]
    (lazy-seq (cons n (integers (inc n))))))

The macro expands a call to (integers) into,

(new clojure.lang.LazySeq
  (^:once fn* [] (cons 1 (integers (inc 1)))))

which creates an instance of LazySeq with a single functon argument.

A LazySeq is an object with three fields,

private IFn fn;
private Object sv;
private ISeq s;

fn is the function argument that returns the value of the realized sequence. A common pattern for lazy sequences is for this returned value to be a cons of the next value of the sequence and another lazy sequence containing the rest. For example, in the above definition for a lazy sequence of integers from 11 to \infty, the function argument fn is defined

(^:once fn* [] (cons 1 (integers (inc 1))))

sv is the result of calling fn. This intermediate value is a sequence that may or may not be another lazy sequence.

s is the cached result of calling seq on the lazy sequence.

This object goes through a series of states as the values for these variables are set and unset. A lazy sequence is realized after seq has been called on it. Internally this is represented by setting the fn field of the lazy sequence to null.

When the integers sequence above is realized, its cached sequence value s is equal to the return value of the function argument,

(cons 1 (integers (inc 1)))

Now, let’s walk through the definition of primes to see why the inner reference to primes is a sequence containing the prime numbers seen so far.

When primes is defined, a LazySeq object is created. On calling a function to realize the values of primes, such as take, seq is called on this lazy sequence. Tracing the seq call, we eventually get to line 3 of the following code snippet where we call the underlying fn that returns the sequence.

final synchronized Object sval() {
    if (fn != null) {
        sv = fn.invoke();
        fn = null;
    }
    if (sv != null)
        return sv;
    return s;
}

This call eventually returns something that looks like,

(cons 2 (lazy-seq …))

but not until we’ve determined that 22 is a prime number. To determine that, we need the result of calling the following predicate on 22,

(fn [x]
  (some #(zero? (mod x %)) primes))

At this point, primes has not been realized—meaning that line 4 from the above definition of sval has not been executed. This can be verified by printing out the return value of (realized? primes) in the above anonymous predicate function, before the call to some. In the same way, we can show that invoking the fn field of the LazySeq returns null the first time this predicate is called. The first time the predicate is called is the second time fn is called—the first was when we initially called seq on primes. Why is the result null on this second invocation of fn?

If we go back to the lazy-seq macro, we see that the function argument is created using ^{:once true} fn* instead of a simple fn. As mentioned in the blog post “Macros, closures and unexpected object retention” by Christophe Grand, the :once metadata flag tells the compiler that the function will be executed once and that closed-over objects can be garbage collected. Since the predicate is invoking fn for the second time, some of our closed-over objects are now nil. This includes our lazy sequence of integers from 22 to \infty. We can see this at work in the following example,

user=> (let [f (let [i (iterate inc 2)]
                 (^:once fn* [] (first i)))]
           [(f) (f)])
[2 nil]

On the first invocation, (f) returns 2, while on the second it returns nil.

Now, we simply need to explain how this property of fn results in the inner reference to primes being a sequence of the primes seen so far. Let’s first consider the base case of determining that 22 is prime, then we can consider the case where primes is a non-empty sequence. Simplifying a bit, the function argument fn can be written,

(^:once fn* []
  (when-let [s (seq coll)]
    …
    (let [f (first s) r (rest s)]
      (if (pred f)
        (cons f (filter pred r))
        (filter pred r)))))

In this function, coll is the sequence of integers that we test for primality and pred is the predicate that checks if some integer x is prime. If coll is nil, which it will be on the second invocation of fn, then the function just returns nil. This sets sv to null on line 3 in the definition of sval. Coming back up to the seq function in clojure.lang.LazySeq,

final synchronized public ISeq seq() {
    sval();
    if (sv != null) {
        Object ls = sv;
        sv = null;
        while (ls instanceof LazySeq) {
            ls = ((LazySeq)ls).sval();
        }
        s = RT.seq(ls);
    }
    return s;
}

we see that the condition on line 3 is false and that on line 11 the function returns the default value of s, which is null, representing an empty sequence. Repeating the earlier definition of primes,

(def primes (remove
              (fn [x]
                (some #(zero? (mod x %)) primes))
              (iterate inc 2)))

we see that if primes is an empty sequence, then the predicate returns false, and 22 should not be removed (it is prime). When this predicate is called again on some value n>2n > 2, primes is traversed normally until it reaches the end of the sequence. At the end of the sequence, we again find ourselves with an instance of LazySeq and a noninitial invocation of fn. By the same argument as above, the lazy sequence resolves to null, marking the end of the sequence.

So, why does the inner reference to primes terminate with the primes we’ve seen so far? Because the :once metadata flag in the function argument to the lazy sequence makes the collection that we're filtering nil in noninitial evaluations of the function argument.

I’m @siclait on Twitter—reach out to continue the conversation.