# How does this lazy sequence work?

3 June 2020 • 7 min read

I explore a succinct, corecursive definition for the sequence of prime numbers in Clojure—and how it works.

In the comments of a recent Stack Overflow question, one user proposed a succinct expression for generating prime numbers in Clojure using trial division. There exist more efficient algorithms for generating prime numbers, but this one’s brevity makes it interesting.

```
(def primes (remove
(fn [x]
(some #(zero? (mod x %)) primes))
(iterate inc 2)))
```

This definition uses lazy evaluation to create an infinite sequence that only calculates prime numbers when they are needed. For example, we can use this sequence to find the first ten primes,

```
user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)
```

or we can use it to find the 10,000th prime,

```
user=> (nth primes (dec 10000))
104729
```

and in each case, only the primes that we need have been calculated.

When I first saw this definition, I was surprised that it worked. The predicate that we pass to the `remove`

function iterates over every element in `primes`

to determine if `x`

is prime. But if that’s the case, then how does this expression terminate? It must be that the inner reference to `primes`

only contains the primes we’ve seen so far. But why?

My first thought was to trace through the code using the CIDER debugger. Unfortunately, when I tried stepping through the code with the debugger, the process froze. I’m not sure why this happened, but I’d like to explore it in the future.

Given that the debugger failed, I decided to use my mental debugger and trace the code by reading it. This led me into the code for various core library functions—`remove`

, `some`

, `filter`

, `lazy-seq`

—as well as the Java code for the `clojure.lang.LazySeq`

class. Reading the code for the lazy-seq macro and the LazySeq class gave me a better understanding of lazy sequences in Clojure and this corecursive definition for primes. Now, I’ll lead us through this code to explain why it works.

The `lazy-seq`

macro is simple.

```
(defmacro lazy-seq
"Takes a body of expressions that returns an ISeq or nil, and yields
a Seqable object that will invoke the body only the first time seq
is called, and will cache the result and return it on all subsequent
seq calls. See also - realized?"
{:added "1.0"}
[& body]
(list 'new 'clojure.lang.LazySeq (list* '^{:once true} fn* [] body)))
```

Given a definition for a lazy sequence such as

```
(defn integers
([]
(integers 1))
([n]
(lazy-seq (cons n (integers (inc n))))))
```

The macro expands a call to `(integers)`

into,

```
(new clojure.lang.LazySeq
(^:once fn* [] (cons 1 (integers (inc 1)))))
```

which creates an instance of `LazySeq`

with a single functon argument.

A `LazySeq`

is an object with three fields,

```
private IFn fn;
private Object sv;
private ISeq s;
```

`fn`

is the function argument that returns the value of the realized sequence. A common pattern for lazy sequences is for this returned value to be a `cons`

of the next value of the sequence and another lazy sequence containing the rest. For example, in the above definition for a lazy sequence of integers from $1$ to $\infty$, the function argument `fn`

is defined

```
(^:once fn* [] (cons 1 (integers (inc 1))))
```

`sv`

is the result of calling `fn`

. This intermediate value is a sequence that may or may not be another lazy sequence.

`s`

is the cached result of calling `seq`

on the lazy sequence.

This object goes through a series of states as the values for these variables are set and unset. A lazy sequence is **realized** after `seq`

has been called on it. Internally this is represented by setting the `fn`

field of the lazy sequence to null.

When the `integers`

sequence above is realized, its cached sequence value `s`

is equal to the return value of the function argument,

```
(cons 1 (integers (inc 1)))
```

Now, let’s walk through the definition of `primes`

to see why the inner reference to `primes`

is a sequence containing the prime numbers seen so far.

When `primes`

is defined, a `LazySeq`

object is created. On calling a function to realize the values of `primes`

, such as `take`

, `seq`

is called on this lazy sequence. Tracing the `seq`

call, we eventually get to line 3 of the following code snippet where we call the underlying `fn`

that returns the sequence.

```
final synchronized Object sval() {
if (fn != null) {
sv = fn.invoke();
fn = null;
}
if (sv != null)
return sv;
return s;
}
```

This call eventually returns something that looks like,

```
(cons 2 (lazy-seq …))
```

but not until we’ve determined that $2$ is a prime number. To determine that, we need the result of calling the following predicate on $2$,

```
(fn [x]
(some #(zero? (mod x %)) primes))
```

At this point, `primes`

has not been realized—meaning that line 4 from the above definition of `sval`

has not been executed. This can be verified by printing out the return value of `(realized? primes)`

in the above anonymous predicate function, before the call to `some`

. In the same way, we can show that invoking the `fn`

field of the `LazySeq`

returns null the first time this predicate is called. The first time the predicate is called is the *second* time `fn`

is called—the first was when we initially called `seq`

on `primes`

. Why is the result null on this second invocation of `fn`

?

If we go back to the `lazy-seq`

macro, we see that the function argument is created using `^{:once true} fn*`

instead of a simple `fn`

. As mentioned in the blog post “Macros, closures and unexpected object retention” by Christophe Grand, the `:once`

metadata flag tells the compiler that the function will be executed once and that closed-over objects can be garbage collected. Since the predicate is invoking `fn`

for the second time, some of our closed-over objects are now `nil`

. This includes our lazy sequence of integers from $2$ to $\infty$. We can see this at work in the following example,

```
user=> (let [f (let [i (iterate inc 2)]
(^:once fn* [] (first i)))]
[(f) (f)])
[2 nil]
```

On the first invocation, `(f)`

returns `2`

, while on the second it returns `nil`

.

Now, we simply need to explain how this property of `fn`

results in the inner reference to `primes`

being a sequence of the primes seen so far. Let’s first consider the base case of determining that $2$ is prime, then we can consider the case where `primes`

is a non-empty sequence. Simplifying a bit, the function argument `fn`

can be written,

```
(^:once fn* []
(when-let [s (seq coll)]
…
(let [f (first s) r (rest s)]
(if (pred f)
(cons f (filter pred r))
(filter pred r)))))
```

In this function, `coll`

is the sequence of integers that we test for primality and `pred`

is the predicate that checks if some integer `x`

is prime. If `coll`

is `nil`

, which it will be on the second invocation of `fn`

, then the function just returns `nil`

. This sets `sv`

to null on line 3 in the definition of `sval`

. Coming back up to the `seq`

function in `clojure.lang.LazySeq`

,

```
final synchronized public ISeq seq() {
sval();
if (sv != null) {
Object ls = sv;
sv = null;
while (ls instanceof LazySeq) {
ls = ((LazySeq)ls).sval();
}
s = RT.seq(ls);
}
return s;
}
```

we see that the condition on line 3 is false and that on line 11 the function returns the default value of `s`

, which is null, representing an empty sequence. Repeating the earlier definition of primes,

```
(def primes (remove
(fn [x]
(some #(zero? (mod x %)) primes))
(iterate inc 2)))
```

we see that if `primes`

is an empty sequence, then the predicate returns `false`

, and $2$ should not be removed (it is prime). When this predicate is called again on some value $n > 2$, `primes`

is traversed normally until it reaches the end of the sequence. At the end of the sequence, we again find ourselves with an instance of `LazySeq`

and a noninitial invocation of `fn`

. By the same argument as above, the lazy sequence resolves to null, marking the end of the sequence.

So, why does the inner reference to `primes`

terminate with the primes we’ve seen so far? Because the `:once`

metadata flag in the function argument to the lazy sequence makes the collection that we're filtering `nil`

in noninitial evaluations of the function argument.

I’m @siclait on Twitter—reach out to continue the conversation.