How does this lazy sequence work?
• 8 min read
I explore a succinct, corecursive definition for the sequence of prime numbers in Clojure—and how it works.
In the comments of a recent Stack Overflow question, one user proposed a succinct expression for generating prime numbers in Clojure using trial division. There exist more efficient algorithms for generating prime numbers, but this one’s brevity makes it interesting.
(def primes (remove
(fn [x]
(some #(zero? (mod x %)) primes))
(iterate inc 2)))
This definition uses lazy evaluation to create an infinite sequence that only calculates prime numbers when they are needed. For example, we can use this sequence to find the first ten primes,
user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)
or we can use it to find the 10,000th prime,
user=> (nth primes (dec 10000))
104729
and in each case, only the primes that we need have been calculated.
When I first saw this definition, I was surprised that it worked. The predicate that we pass to the remove
function iterates over every element in primes
to determine if x
is prime. But if that’s the case, then how does this expression terminate? It must be that the inner reference to primes
only contains the primes we’ve seen so far. But why?
My first thought was to trace through the code using the CIDER debugger. Unfortunately, when I tried stepping through the code with the debugger, the process froze. I’m not sure why this happened, but I’d like to explore it in the future.
Given that the debugger failed, I decided to use my mental debugger and trace the code by reading it. This led me into the code for various core library functions—remove
, some
, filter
, lazy-seq
—as well as the Java code for the clojure.lang.LazySeq
class. Reading the code for the lazy-seq macro and the LazySeq class gave me a better understanding of lazy sequences in Clojure and this corecursive definition for primes. Now, I’ll lead us through this code to explain why it works.
The lazy-seq
macro is simple.
(defmacro lazy-seq
"Takes a body of expressions that returns an ISeq or nil, and yields
a Seqable object that will invoke the body only the first time seq
is called, and will cache the result and return it on all subsequent
seq calls. See also - realized?"
{:added "1.0"}
[& body]
(list 'new 'clojure.lang.LazySeq (list* '^{:once true} fn* [] body)))
Given a definition for a lazy sequence such as
(defn integers
([]
(integers 1))
([n]
(lazy-seq (cons n (integers (inc n))))))
The macro expands a call to (integers)
into,
(new clojure.lang.LazySeq
(^:once fn* [] (cons 1 (integers (inc 1)))))
which creates an instance of LazySeq
with a single functon argument.
A LazySeq
is an object with three fields,
private IFn fn;
private Object sv;
private ISeq s;
fn
is the function argument that returns the value of the realized sequence. A common pattern for lazy sequences is for this returned value to be a cons
of the next value of the sequence and another lazy sequence containing the rest. For example, in the above definition for a lazy sequence of integers from to , the function argument fn
is defined
(^:once fn* [] (cons 1 (integers (inc 1))))
sv
is the result of calling fn
. This intermediate value is a sequence that may or may not be another lazy sequence.
s
is the cached result of calling seq
on the lazy sequence.
This object goes through a series of states as the values for these variables are set and unset. A lazy sequence is realized after seq
has been called on it. Internally this is represented by setting the fn
field of the lazy sequence to null.
When the integers
sequence above is realized, its cached sequence value s
is equal to the return value of the function argument,
(cons 1 (integers (inc 1)))
Now, let’s walk through the definition of primes
to see why the inner reference to primes
is a sequence containing the prime numbers seen so far.
When primes
is defined, a LazySeq
object is created. On calling a function to realize the values of primes
, such as take
, seq
is called on this lazy sequence. Tracing the seq
call, we eventually get to line 3 of the following code snippet where we call the underlying fn
that returns the sequence.
final synchronized Object sval() {
if (fn != null) {
sv = fn.invoke();
fn = null;
}
if (sv != null)
return sv;
return s;
}
This call eventually returns something that looks like,
(cons 2 (lazy-seq …))
but not until we’ve determined that is a prime number. To determine that, we need the result of calling the following predicate on ,
(fn [x]
(some #(zero? (mod x %)) primes))
At this point, primes
has not been realized—meaning that line 4 from the above definition of sval
has not been executed. This can be verified by printing out the return value of (realized? primes)
in the above anonymous predicate function, before the call to some
. In the same way, we can show that invoking the fn
field of the LazySeq
returns null the first time this predicate is called. The first time the predicate is called is the second time fn
is called—the first was when we initially called seq
on primes
. Why is the result null on this second invocation of fn
?
If we go back to the lazy-seq
macro, we see that the function argument is created using ^{:once true} fn*
instead of a simple fn
. As mentioned in the blog post “Macros, closures and unexpected object retention” by Christophe Grand, the :once
metadata flag tells the compiler that the function will be executed once and that closed-over objects can be garbage collected. Since the predicate is invoking fn
for the second time, some of our closed-over objects are now nil
. This includes our lazy sequence of integers from to . We can see this at work in the following example,
user=> (let [f (let [i (iterate inc 2)]
(^:once fn* [] (first i)))]
[(f) (f)])
[2 nil]
On the first invocation, (f)
returns 2
, while on the second it returns nil
.
Now, we simply need to explain how this property of fn
results in the inner reference to primes
being a sequence of the primes seen so far. Let’s first consider the base case of determining that is prime, then we can consider the case where primes
is a non-empty sequence. Simplifying a bit, the function argument fn
can be written,
(^:once fn* []
(when-let [s (seq coll)]
…
(let [f (first s) r (rest s)]
(if (pred f)
(cons f (filter pred r))
(filter pred r)))))
In this function, coll
is the sequence of integers that we test for primality and pred
is the predicate that checks if some integer x
is prime. If coll
is nil
, which it will be on the second invocation of fn
, then the function just returns nil
. This sets sv
to null on line 3 in the definition of sval
. Coming back up to the seq
function in clojure.lang.LazySeq
,
final synchronized public ISeq seq() {
sval();
if (sv != null) {
Object ls = sv;
sv = null;
while (ls instanceof LazySeq) {
ls = ((LazySeq)ls).sval();
}
s = RT.seq(ls);
}
return s;
}
we see that the condition on line 3 is false and that on line 11 the function returns the default value of s
, which is null, representing an empty sequence. Repeating the earlier definition of primes,
(def primes (remove
(fn [x]
(some #(zero? (mod x %)) primes))
(iterate inc 2)))
we see that if primes
is an empty sequence, then the predicate returns false
, and should not be removed (it is prime). When this predicate is called again on some value , primes
is traversed normally until it reaches the end of the sequence. At the end of the sequence, we again find ourselves with an instance of LazySeq
and a noninitial invocation of fn
. By the same argument as above, the lazy sequence resolves to null, marking the end of the sequence.
So, why does the inner reference to primes
terminate with the primes we’ve seen so far? Because the :once
metadata flag in the function argument to the lazy sequence makes the collection that we're filtering nil
in noninitial evaluations of the function argument.
I’m @siclait on Twitter—reach out to continue the conversation.