How to Get the Length of Generators and Iterators in Python
Generators and iterators in Python don't support the built-in len() function. This is because they produce values on demand, and their length might not be known in advance (or might even be infinite).
This guide explains how to determine the "length" (number of items) of a generator or iterator, if it's finite, while emphasizing the crucial point that this exhausts the iterator.
Understanding Why len() Doesn't Work Directly
Generators and iterators are designed for lazy evaluation. They produce values one at a time, only when requested. This is memory-efficient, especially for large or infinite sequences. Because they don't store all values in memory simultaneously, there's no way to know their length without consuming them.
def my_generator():
    yield 1
    yield 2
    yield 3

gen = my_generator()
# len(gen)  # TypeError: object of type 'generator' has no len()
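To make the lazy behaviour concrete, here is a minimal sketch. The names tracked and produced are hypothetical and exist only for illustration: the list records which values the generator has actually computed so far.

```python
produced = []  # hypothetical log of values the generator has computed so far

def tracked():
    """Yield 1, 2, 3 and record each value at the moment it is produced."""
    for n in (1, 2, 3):
        produced.append(n)
        yield n

gen = tracked()
print(produced)          # [] (creating the generator runs none of its body)

first = next(gen)
print(first, produced)   # 1 [1] (only the first value has been computed)

# Asking for the length fails because the remaining values do not exist yet.
try:
    len(gen)
except TypeError as exc:
    error_message = str(exc)

print(error_message)     # object of type 'generator' has no len()
```

Nothing is computed until next() asks for it, which is why the generator cannot report how many values are still to come.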
Getting the Length by Exhausting the Iterator/Generator
If you must know the length of a finite generator or iterator, and you're willing to consume it, you have two main options:
Using sum(1 for _ in ...) (Recommended)
This is a memory-efficient and idiomatic way to count the items, because it never stores them:
def g():
    yield 1
    yield 2
    yield 3

gen = g()
result = sum(1 for _ in gen)  # Count the items
print(result)  # Output: 3
print(list(gen))  # Output: [] (the generator is now exhausted)
- We use the sum() function with a generator expression to determine the length of the generator.
- sum(1 for _ in gen): This is a generator expression within a sum() call.
- for _ in gen: This iterates through the generator (or iterator) gen. We use _ because we don't care about the actual values yielded.
- 1 for _ in gen: For each item yielded by the generator, this produces the value 1.
- sum(...): This sums up all the 1s, effectively counting the number of items yielded by the generator.
- Crucial: After this, gen is exhausted. You can't iterate over it again. This is the major trade-off of getting the length of a generator/iterator.
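If you count items in several places, the expression can be wrapped in a small helper. The function name count_items is a hypothetical choice for this sketch, not a standard library function.

```python
def count_items(iterable):
    """Return the number of items in iterable (hypothetical helper).

    If iterable is a generator or other iterator, it is consumed.
    """
    return sum(1 for _ in iterable)

squares = (n * n for n in range(10))
total = count_items(squares)
print(total)                  # 10

leftover = next(squares, None)
print(leftover)               # None (the generator is exhausted)

# The helper works for any iterator, for example a filter object.
evens = filter(lambda n: n % 2 == 0, range(10))
even_count = count_items(evens)
print(even_count)             # 5
```

Putting the warning about consumption in the docstring keeps the trade-off visible wherever the helper is called.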
Using list() (Consumes More Memory)
You can convert the generator/iterator to a list and then get the length:
def g():
    yield 1
    yield 2
    yield 3

gen = g()
result = len(list(gen))  # Convert to a list first
print(result)  # Output: 3
print(list(gen))  # Output: [] (the generator is now exhausted)
- list(gen): This creates a list containing all the elements from the generator. This consumes the generator and can use far more memory than the sum() approach, because every item is held in memory at once. Avoid this if your generator produces a very large number of items, and never use it on an infinite generator.
- This approach gives the same result as using sum() with a generator expression.
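The memory difference can be illustrated with a rough sketch. It assumes CPython, where sys.getsizeof reports only the shallow size of an object (for a list, the array of references and not the items themselves), so treat the numbers as indicative. The generator function numbers is hypothetical.

```python
import sys

def numbers(n):
    """Hypothetical generator yielding 0 .. n-1."""
    for i in range(n):
        yield i

N = 100_000

# A generator object stays small no matter how many items it will yield.
gen_size = sys.getsizeof(numbers(N))

# The list must hold a reference to every item at once.
list_size = sys.getsizeof(list(numbers(N)))

print(gen_size < list_size)   # True

# Both approaches report the same count.
count_via_sum = sum(1 for _ in numbers(N))
count_via_list = len(list(numbers(N)))
print(count_via_sum == count_via_list)   # True
```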
When You Shouldn't Get the Length
- Infinite Generators: If your generator produces an infinite sequence, never try to get its length. sum(1 for _ in ...) will loop forever, and list() will keep allocating until memory runs out.
- Single-Use Iterators: Remember, getting the length consumes the iterator. If you need to iterate over the values after getting the length, you'll need to create a new iterator, or store the values in another way (like converting to a list first, if memory allows).
- When you don't actually need it: Consider if there is another approach to solve your problem that does not involve needing to know the length of a generator or iterator.
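Two of the cases above can be handled with small workarounds, sketched here under stated assumptions. The helper count_at_most is hypothetical: it uses itertools.islice to stop counting after a fixed limit, so it returns even when the source is infinite. The second part shows storing the values in a list first when they are needed after counting.

```python
from itertools import count, islice

def count_at_most(iterable, limit):
    """Count items but stop after limit items (hypothetical helper).

    A result equal to limit means the source has at least limit items.
    """
    return sum(1 for _ in islice(iterable, limit))

# itertools.count() is infinite; the cap guarantees the call returns.
capped = count_at_most(count(), 1000)
print(capped)        # 1000

# A finite iterator shorter than the cap is counted in full.
short = count_at_most(iter("abc"), 1000)
print(short)         # 3

# Need the values afterwards? Store them once, then use len() on the list.
items = list(x * x for x in range(5))
print(len(items), items)   # 5 [0, 1, 4, 9, 16]
```

The cap trades an exact answer for a guarantee of termination, which is often enough when the real question is whether there are more than some number of items.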