How to write a generator expression

Generator Expressions

List comprehensions make new lists. Generator expressions make new generator objects. Generators are iterators, which are lazy single-use iterables. Unlike lists, generators aren't data structures. Instead, they do their work as you loop over them.

Let's make a generator expression.

Writing a generator expression

Here we have a list and a list comprehension that loops over that list:

>>> numbers = [2, 1, 3, 4, 7, 11, 18]
>>> squares = [n**2 for n in numbers]

If we turn the square brackets ([ and ]) in that list comprehension into parentheses (( and )):

>>> squares = (n**2 for n in numbers)

This will turn our list comprehension into a generator expression.

List comprehensions give us back new lists. Generator expressions give us back new generator objects:

>>> squares
<generator object <genexpr> at 0x7fcb363347b0>
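As a quick self-contained check, this sketch builds both forms from the same list. Only the brackets differ, and looping over the generator produces exactly the values the list comprehension stored. The variable names `squares_list` and `squares_gen` are just illustrative.

```python
numbers = [2, 1, 3, 4, 7, 11, 18]

# Square brackets: a list comprehension builds the whole list right away
squares_list = [n**2 for n in numbers]

# Parentheses: a generator expression builds a generator object instead
squares_gen = (n**2 for n in numbers)

print(type(squares_list).__name__)  # list
print(type(squares_gen).__name__)   # generator

# Looping over the generator (here via list()) yields the same values
print(list(squares_gen) == squares_list)  # True
```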

A generator object, unlike a list, doesn't have a length:

>>> len(squares)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: object of type 'generator' has no len()

If we try to index a generator object, to get its first item for example, we'll get an error:

>>> squares[0]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: 'generator' object is not subscriptable

You cannot index a generator.
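If you need "the first few items" from a generator, one standard-library option is `itertools.islice`, which lazily takes items from any iterator. This sketch also shows that counting a generator's items means consuming them, so there's nothing left afterward.

```python
from itertools import islice

numbers = [2, 1, 3, 4, 7, 11, 18]
squares = (n**2 for n in numbers)

# islice pulls exactly 3 items from the generator (no indexing involved)
first_three = list(islice(squares, 3))
print(first_three)  # [4, 1, 9]

# Counting the remaining items requires looping over (and consuming) them
remaining_count = sum(1 for _ in squares)
print(remaining_count)  # 4

# The generator is now used up
print(list(squares))  # []
```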

The only thing we can really do with a generator is loop over it:

>>> for n in squares:
...     print(n)
...
4
1
9
16
49
121
324

It seems like generators have fewer features than lists. So why would we even want to use a generator expression?

Why use generators?

The benefit of generators is that they are lazy iterables, meaning they don't do any work until you start looping over them, and then they compute each value only when it's asked for.

Right after we evaluate a generator expression, a generator object is made:

>>> squares = (n**2 for n in numbers)
>>> squares
<generator object <genexpr> at 0x7fd49a500900>

But up to this point, this generator hasn't actually computed anything. Unlike a list, it doesn't contain any values.
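To see that laziness directly, here's a minimal sketch that records each time the squaring work actually happens. The `square` helper and the `calls` list are hypothetical names made up for this illustration.

```python
calls = []

def square(n):
    """Square n, recording each call so we can see when work happens (hypothetical helper)."""
    calls.append(n)
    return n**2

numbers = [2, 1, 3, 4, 7, 11, 18]
squares = (square(n) for n in numbers)

# Creating the generator hasn't called square() even once
calls_before_loop = list(calls)
print(calls_before_loop)  # []

# Looping (here via list()) is what triggers each computation
results = list(squares)
print(calls)    # [2, 1, 3, 4, 7, 11, 18]
print(results)  # [4, 1, 9, 16, 49, 121, 324]
```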

So if we change the number 4 in our list (at index 3) to the number 5:

>>> numbers
[2, 1, 3, 4, 7, 11, 18]
>>> numbers[3] = 5
>>> numbers
[2, 1, 3, 5, 7, 11, 18]

And then, if we loop over our generator object (using the list constructor, a for loop, or any other form of looping), we'll see that the fourth item isn't 16, it's 25:

>>> list(squares)
[4, 1, 9, 25, 49, 121, 324]

Generators don't do work until the point that they're looped over.
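One subtlety worth knowing: per the Python language reference, the iterable in a generator expression's first `for` clause is evaluated immediately. So the generator is tied to that particular list object. Mutating the list is visible later, but rebinding the name `numbers` to a new list is not, as this sketch shows.

```python
numbers = [2, 1, 3, 4, 7, 11, 18]
squares = (n**2 for n in numbers)

# Mutating the same list object is visible, since nothing has been computed yet
numbers[3] = 5

# Rebinding the name to a brand-new list is NOT visible: the generator
# already grabbed an iterator over the original list object when it was created
numbers = [0, 0, 0]

result = list(squares)
print(result)  # [4, 1, 9, 25, 49, 121, 324]
```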

And if you loop over a generator a second time it'll be empty:

>>> list(squares)
[]

Generator objects are lazy iterables and they are single-use iterables. Items are generated as we loop over a generator (that's what makes them lazy) and these items are consumed as we loop over the generator, meaning they aren't stored anywhere (that's what makes them single-use).
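If you need to loop more than once, one common approach is to create a fresh generator each time (or to store the values in a list). In this sketch, `squares_of` is a hypothetical helper that returns a new generator object on every call.

```python
numbers = [2, 1, 3, 4, 7, 11, 18]

def squares_of(iterable):
    """Return a new generator object each time it's called (hypothetical helper)."""
    return (n**2 for n in iterable)

# Two separate generator objects: each one can be looped over once
first_pass = list(squares_of(numbers))
second_pass = list(squares_of(numbers))
print(first_pass == second_pass)  # True

# Reusing a single generator object is what leaves the second pass empty
squares = squares_of(numbers)
first_sum = sum(squares)
second_sum = sum(squares)
print(first_sum, second_sum)  # 524 0
```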

Looping part-way over a generator

When all the items in a generator have been consumed (meaning we've fully looped over it), we say that it's exhausted. The squares generator above is exhausted:

>>> list(squares)
[]

You don't necessarily need to fully exhaust generators as you loop over them. Say we start looping over a generator and stop once a condition is met (n > 10 below):

>>> numbers = [2, 1, 3, 4, 7, 11, 18]
>>> squares = (n**2 for n in numbers)
>>> for n in squares:
...     print(n)
...     if n > 10:
...         break
...
4
1
9
16

If we then start looping again (using the list constructor in this case), our generator picks up where it left off:

>>> list(squares)
[49, 121, 324]
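The same thing happens with built-in functions that stop looping early. For example, `any` stops as soon as it finds a truthy value, so this sketch only consumes the squares up to the first one greater than 10.

```python
numbers = [2, 1, 3, 4, 7, 11, 18]
squares = (n**2 for n in numbers)

# any() consumes 4, 1, 9, and 16, then stops because 16 > 10
found = any(n > 10 for n in squares)
print(found)  # True

# The generator resumes right after the item any() stopped on
rest = list(squares)
print(rest)  # [49, 121, 324]
```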

Generators generate values as you loop over them.

Generator expressions are a comprehension-like syntax for creating new generator objects.
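Generator objects can also be made by generator functions (functions that use `yield`). This sketch shows a generator function, `squares_func` (a name chosen for illustration), that is roughly equivalent to the generator expression; both produce the same type of object and the same values.

```python
numbers = [2, 1, 3, 4, 7, 11, 18]

# A generator expression
squares_expr = (n**2 for n in numbers)

def squares_func(iterable):
    """A generator function yielding the same values as the expression above."""
    for n in iterable:
        yield n**2

squares_gen = squares_func(numbers)

# Both are generator objects...
print(type(squares_expr) is type(squares_gen))  # True

# ...and both yield the same values when looped over
expr_values = list(squares_expr)
func_values = list(squares_gen)
print(expr_values == func_values)  # True
```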

The only thing that one can do with a generator object is loop over it. Once you've looped over a generator object completely (i.e. you've exhausted it by consuming all the items within it) it doesn't really have a use anymore. Once a generator is exhausted it's empty forever.

Generating just the next item

There is one more thing we can do with generators (besides looping over them), though you won't see it very often. All generators can be passed to the built-in next function.

The next function gives us the next item in a generator:

>>> numbers = [2, 1, 3, 4, 7, 11, 18]
>>> squares = (n**2 for n in numbers)
>>> next(squares)
4

Generators keep track of the expression they need to evaluate on the iterable they're looping over and they keep track of where they are in the iterable.
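Because a generator remembers its position, next and a loop can be mixed. In this minimal sketch, next takes only the first item (a pattern sometimes used to skip something like a header row), and a later loop continues from the second item.

```python
numbers = [2, 1, 3, 4, 7, 11, 18]
squares = (n**2 for n in numbers)

# next() consumes just the first item
first = next(squares)

# A later loop picks up from the generator's current position
rest = [n for n in squares]
print(first, rest)  # 4 [1, 9, 16, 49, 121, 324]
```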

If we call next on a generator repeatedly we'll get each individual item in the generator:

>>> next(squares)
1
>>> next(squares)
9
>>> next(squares)
16
>>> next(squares)
49
>>> next(squares)
121
>>> next(squares)
324

If we call next on a generator that's exhausted (it's been fully consumed) we'll get a StopIteration exception:

>>> next(squares)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

That StopIteration exception indicates that there are no more values in this generator (it's empty):

>>> list(squares)
[]
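A for loop relies on this same signal. The sketch below is a simplified approximation of what a for loop does with a generator: call next until StopIteration is raised. (A real for loop also calls iter() first, which for a generator just returns the generator itself.) It also shows next's optional second argument, a default that's returned instead of raising StopIteration once the generator is exhausted.

```python
numbers = [2, 1, 3, 4, 7, 11, 18]
squares = (n**2 for n in numbers)

collected = []
# Keep asking for the next item until StopIteration says there are none left
while True:
    try:
        n = next(squares)
    except StopIteration:
        break
    collected.append(n)

print(collected)  # [4, 1, 9, 16, 49, 121, 324]

# With a default, next() returns it instead of raising on an exhausted generator
after = next(squares, None)
print(after)  # None
```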

Summary

Just as list comprehensions make new lists, generator expressions make new generator objects.

A generator is an iterable which doesn't actually contain or store values; it generates values as you loop over it.

This means generators are more memory efficient than lists: they don't need memory to hold all of their values at once. Instead, they generate values on the fly as we loop over them.
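A rough way to see this difference is `sys.getsizeof`, assuming CPython. Exact byte counts vary between Python versions, so this sketch only compares them. Note that getsizeof reports a container's own size (for a list, its array of references), not the sizes of the items it refers to; even so, the list is far larger than the generator object, whose size doesn't grow with the number of items.

```python
import sys

n_items = 1_000_000

squares_list = [n**2 for n in range(n_items)]
squares_gen = (n**2 for n in range(n_items))

# The list stores a reference to every one of its million items...
list_size = sys.getsizeof(squares_list)
# ...while the generator object only stores its current state
gen_size = sys.getsizeof(squares_gen)

print(list_size > gen_size)  # True

# A generator expression passed as a function's only argument
# doesn't need its own extra set of parentheses
total = sum(n**2 for n in range(n_items))
print(total == sum(squares_list))  # True
```

The last lines show a common idiom: feeding a generator expression straight into a function like sum, so the squares are never stored in a list at all.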

Generator expressions give us generators which are lazy single-use iterables.