This bit of Python code often confuses newer Python programmers:
>>> numbers = [1, 2, 3, 4]
>>> squares = [n**2 for n in numbers]
>>> cubes = (n**3 for n in numbers)
>>> list(zip(squares, squares))
[(1, 1), (4, 4), (9, 9), (16, 16)]
>>> list(zip(cubes, cubes))
[(1, 8), (27, 64)]
Normally the zip function loops over multiple iterables at once.
Passing the same iterable multiple times to zip should produce tuples of the same element repeated multiple times.
But passing a generator to zip multiple times does something different!
What is zip?
Python's zip function and the zip_longest function in the itertools module both loop over multiple iterables at once.
>>> numbers = [1, 2, 3, 4]
>>> letters = ['a', 'b', 'c', 'd']
>>> list(zip(numbers, letters))
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
The zip function is useful if you want to loop over two or more iterables at the same time.
It's a common alternative to looping with indexes in Python.
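As a small illustration of that last point, here is a sketch that builds the same pairs twice: once by looping with indexes and once with zip. The names by_index and by_zip are made up for this example.

```python
numbers = [1, 2, 3, 4]
letters = ['a', 'b', 'c', 'd']

# Looping with indexes: look up each item by its position.
by_index = []
for i in range(len(numbers)):
    by_index.append((numbers[i], letters[i]))

# Looping with zip: no index bookkeeping needed.
by_zip = []
for number, letter in zip(numbers, letters):
    by_zip.append((number, letter))

print(by_index == by_zip)  # True
```

Both loops produce the same list of tuples, but the zip version never touches an index.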
How zip works
The zip function works by getting an iterator from each of the given iterables and repeatedly calling next on each of those iterators.
>>> numbers = [1, 2, 3, 4]
>>> letters = ['a', 'b', 'c', 'd']
>>> i1 = iter(numbers)
>>> i2 = iter(letters)
>>> next(i1)
1
>>> next(i2)
'a'
>>> next(i1), next(i2)
(2, 'b')
If you've never seen iter and next before, see iterators in Python.
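To make that description concrete, here is a rough pure-Python approximation of zip. It is only a sketch for illustration, not how the built-in is actually implemented, and the name simple_zip is hypothetical.

```python
def simple_zip(*iterables):
    """Rough approximation of the built-in zip (illustration only)."""
    # Get one iterator per argument by calling iter on each of them.
    iterators = [iter(iterable) for iterable in iterables]
    while iterators:
        result = []
        for iterator in iterators:
            try:
                # Ask each iterator, in order, for its next item.
                result.append(next(iterator))
            except StopIteration:
                # Stop as soon as any iterator is exhausted.
                return
        yield tuple(result)


numbers = [1, 2, 3, 4]
letters = ['a', 'b', 'c', 'd']
print(list(simple_zip(numbers, letters)))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```

Note that the sketch never checks whether two of its arguments are the same object. It simply calls iter on each one and then calls next on whatever it got back.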
Generators are iterators.
When you ask an iterator for an iterator (by passing it to the built-in iter function), it'll return itself:
>>> numbers = [1, 2, 3, 4]
>>> squares = (n**2 for n in numbers) # this makes a generator, which is an iterator
>>> squares
<generator object <genexpr> at 0x7f1954adffc0>
>>> iter(squares)
<generator object <genexpr> at 0x7f1954adffc0>
>>> iter(squares) is squares
True
If you manually call iter on an iterator twice to get two iterators from it, you'll actually just get two references to the original iterator (all variables are references, because variables in Python are pointers to objects).
So calling next on i1, i2, or squares in the code below will do the same thing, because all of these variables point to the same iterator:
>>> numbers = [1, 2, 3, 4]
>>> squares = (n**2 for n in numbers) # this makes a generator, which is an iterator
>>> i1 = iter(squares)
>>> i2 = iter(squares)
>>> i1 is i2 is squares
True
>>> next(i1)
1
>>> next(i2)
4
>>> next(i1)
9
>>> next(i2)
16
If you call iter multiple times on a non-iterator iterable, you'll get back multiple independent iterators:
>>> numbers = [1, 2, 3, 4]
>>> i1 = iter(numbers)
>>> i2 = iter(numbers)
>>> i1 is i2
False
>>> next(i1)
1
>>> next(i2)
1
>>> next(i1)
2
>>> next(i2)
2
Calling next on i1 doesn't change the state of i2 because these variables point to two different iterators (both of which are iterating over the same numbers list).
When two independent iterables are passed to zip, you'll get values from each of those independent iterables zipped together into tuples:
>>> numbers = [1, 2, 3, 4]
>>> letters = ['a', 'b', 'c', 'd']
>>> list(zip(numbers, letters))
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
When the same (non-iterator) iterable is passed to zip multiple times, zip will get two independent iterators from it (by calling iter on it twice), so you'll get the same value repeated within each resulting tuple:
>>> numbers = [1, 2, 3, 4]
>>> list(zip(numbers, numbers))
[(1, 1), (2, 2), (3, 3), (4, 4)]
The weirdness happens when the same iterator is passed to zip multiple times.
If the zip function gets multiple references to the same iterator, it'll still pass each argument to the built-in iter function, which will result in multiple references to the same iterator.
As we saw above, if you call next on two variables which both point to the same iterator, you'll end up calling next on the same iterator multiple times.
So zipping a generator (or any other iterator) to itself will result in this odd and interesting behavior:
>>> numbers = [1, 2, 3, 4]
>>> squares = (n**2 for n in numbers) # this makes a generator, which is an iterator
>>> list(zip(squares, squares))
[(1, 4), (9, 16)]
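The same result can be reproduced by hand, doing roughly what zip does with the two arguments. This is a sketch only; the names first, second, and pairs are made up for the example.

```python
numbers = [1, 2, 3, 4]
squares = (n**2 for n in numbers)  # a generator, which is an iterator

first = iter(squares)   # returns squares itself
second = iter(squares)  # also returns squares itself

pairs = []
while True:
    try:
        a = next(first)   # advances the one shared generator
        b = next(second)  # advances that same generator again
    except StopIteration:
        break
    pairs.append((a, b))

print(pairs)  # [(1, 4), (9, 16)]
```

Each pass through the loop consumes two items from the single underlying generator, which is why four squares become only two tuples.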
Note that zipping different generators/iterators together doesn't do anything odd:
>>> numbers = [1, 2, 3, 4]
>>> squares = (n**2 for n in numbers)
>>> cubes = (n**3 for n in numbers)
>>> list(zip(squares, cubes))
[(1, 1), (4, 8), (9, 27), (16, 64)]
The weirdness comes from zip getting multiple references to the same iterator, because zip just calls iter on whatever you give it and then repeatedly calls next on the resulting iterators.
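One way to predict whether an object will behave this way is to check whether it is its own iterator. The helper below is a minimal sketch built on the iter(x) is x check; is_iterator is a hypothetical name, not a built-in.

```python
def is_iterator(obj):
    """Return True if obj is its own iterator (hypothetical helper)."""
    return iter(obj) is obj


numbers = [1, 2, 3, 4]
squares = (n**2 for n in numbers)

print(is_iterator(numbers))                # False: each iter() call gives a new iterator
print(is_iterator(squares))                # True: generators return themselves
print(is_iterator(zip(numbers, numbers)))  # True: zip objects are iterators too
```

Anything for which this returns True will be consumed in alternation if it's zipped with itself.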