Zipping an Iterator to Itself

This bit of Python code often confuses newer Python programmers:

>>> numbers = [1, 2, 3, 4]
>>> squares = [n**2 for n in numbers]
>>> cubes = (n**3 for n in numbers)
>>> list(zip(squares, squares))
[(1, 1), (4, 4), (9, 9), (16, 16)]
>>> list(zip(cubes, cubes))
[(1, 8), (27, 64)]

Normally the zip function loops over multiple iterables at once. Passing the same iterable to zip multiple times usually produces tuples with the same element repeated. But passing the same generator to zip multiple times does something different!

What's zip?

Python's zip function and the zip_longest function in the itertools module both loop over multiple iterables at once.

>>> numbers = [1, 2, 3, 4]
>>> letters = ['a', 'b', 'c', 'd']
>>> list(zip(numbers, letters))
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

The zip function is useful if you want to loop over 2 or more iterables at the same time. It's a common alternative to looping with indexes in Python.

How zip works

The zip function works by getting an iterator from each of the given iterables and repeatedly calling next on each of those iterators.

>>> numbers = [1, 2, 3, 4]
>>> letters = ['a', 'b', 'c', 'd']
>>> i1 = iter(numbers)
>>> i2 = iter(letters)
>>> next(i1)
1
>>> next(i2)
'a'
>>> next(i1), next(i2)
(2, 'b')

If you've never seen iter and next before, see the iterators page.
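
The process described above can be sketched in plain Python. This is only a rough approximation, not the real implementation of the built-in zip, and the name simple_zip is made up for illustration:

```python
def simple_zip(*iterables):
    """Rough pure-Python approximation of the built-in zip (hypothetical helper)."""
    # Get one iterator per argument by calling iter on each of them
    iterators = [iter(iterable) for iterable in iterables]
    if not iterators:
        return  # with no arguments there is nothing to yield
    while True:
        items = []
        for iterator in iterators:
            try:
                items.append(next(iterator))
            except StopIteration:
                return  # stop as soon as any iterator is exhausted
        yield tuple(items)


numbers = [1, 2, 3, 4]
letters = ['a', 'b', 'c', 'd']
print(list(simple_zip(numbers, letters)))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```

The important detail is the first line of the function body: iter is called once per argument, with no check for whether two arguments are the same object.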

Getting iterators from iterators

Generators are iterators.

When you ask an iterator for an iterator (by passing it to the built-in iter function), it'll return itself:

>>> numbers = [1, 2, 3, 4]
>>> squares = (n**2 for n in numbers)  # this makes a generator, which is an iterator
>>> squares
<generator object <genexpr> at 0x7f1954adffc0>
>>> iter(squares)
<generator object <genexpr> at 0x7f1954adffc0>
>>> iter(squares) is squares
True

If you manually call iter on an iterator twice to get two iterators from it, you'll actually just get two references to the original iterator (all variables in Python are references: they're pointers to objects).

So calling next on i1, i2, or squares in the code below advances the same iterator, because all of these variables point to that one iterator.

>>> numbers = [1, 2, 3, 4]
>>> squares = (n**2 for n in numbers)  # this makes a generator, which is an iterator
>>> i1 = iter(squares)
>>> i2 = iter(squares)
>>> i1 is i2 is squares
True
>>> next(i1)
1
>>> next(i2)
4
>>> next(i1)
9
>>> next(i2)
16

Getting iterators from non-iterator iterables

If you call iter multiple times on a non-iterator iterable, you'll get back multiple independent iterators:

>>> numbers = [1, 2, 3, 4]
>>> i1 = iter(numbers)
>>> i2 = iter(numbers)
>>> i1 is i2
False
>>> next(i1)
1
>>> next(i2)
1
>>> next(i1)
2
>>> next(i2)
2

Calling next on i1 doesn't change the state of i2 because these variables point to two different iterators (both of which are iterating over the same numbers list).
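
The difference between these two kinds of iterables can be checked directly: an iterator returns itself from iter, while a non-iterator iterable returns a new object. Here is a minimal sketch of that check (is_iterator is a made-up helper name, not a built-in):

```python
def is_iterator(iterable):
    """Return True if the given iterable is its own iterator (hypothetical helper)."""
    # Calling iter doesn't consume any items, so this check is safe to run
    return iter(iterable) is iterable


numbers = [1, 2, 3, 4]
squares = (n**2 for n in numbers)

print(is_iterator(numbers))        # False: lists hand out fresh iterators
print(is_iterator(squares))        # True: generators are iterators
print(is_iterator(iter(numbers)))  # True: a list iterator is an iterator
```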

Zipping the same generator twice

When two independent iterables are passed to zip, you'll get values from each of those independent iterables zipped together into tuples:

>>> numbers = [1, 2, 3, 4]
>>> letters = ['a', 'b', 'c', 'd']
>>> list(zip(numbers, letters))
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

When the same (non-iterator) iterable is passed to zip twice, zip will get two independent iterators from it (by calling iter on it twice), so you'll get the same value repeated in each tuple of the resulting iterable:

>>> numbers = [1, 2, 3, 4]
>>> list(zip(numbers, numbers))
[(1, 1), (2, 2), (3, 3), (4, 4)]

The weirdness happens when the same iterator is passed to zip multiple times. If the zip function gets multiple references to the same iterator, it'll continue to blindly pass each argument to the built-in iter function which will result in multiple references to the same iterator.

As we saw above, if you call next on two variables which both point to the same iterator, you'll end up calling next on the same iterator multiple times.

So zipping a generator (or any other iterator) to itself will result in this odd and interesting behavior:

>>> numbers = [1, 2, 3, 4]
>>> squares = (n**2 for n in numbers)  # this makes a generator, which is an iterator
>>> list(zip(squares, squares))
[(1, 4), (9, 16)]
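
If repeated values were what you actually wanted from a generator, one possible approach (an illustration, not the only option) is to ask itertools.tee for independent iterators, or to turn the generator into a list first:

```python
from itertools import tee

numbers = [1, 2, 3, 4]
squares = (n**2 for n in numbers)

# tee returns independent iterators that each yield the generator's values
first, second = tee(squares)
print(list(zip(first, second)))
# [(1, 1), (4, 4), (9, 9), (16, 16)]
```

After calling tee, the original generator shouldn't be used directly anymore, since advancing it would cause the tee iterators to miss values.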

Note that zipping different generators/iterators together doesn't do anything odd:

>>> numbers = [1, 2, 3, 4]
>>> squares = (n**2 for n in numbers)
>>> cubes = (n**3 for n in numbers)
>>> list(zip(squares, cubes))
[(1, 1), (4, 8), (9, 27), (16, 64)]

The weirdness comes from zip getting multiple references to the same iterator because zip just calls iter on whatever you give to it and then repeatedly calls next on the resulting iterators.
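
This behavior can also be used on purpose. A minimal sketch, assuming you want to group consecutive items into pairs (the pairs function is a made-up name for illustration):

```python
def pairs(iterable):
    """Group consecutive items into 2-tuples (hypothetical helper)."""
    # Calling iter ourselves guarantees zip receives the same iterator twice,
    # even when the argument is a list or another non-iterator iterable
    iterator = iter(iterable)
    return zip(iterator, iterator)


print(list(pairs([1, 2, 3, 4, 5, 6])))
# [(1, 2), (3, 4), (5, 6)]
```

Note that with an odd number of items, the trailing item is silently dropped because zip stops as soon as one of its iterators is exhausted.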