You've somehow ended up with lists nested inside of lists, possibly like this one:
>>> groups = [["Hong", "Ryan"], ["Anthony", "Wilhelmina"], ["Margaret", "Adrian"]]
But you want just a single list (without the nesting) like this:
>>> expected_output = ["Hong", "Ryan", "Anthony", "Wilhelmina", "Margaret", "Adrian"]
You need to flatten your list-of-lists.
We can think of this as a shallow flatten operation, meaning we're flattening this list by one level. A deep flatten operation would handle lists-of-lists-of-lists-of-lists (and so on) and that's a bit more than we need for our use case.
The flattening strategy we come up with should work on lists-of-lists as well as any other type of iterable-of-iterables. For example, lists of tuples should be flattenable:
>>> groups = [("Hong", "Ryan"), ("Anthony", "Wilhelmina"), ("Margaret", "Adrian")]
And even an odd type like a dict_items object (which we get from asking a dictionary for its items) should be flattenable:
>>> fruit_counts = {"apple": 3, "lime": 2, "watermelon": 1, "mandarin": 4}
>>> fruit_counts.items()
dict_items([('apple', 3), ('lime', 2), ('watermelon', 1), ('mandarin', 4)])
>>> flattened_counts = ['apple', 3, 'lime', 2, 'watermelon', 1, 'mandarin', 4]
for loops
One way to flatten an iterable-of-iterables is with a for loop.
We can loop one level deep to get each of the inner iterables.
for group in groups:
    ...
And then we loop a second level deep to get each item from each inner iterable.
for group in groups:
    for name in group:
        ...
And then append each item to a new list:
names = []
for group in groups:
    for name in group:
        names.append(name)
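To check that this nested loop really works on any iterable-of-iterables, here's a small sketch that wraps it in a function (the name flatten is just a hypothetical helper, not something from the standard library) and runs it on the list-of-lists, the list-of-tuples, and the dict_items object from above:

```python
def flatten(iterable_of_iterables):
    """Flatten one level of nesting using nested for loops (hypothetical helper)."""
    flattened = []
    for inner in iterable_of_iterables:
        for item in inner:
            flattened.append(item)
    return flattened


groups = [["Hong", "Ryan"], ["Anthony", "Wilhelmina"], ["Margaret", "Adrian"]]
tuple_groups = [("Hong", "Ryan"), ("Anthony", "Wilhelmina"), ("Margaret", "Adrian")]
fruit_counts = {"apple": 3, "lime": 2, "watermelon": 1, "mandarin": 4}

print(flatten(groups))
print(flatten(tuple_groups))
# Dictionaries keep insertion order, so the pairs come out in the order they were added.
print(flatten(fruit_counts.items()))
# ['apple', 3, 'lime', 2, 'watermelon', 1, 'mandarin', 4]
```

Nothing in the loop cares about the inner type: anything we can loop over works.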
There's also a list method that makes this a bit shorter, the extend method:
names = []
for group in groups:
    names.extend(group)
The list extend method accepts an iterable and appends every item from that iterable to the list.
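A quick sketch of that behavior: extend happily takes tuples, generator expressions, or even dictionaries (which yield their keys when looped over). It also modifies the list in place and returns None, which is an easy thing to trip over:

```python
names = ["Hong"]

# extend accepts any iterable, not just lists.
names.extend(("Ryan", "Anthony"))                      # a tuple
names.extend(name.title() for name in ["wilhelmina"])  # a generator expression
names.extend({"Margaret": 1, "Adrian": 2})             # a dict: looping over it yields its keys

print(names)
# ['Hong', 'Ryan', 'Anthony', 'Wilhelmina', 'Margaret', 'Adrian']

# extend mutates the list in place and returns None, so don't assign its result.
result = names.extend([])
print(result)  # None
```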
Or we could use the += operator to concatenate each list onto our new list:
names = []
for group in groups:
    names += group
You can think of += on lists as calling the extend method.
With lists, these two operations (+= and extend) are equivalent.
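One related detail, shown in a small sketch: += on a list accepts any iterable (just like extend) and mutates the list in place, but the plain + operator is stricter and only concatenates a list with another list:

```python
names = ["Hong", "Ryan"]
original = names

# += accepts any iterable (here a tuple) and mutates the existing list, like extend.
names += ("Anthony", "Wilhelmina")
same_object = names is original
print(same_object)  # True

# The + operator requires both operands to be lists, so a tuple raises TypeError.
try:
    names = names + ("Margaret", "Adrian")
    concat_failed = False
except TypeError:
    concat_failed = True
print(concat_failed)  # True
```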
This nested for loop with an append call might look familiar:
names = []
for group in groups:
    for name in group:
        names.append(name)
The structure of this code looks like something we could copy-paste into a list comprehension.
Inside our square brackets we'd copy the thing we're appending first, and then the logic for our first loop, and then the logic for our second loop:
names = [
    name
    for group in groups
    for name in group
]
This comprehension loops two levels deep, just like our nested for loops did.
Note that the order of the for clauses in the comprehension must remain the same as the order of the for loops.
The (sometimes confusing) order of those for clauses is partly why I recommend copy-pasting into a comprehension.
When turning a for loop into a comprehension, the for and if clauses remain in the same relative place, but the thing you're appending moves from the end to the beginning.
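Here's a sketch of that copy-paste rule with an if clause added. The condition (names longer than 6 characters) is just a made-up filter for illustration: the for and if clauses keep their order, and only the appended expression moves to the front:

```python
groups = [["Hong", "Ryan"], ["Anthony", "Wilhelmina"], ["Margaret", "Adrian"]]

# Nested loops with a condition inside the inner loop.
long_names = []
for group in groups:
    for name in group:
        if len(name) > 6:
            long_names.append(name)

# Same clauses in the same order; only "name" moves from the end to the front.
long_names_comp = [
    name
    for group in groups
    for name in group
    if len(name) > 6
]

print(long_names_comp)  # ['Anthony', 'Wilhelmina', 'Margaret']
```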
Can * unpacking work in a comprehension?
But what about Python's * operator?
I've written about the many uses for the prefixed asterisk symbol in Python.
We can use * in Python's list literal syntax ([...]) to unpack an iterable into a new list:
>>> numbers = [3, 4, 7]
>>> more_numbers = [2, 1, *numbers, 11, 18]
>>> more_numbers
[2, 1, 3, 4, 7, 11, 18]
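A small sketch of what that gives us for flattening: each * unpacks exactly one iterable (of any type), so in a list literal we need one * per inner iterable, which only works when we know in advance how many there are:

```python
groups = [["Hong", "Ryan"], ["Anthony", "Wilhelmina"], ["Margaret", "Adrian"]]

# One * per inner list: fine for exactly three groups, useless for an unknown number.
names = [*groups[0], *groups[1], *groups[2]]
print(names)

# Any iterable can be unpacked this way, not just lists.
mixed = [*("a", "b"), *range(3), *"xy"]
print(mixed)  # ['a', 'b', 0, 1, 2, 'x', 'y']
```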
Could we use that * operator to unpack an iterable within a comprehension?
names = [
    *group
    for group in groups
]
We can't.
If we try to do this, Python will specifically tell us that the * operator can't be used like this in a comprehension:
>>> names = [
... *group
... for group in groups
... ]
File "<stdin>", line 2
]
^
SyntaxError: iterable unpacking cannot be used in comprehension
This feature was specifically excluded from PEP 448 (the Python Enhancement Proposal that added this *-in-list-literal syntax to Python) due to readability concerns.
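Because this is a syntax error, it's caught when the code is compiled, before anything runs. A small sketch using the built-in compile function shows this (is_valid_expression is a hypothetical helper name; groups never needs to exist because nothing is evaluated):

```python
def is_valid_expression(source):
    """Return True if source compiles as a Python expression (hypothetical helper)."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


print(is_valid_expression("[*group for group in groups]"))                  # False
print(is_valid_expression("[name for group in groups for name in group]"))  # True
print(is_valid_expression("[*groups[0], *groups[1]]"))                       # True
```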
What about using sum to flatten?
Here's another list flattening trick I've seen a few times:
>>> names = sum(groups, [])
This does work:
>>> names
['Hong', 'Ryan', 'Anthony', 'Wilhelmina', 'Margaret', 'Adrian']
But I find this technique pretty unintuitive.
We use the + operator in Python for both adding numbers and concatenating sequences, and the sum function happens to work with anything that supports the + operator (thanks to duck typing).
But in my mind, the word "sum" implies arithmetic: summing adds numbers together.
I find it confusing to "sum" lists, so I don't recommend this approach.
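The second argument is also part of why this trick reads oddly. Here's a sketch of why it's needed: sum starts from 0 by default, and 0 + ["Hong", "Ryan"] isn't allowed, so we have to pass an empty list as the starting value:

```python
groups = [["Hong", "Ryan"], ["Anthony", "Wilhelmina"], ["Margaret", "Adrian"]]

print(sum([1, 2, 3]))  # 6: the start value defaults to 0, so this is 0 + 1 + 2 + 3

# With lists, the default start of 0 fails: you can't add an int and a list.
try:
    sum(groups)
    default_start_failed = False
except TypeError:
    default_start_failed = True
print(default_start_failed)  # True

# Passing [] as the start value turns this into [] + group1 + group2 + group3.
print(sum(groups, []))
```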
There's another big problem though: the algorithm sum uses also makes list flattening really slow.
Each + builds a brand-new list, copying every item accumulated so far, so the work grows with every additional inner list.
In Big-O terms (for the time complexity nerds), sum with lists is O(n**2) instead of O(n).
Put another way: flattening 1,000 lists that each have 3 items takes about 1.5 million item copies instead of about 3 thousand.
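Where does that 1.5 million come from? Here's a rough model (not CPython's actual sum implementation) that counts how many items get copied when we build the result with + the way sum(groups, []) does, versus with extend. It ignores extend's occasional buffer resizing, which only adds amortized constant work per item:

```python
def copies_with_concatenation(groups):
    """Count item copies when building the result with +, like sum(groups, [])."""
    result = []
    copies = 0
    for group in groups:
        result = result + group  # builds a brand-new list every time
        copies += len(result)    # every item in that new list was copied
    return copies


def copies_with_extend(groups):
    """Count item copies when building the result with extend."""
    result = []
    copies = 0
    for group in groups:
        result.extend(group)     # only the new items are copied in
        copies += len(group)
    return copies


groups = [[1, 2, 3] for _ in range(1000)]
print(copies_with_concatenation(groups))  # 1501500
print(copies_with_extend(groups))         # 3000
```

The concatenation total is 3 + 6 + 9 + ... + 3,000, which grows with the square of the number of inner lists.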
What about itertools.chain?
There is one more tool that's often used for flattening: the chain utility in the itertools module.
chain accepts any number of iterables as arguments and returns an iterator:
>>> from itertools import chain
>>> chain(*groups)
<itertools.chain object at 0x7fc1b2d65bb0>
We can loop over that iterator or turn it into another iterable, like a list:
>>> list(chain(*groups))
['Hong', 'Ryan', 'Anthony', 'Wilhelmina', 'Margaret', 'Adrian']
There's actually a method on chain that's specifically for flattening a single iterable:
>>> list(chain.from_iterable(groups))
['Hong', 'Ryan', 'Anthony', 'Wilhelmina', 'Margaret', 'Adrian']
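Using chain.from_iterable is more performant than using chain with *, because * unpacks the whole outer iterable into separate arguments as soon as chain is called, while chain.from_iterable fetches each inner iterable only when it's needed.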
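That difference matters most when the outer iterable is huge or even endless. Here's a sketch with a hypothetical generator, endless_groups, that never stops producing small lists: chain.from_iterable can still hand us the first few items, whereas chain(*endless_groups()) would never return:

```python
from itertools import chain, count, islice


def endless_groups():
    """Yield an endless stream of small lists (a hypothetical data source)."""
    for n in count():
        yield [n, -n]


# from_iterable pulls one inner list at a time, so taking six items finishes.
first_six = list(islice(chain.from_iterable(endless_groups()), 6))
print(first_six)  # [0, 0, 1, -1, 2, -2]

# chain(*endless_groups()) would hang: the * tries to unpack the entire
# (infinite) generator into arguments before chain is even called.
```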
If you want to flatten an iterable-of-iterables lazily, I would use itertools.chain.from_iterable:
>>> from itertools import chain
>>> flattened = chain.from_iterable(groups)
This will return an iterator, meaning no work will be done until the returned iterator is looped over:
>>> list(flattened)
['Hong', 'Ryan', 'Anthony', 'Wilhelmina', 'Margaret', 'Adrian']
And it will be consumed as we loop, so looping over it a second time produces nothing:
>>> list(flattened)
[]
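If you need to loop over the flattened items more than once, a sketch of the two usual options: store the results in a list, or call chain.from_iterable again to get a fresh iterator (this assumes groups itself can be looped over repeatedly, as a list can):

```python
from itertools import chain

groups = [["Hong", "Ryan"], ["Anthony", "Wilhelmina"], ["Margaret", "Adrian"]]

flattened = chain.from_iterable(groups)
print(iter(flattened) is flattened)  # True: chain objects are their own iterators
first_pass = list(flattened)
second_pass = list(flattened)        # [] because the iterator is already exhausted

# Option 1: store the flattened items in a list and reuse it.
names = list(chain.from_iterable(groups))

# Option 2: create a fresh iterator each time you need one.
again = list(chain.from_iterable(groups))
```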
If you find itertools.chain a bit too cryptic, you might prefer a for loop that calls the extend method on a new list to add the values from each iterable:
names = []
for group in groups:
    names.extend(group)
Or a for loop that uses the += operator on our new list:
names = []
for group in groups:
    names += group
Unlike chain.from_iterable, both of these for loops build up a new list rather than a lazy iterator object.
If you find list comprehensions readable (I love them for signaling "look we're building up a list") then you might prefer a comprehension instead:
names = [
    name
    for group in groups
    for name in group
]
And if you do want laziness (an iterator) but you don't like itertools.chain, you could make a generator expression that does the same thing as itertools.chain.from_iterable:
names = (
    name
    for group in groups
    for name in group
)
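To confirm that this generator expression produces the same items as itertools.chain.from_iterable, here's a quick sketch. It also shows one more lazy option that isn't covered above: a generator function using yield from (flatten_lazily is a hypothetical name):

```python
from itertools import chain


def flatten_lazily(iterable_of_iterables):
    """Lazily yield each item from each inner iterable (hypothetical helper)."""
    for inner in iterable_of_iterables:
        yield from inner


groups = [["Hong", "Ryan"], ["Anthony", "Wilhelmina"], ["Margaret", "Adrian"]]

gen_expr_names = list(name for group in groups for name in group)
chain_names = list(chain.from_iterable(groups))
helper_names = list(flatten_lazily(groups))

print(gen_expr_names == chain_names == helper_names)  # True
```

All three are lazy until something (here, list) loops over them.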
Happy list flattening!