How to de-duplicate a list in Python

Trey Hunner
3 min. read Python 3.8—3.12

Need to de-duplicate a list of items?

>>> all_colors = ["blue", "purple", "green", "red", "green", "pink", "blue"]

How can you do this in Python?

Let's take a look at two approaches for de-duplicating: one for when we don't care about the order of our items and one for when we do.

Using a set to de-duplicate

You can use the built-in set constructor to de-duplicate the items in a list (or in any iterable):

>>> unique_colors = set(all_colors)
>>> unique_colors
{'blue', 'pink', 'green', 'purple', 'red'}

This only works for lists of hashable values, but that includes quite a few values: strings, numbers, and tuples that contain only hashable items are all hashable in Python.
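Here's a small sketch of what that restriction looks like in practice. The points list is a made-up example, and converting each inner list to a tuple is just one possible workaround, which assumes the inner items are themselves hashable:

```python
# A hypothetical list of lists. Lists are mutable, so they are not hashable.
points = [[1, 2], [3, 4], [1, 2]]

try:
    set(points)
except TypeError as error:
    # The exact wording of this message varies between Python versions.
    message = str(error)
    print(message)

# One possible workaround: turn each inner list into a tuple first.
unique_points = set(tuple(point) for point in points)
print(unique_points == {(1, 2), (3, 4)})  # True
```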

You might have noticed that the order of the original items was lost once they were converted to a set:

>>> all_colors
['blue', 'purple', 'green', 'red', 'green', 'pink', 'blue']
>>> unique_colors
{'blue', 'pink', 'green', 'purple', 'red'}

Even if we convert the items back to a list, that original order won't be maintained:

>>> unique_colors = list(set(all_colors))
>>> unique_colors
['blue', 'pink', 'green', 'purple', 'red']

What if we want to maintain the order of our items while de-duplicating?

De-duplicating without losing order

To de-duplicate while maintaining relative item order, we can use dict.fromkeys:

>>> unique_colors = dict.fromkeys(all_colors)
>>> unique_colors
{'blue': None, 'purple': None, 'green': None, 'red': None, 'pink': None}

Python's dict class has a fromkeys class method which accepts an iterable and makes a new dictionary where the keys are the items from the given iterable.

Since dictionaries can't have duplicate keys, this also de-duplicates the given items! Dictionaries also maintain the insertion order of their items (an implementation detail of CPython 3.6 that became a language guarantee in Python 3.7), so the resulting dictionary will have its keys ordered based on the first time each item was seen.
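To see that first-seen ordering in action, here's a minimal sketch that compares dict.fromkeys with a hand-written loop that does roughly the same thing. The numbers list and the manual dictionary are made up for illustration:

```python
numbers = [3, 1, 3, 2, 1, 3]

# Each key keeps the position where it first appeared.
unique_numbers = dict.fromkeys(numbers)
print(list(unique_numbers))  # [3, 1, 2]

# Roughly what dict.fromkeys does for us: assign None to each key.
# Assigning to a key that already exists does not change its position.
manual = {}
for number in numbers:
    manual[number] = None

print(list(manual) == list(unique_numbers))  # True
```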

Okay, we have a dictionary now, but how can we use it?

Well, dictionaries have a keys method which we could use to get an iterable of just the keys:

>>> unique_colors = dict.fromkeys(all_colors).keys()
>>> unique_colors
dict_keys(['blue', 'purple', 'green', 'red', 'pink'])

And we could even convert those keys to a list:

>>> unique_colors = list(dict.fromkeys(all_colors).keys())
>>> unique_colors
['blue', 'purple', 'green', 'red', 'pink']

But dictionaries are also iterables (looping over a dictionary provides the keys), so we could simply pass the dictionary to the built-in list constructor:

>>> unique_colors = list(dict.fromkeys(all_colors))
>>> unique_colors
['blue', 'purple', 'green', 'red', 'pink']

That might look a bit odd, but it works.

If you prefer to be more explicit by calling the keys method, you're welcome to. I don't have a strong preference between these two approaches: being explicit is nice but so is brevity.

One last thing to note: if you just need to loop over the unique items right away, there's no need to convert back to a list. This works fine:

>>> for color in dict.fromkeys(all_colors):
...     print(color)

That works because all forms of iteration are the same in Python: whether you're using the list constructor, a for loop, or a list comprehension, it all works the same way.
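As a quick check of that claim, this sketch iterates over the same dictionary in a few different ways and confirms that each one sees the same keys in the same order. The variable names here are just for illustration:

```python
all_colors = ["blue", "purple", "green", "red", "green", "pink", "blue"]
unique = dict.fromkeys(all_colors)

# The list constructor loops over the dictionary's keys.
from_constructor = list(unique)

# So does a for loop.
from_loop = []
for color in unique:
    from_loop.append(color)

# And so does a list comprehension.
from_comprehension = [color for color in unique]

print(from_constructor == from_loop == from_comprehension)  # True
```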

Avoid using lists to de-duplicate

You might be wondering whether a list and a for loop would work well for de-duplicating.

>>> unique_colors = []
>>> for color in all_colors:
...     if color not in unique_colors:
...         unique_colors.append(color)
>>> unique_colors
['blue', 'purple', 'green', 'red', 'pink']

This does work, but if you have many values to de-duplicate this could be very slow: the in operator on a list has to check the items one by one, while the in operator on a set uses a hash lookup, which is considerably faster. Watch List containment checks for more on that.
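If you'd like to measure the difference yourself, here's a rough sketch using the standard library's timeit module. The function names and the size of the test data are arbitrary choices for illustration, and the actual timings will vary from machine to machine:

```python
from timeit import timeit


def dedupe_with_list(items):
    """De-duplicate using a list and the in operator (the slow approach)."""
    unique = []
    for item in items:
        if item not in unique:  # may scan the whole list on each check
            unique.append(item)
    return unique


def dedupe_with_dict(items):
    """De-duplicate using dict.fromkeys, keeping first-seen order."""
    return list(dict.fromkeys(items))


# 2,000 items, 1,000 of them unique (an arbitrary size for this sketch).
values = list(range(1_000)) * 2

# Both approaches produce the same result.
assert dedupe_with_list(values) == dedupe_with_dict(values)

list_time = timeit(lambda: dedupe_with_list(values), number=5)
dict_time = timeit(lambda: dedupe_with_dict(values), number=5)
print(f"list and for loop: {list_time:.4f}s")
print(f"dict.fromkeys:     {dict_time:.4f}s")
```

The gap between the two should widen as the number of unique items grows, since the list-based loop does more and more scanning for each new item.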

Use sets and dictionaries for de-duplicating

The next time you need to de-duplicate items in your list (or in any iterable), try out Python's set constructor.

>>> unique_items = set(original_items)

If you need to de-duplicate while maintaining the order of your items, use dict.fromkeys instead:

>>> unique_items = list(dict.fromkeys(original_items))

That will de-duplicate your items while keeping them in the order that each item was first seen.

If you'd like practice de-duplicating list items, try out the uniques_only Python Morsels exercise. The bonuses include some twists that weren't discussed above. 😉
