What are hashable objects?

Trey Hunner smiling in a t-shirt against a yellow wall
Trey Hunner
4 minute read Python 3.7—3.10
Share
Copied to clipboard.

What are hashable objects?

Sets items must be hashable

The elements in a set must be hashable.

Strings and numbers are hashable:

>>> colors = {"pink", "blue", "green", "purple"}
>>> choices = {10, 20, 30, 40}

Tuples of strings and numbers are also hashable:

>>> coordinates = {(0, 1), (5, 4)}

Lists are not hashable:

>>> color_groups = {["pink", "green"], ["green", "purple"]}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Dictionary keys must be hashable

The keys in a dictionary also need to be hashable.

So strings, numbers, and tuples of strings and numbers are valid dictionary keys:

>>> color_counts = {"pink": 1, "blue": 2, "green": 4, "purple": 3}
>>> choice_counts = {10: 4, 20: 3, 30: 6, 40: 2}
>>> coordinate_values = {(0, 1): "abode", (5, 4): "food"}

But lists are not valid dictionary keys:

>>> coordinate_values = {[0, 1]: "abode", [5, 4]: "food"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Hashability makes dictionaries and sets fast

Set and dictionary look-ups both rely on the hashability of an object to make their containment checks faster.

Here, we're looping over a list of 20,000 numbers, and we're filtering that list down (with a comprehension) to just the numbers whose two neighboring numbers are not in the same list:

>>> import random
>>> numbers = [random.randrange(1_000_000) for _ in range(20_000)]
>>> no_neighbors = [
...     n
...     for n in numbers
...     if n-1 not in numbers
...     and n+1 not in numbers
... ]
...

This code takes a while to run (it takes about 10 seconds to run on my machine).

The reason is that in operator lookup requires looping over the whole list to figure out whether that number is in the list.

If instead, we create a new set (numbers_set) out of that list and then run the same code again, but look up the neighboring numbers in the set this time, Python runs this code basically instantly:

>>> import random
>>> numbers = [random.randrange(1_000_000) for _ in range(20_000)]
>>> numbers_set = set(numbers)
>>> no_neighbors = [
...     n for n in numbers
...     if n-1 not in numbers_set
...     and n+1 not in numbers_set
... ]
...

Don't believe the time difference could be that big? You can try the list code and try the set code or run this quick script to compare the two side-by-side.

Every hashable object has a semi-unique hash value

When we use the in operator on a set, Python uses some math magic to answer our question very quickly, without even looping over the set.

This math magic relies on the fact that every hashable object has a hash value. You can see this hash value by passing an object to the built-in hash function:

>>> name = "Trey"
>>> hash(name)
3675389749896195359

That weird number (3675389749896195359) represents the hash value of the string Trey in my Python interpreter.

The hash value of an object is meant to semi-uniquely represent that object.

Sets and dictionaries use this number to figure out where to store different objects, so they can quickly find them by their hash value. You can think of it as like an indexing system or a cataloging system.

Aside: it's unusual to call the hash function directly because while we do care about hashability, we don't usually need to care about the hash value itself.

Hashable objects are often immutable

Immutable objects tend to be hashable. So strings, numbers, and tuples are all hashable:

>>> hash("Python")
547265888590872204
>>> hash(4.5)
1152921504606846980
>>> hash((1, 2, 3))
529344067295497451

Mutable objects are usually not hashable. So lists are not hashable:

>>> hash([1, 2, 3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Hashability is linked to equality

The hash value of an object should correspond to that object's sense of equality.

We have two tuples here:

>>> x = (2, 1, 3, 4)
>>> y = (2, 1, 3, 4)

These two tuples represent the same data, so they're equal:

>>> x == y
True

But they're also hashable, which means the hash value of one of these tuples should be the same as the hash value of the other tuple:

>>> hash(x)
-4266295778514409031
>>> hash(y)
-4266295778514409031

This means that if we put these two tuples in a set, we'll only get one of them in that set:

>>> x = (2, 1, 3, 4)
>>> y = (2, 1, 3, 4)
>>> s = {x, y}
>>> s
{(2, 1, 3, 4)}

From the perspective of hashability and equality in Python, these are basically the same tuple.

An object's hash value should never change

So the hash value of an object should correspond to that object's sense of equality.

But the hash value of an object should never change over the lifetime of that object.

This is the reason that most mutable objects aren't hashable. A mutable object's sense of equality can often change over the lifetime of that object, which would mean its hash value could change, which would be a big problem if it was hashable.

Summary

Hashable objects can be elements in a set or keys in a dictionary.

If the answer to the question "is object A equal to object B" can change over the lifetime of those two objects, then at least one of those two object is not hashable.

Hashable objects in Python also tend to be immutable.

A Python Tip Every Week

Need to fill-in gaps in your Python skills? I send weekly emails designed to do just that.