Sign in to your Python Morsels account to save your screencast settings.
Don't have an account yet? Sign up here.
What are hashable objects?
hashabledefinition in Python Terminology.
The elements in a set must be hashable.
Strings and numbers are hashable:
>>> colors = {"pink", "blue", "green", "purple"}
>>> choices = {10, 20, 30, 40}
Tuples of strings and numbers are also hashable:
>>> coordinates = {(0, 1), (5, 4)}
Lists are not hashable:
>>> color_groups = {["pink", "green"], ["green", "purple"]}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
The keys in a dictionary also need to be hashable.
So strings, numbers, and tuples of strings and numbers are valid dictionary keys:
>>> color_counts = {"pink": 1, "blue": 2, "green": 4, "purple": 3}
>>> choice_counts = {10: 4, 20: 3, 30: 6, 40: 2}
>>> coordinate_values = {(0, 1): "abode", (5, 4): "food"}
But lists are not valid dictionary keys:
>>> coordinate_values = {[0, 1]: "abode", [5, 4]: "food"}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Set and dictionary look-ups both rely on the hashability of an object to make their containment checks faster.
Here, we're looping over a list of 20,000 numbers, and we're filtering that list down (with a comprehension) to just the numbers whose two neighboring numbers are not in the same list:
>>> import random
>>> numbers = [random.randrange(1_000_000) for _ in range(20_000)]
>>> no_neighbors = [
... n
... for n in numbers
... if n-1 not in numbers
... and n+1 not in numbers
... ]
...
This code takes a while to run (it takes about 10 seconds to run on my machine).
The reason is that in
operator lookup requires looping over the whole list to figure out whether that number is in the list.
If instead, we create a new set (numbers_set
) out of that list and then run the same code again, but look up the neighboring numbers in the set this time, Python runs this code basically instantly:
>>> import random
>>> numbers = [random.randrange(1_000_000) for _ in range(20_000)]
>>> numbers_set = set(numbers)
>>> no_neighbors = [
... n for n in numbers
... if n-1 not in numbers_set
... and n+1 not in numbers_set
... ]
...
Don't believe the time difference could be that big? You can try the list code and try the set code or run this quick script to compare the two side-by-side.
When we use the in
operator on a set, Python uses some math magic to answer our question very quickly, without even looping over the set.
This math magic relies on the fact that every hashable object has a hash value.
You can see this hash value by passing an object to the built-in hash
function:
>>> name = "Trey"
>>> hash(name)
3675389749896195359
That weird number (3675389749896195359
) represents the
hash value of the string Trey
in my Python interpreter.
The hash value of an object is meant to semi-uniquely represent that object.
Sets and dictionaries use this number to figure out where to store different objects, so they can quickly find them by their hash
value.
You can think of it as like an indexing system or a cataloging system.
Aside: it's unusual to call the hash
function directly because while we do care about hashability, we don't usually need to care about the hash value itself.
Immutable objects tend to be hashable. So strings, numbers, and tuples are all hashable:
>>> hash("Python")
547265888590872204
>>> hash(4.5)
1152921504606846980
>>> hash((1, 2, 3))
529344067295497451
Mutable objects are usually not hashable. So lists are not hashable:
>>> hash([1, 2, 3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
The hash value of an object should correspond to that object's sense of equality.
We have two tuples here:
>>> x = (2, 1, 3, 4)
>>> y = (2, 1, 3, 4)
These two tuples represent the same data, so they're equal:
>>> x == y
True
But they're also hashable, which means the hash value of one of these tuples should be the same as the hash value of the other tuple:
>>> hash(x)
-4266295778514409031
>>> hash(y)
-4266295778514409031
This means that if we put these two tuples in a set, we'll only get one of them in that set:
>>> x = (2, 1, 3, 4)
>>> y = (2, 1, 3, 4)
>>> s = {x, y}
>>> s
{(2, 1, 3, 4)}
From the perspective of hashability and equality in Python, these are basically the same tuple.
So the hash value of an object should correspond to that object's sense of equality.
But the hash value of an object should never change over the lifetime of that object.
This is the reason that most mutable objects aren't hashable. A mutable object's sense of equality can often change over the lifetime of that object, which would mean its hash value could change, which would be a big problem if it was hashable.
Hashable objects can be elements in a set or keys in a dictionary.
If the answer to the question "is object A equal to object B" can change over the lifetime of those two objects, then at least one of those two object is not hashable.
Hashable objects in Python also tend to be immutable.
Need to fill-in gaps in your Python skills?
Sign up for my Python newsletter where I share one of my favorite Python tips every week.
Need to fill-in gaps in your Python skills? I send weekly emails designed to do just that.