Python's Counter class is one of the most useful data structures that's also frequently overlooked.
Counter objects are mappings (dictionary-like objects) that are specially built just for counting up occurrences of items.
I'd like to share how I typically use Counter objects in Python.
What is a Counter?
Python's collections.Counter objects are similar to dictionaries but they have a few extra features that can simplify item tallying.
>>> from collections import Counter
>>> colors = ["purple", "pink", "green", "yellow", "yellow", "purple", "purple", "black"]
>>> color_counts = Counter(colors)
>>> color_counts
Counter({'purple': 3, 'yellow': 2, 'pink': 1, 'green': 1, 'black': 1})
>>> color_counts['purple']
3
Creating a Counter object
There are two ways you'll usually see a Counter object made: using a for loop or by passing in an iterable.
Here's an example of using a for loop to increment keys within a Counter object:
from collections import Counter
words = Counter()
for word in text.split():
    words[word.strip(".,!?\"'")] += 1
Note that this is similar to a dictionary, except that when a key doesn't exist within a Counter, that key's value will default to 0.
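To make that default concrete, here's a minimal sketch. The tally name is made up for this illustration. It shows that looking up a missing key returns 0 rather than raising a KeyError, and that the lookup on its own doesn't store the key.

```python
from collections import Counter

tally = Counter()  # hypothetical name, used only for this illustration

# Looking up a missing key returns 0 instead of raising KeyError
missing_count = tally["purple"]

# The lookup alone does not store the key in the Counter
stored_after_lookup = "purple" in tally

# Incrementing a missing key starts from that default of 0
tally["purple"] += 1

print(missing_count, stored_after_lookup, tally["purple"])
```

This is why the for loop above never needs an "if word not in words" check before incrementing.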
Here's an example of passing an iterable to Counter:
from collections import Counter
words = Counter(w.strip(".,!?\"'") for w in text.split())
Note that we're passing a generator expression to the Counter class here.
It's pretty common to see a generator expression passed to Counter if the items you're counting need a bit of normalizing or altering before they're counted up (we're stripping punctuation in our case).
Of these two ways to use Counter, passing an iterable directly into Counter is simpler and usually preferable to using a for loop.
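As a quick sanity check, here's a sketch that runs both approaches side by side. The text value is sample data I made up for this illustration (the snippets above assume text is defined elsewhere), and the looped and direct names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical sample text, not from the snippets above
text = "Do what you do, and do it well. Do it!"

# Approach 1: increment keys in a for loop
looped = Counter()
for word in text.split():
    looped[word.strip(".,!?\"'")] += 1

# Approach 2: pass a generator expression straight to Counter
direct = Counter(w.strip(".,!?\"'") for w in text.split())

# Both approaches produce the same counts
print(dict(looped) == dict(direct))
```

Note that neither version lowercases the words, so "Do" and "do" are counted separately here.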
Let's look at some of the most useful operations that Counter objects support.
The feature I use Counter for most often is the most_common method.
The most_common method is like the dictionary items method but sorts the items by their values (their counts) in descending order.
>>> color_counts.items()
dict_items([('purple', 3), ('pink', 1), ('green', 1), ('yellow', 2), ('black', 1)])
>>> color_counts.most_common()
[('purple', 3), ('yellow', 2), ('pink', 1), ('green', 1), ('black', 1)]
Unlike the items method, most_common also accepts a number to indicate how many of the most common items you'd like (it returns all items by default).
>>> color_counts.most_common(2)
[('purple', 3), ('yellow', 2)]
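Keep in mind that if there's a "tie" for the n-th most common item, most_common won't return every tied item. On Python 3.7 and later, items with equal counts are ordered by when they were first encountered, so the item that was seen first wins the tie.
For example, here there are two items that tie for "most common item" but most_common(1) just returns one of them: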
>>> color_counts["yellow"] += 1
>>> color_counts.most_common(2)
[('purple', 3), ('yellow', 3)]
>>> color_counts.most_common(1)
[('purple', 3)]
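Here's a small sketch of how that tie plays out. It assumes Python 3.7 or later, where the documentation says elements with equal counts are ordered by first encounter. The first and second names are just for illustration.

```python
from collections import Counter

# The same two items with the same counts, first seen in opposite orders
first = Counter(["purple", "yellow", "purple", "yellow"])
second = Counter(["yellow", "purple", "yellow", "purple"])

# With a tie, the item that was encountered first comes out on top
print(first.most_common(1))
print(second.most_common(1))
```

If you need every item that ties for the top spot, you'd have to compare counts yourself rather than rely on most_common(1).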
Here we're asking for the 5 most frequently seen characters in a string:
>>> from collections import Counter
>>> message = "Python is pretty nifty!"
>>> Counter(message.casefold()).most_common(5)
[('t', 4), ('y', 3), (' ', 3), ('p', 2), ('n', 2)]
Or the most common word in a string (assuming there's no punctuation):
>>> lyric = "don't worry about it just do what you do and do it good"
>>> Counter(lyric.split()).most_common(1)
[('do', 3)]
Or, using a regular expression, we could get all words that appear more than once displayed in descending order of commonality, with punctuation removed:
>>> from collections import Counter
>>> import re
>>> bridge = """
... If you read the album cover by now
... You know that my name is what my name is
... When I came in here to try and
... Do this, something I've never done before
... Mr. Jones, Booker T., said to me
... Don't worry about it
... Just do what you do
... And do it good
... """
>>> words = re.findall(r"[A-Za-z']+", bridge)
>>> for word, count in Counter(words).most_common():
...     if count <= 1:
...         break
...     print(word)
...
do
you
my
name
is
what
to
it
Adding items to a Counter
Like dictionaries, Counter objects have an update method:
>>> letters = Counter("hi")
>>> letters.update({"a": 1, "b": 1, "c": 2})
>>> letters
Counter({'c': 2, 'h': 1, 'i': 1, 'a': 1, 'b': 1})
But unlike dictionaries, the update method on Counter objects is usually used to count additional items:
>>> letters = Counter("hi")
>>> letters.update("hiya")
>>> letters
Counter({'h': 2, 'i': 2, 'y': 1, 'a': 1})
You can pass an iterable to update and the Counter object will loop over it and increase the counts of those items.
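One place this comes in handy is tallying items that arrive in chunks. Here's a minimal sketch, where batches and totals are hypothetical names and the data is made up. It also shows that passing a mapping to update adds to the existing counts, where a dictionary's update would overwrite them.

```python
from collections import Counter

# Hypothetical batches of items arriving one at a time
batches = [["red", "blue"], ["blue", "green"], ["blue"]]

totals = Counter()
for batch in batches:
    # Each call adds this batch's counts to the running totals
    totals.update(batch)

# With a mapping, Counter.update adds to counts instead of replacing them
totals.update({"red": 5})

print(totals.most_common())
```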
Subtracting from a Counter
Counter objects also have a subtract method:
>>> colors = Counter()
>>> colors.subtract(["red", "green", "blue", "green", "blue", "green"])
>>> colors
Counter({'red': -1, 'blue': -2, 'green': -3})
If we only ever subtract items from our Counter, the most_common method would instead return the least common items (since our counts are all negative):
>>> colors.most_common(1)
[('red', -1)]
It's rare that I use negatives in counters, but they can occasionally be handy.
Negatives with Counter can be finicky when combined with arithmetic though, so use them with caution.
Otherwise your zero and negative values may disappear if you're not careful:
>>> colors
Counter({'red': -1, 'blue': -2, 'green': -3})
>>> colors + Counter({'red': 2, 'green': 1})
Counter({'red': 1})
What if you want to discard all negatives and zero counts from your Counter object?
You can use the unary + operator to remove every item that doesn't have a positive count:
>>> from collections import Counter
>>> letters = Counter('aaabbc')
>>> letters.subtract('abbcc')
>>> letters
Counter({'a': 2, 'b': 0, 'c': -1})
>>> letters = +letters
>>> letters
Counter({'a': 2})
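If the unary + operator feels a bit magical, it may help to see roughly the same filtering written out by hand. This is only a sketch of the equivalent logic, not how Counter implements it, and the positive_only and filtered_by_hand names are made up for the illustration.

```python
from collections import Counter

letters = Counter("aaabbc")
letters.subtract("abbcc")  # leaves a: 2, b: 0, c: -1

# Unary + builds a new Counter containing only the positive counts
positive_only = +letters

# Roughly the same filtering, written out as a dictionary comprehension
filtered_by_hand = Counter(
    {item: count for item, count in letters.items() if count > 0}
)

print(dict(positive_only) == dict(filtered_by_hand))
```

Note that +letters returns a new Counter and leaves the original unchanged, which is why the example above reassigns the result back to letters.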
Arithmetic with Counter objects
You can even add Counter objects together:
>>> fruit_counts = Counter(["pear", "kiwi", "pear", "lime"])
>>> more_fruit_counts = Counter(["pear", "lime"])
>>> fruit_counts + more_fruit_counts
Counter({'pear': 3, 'lime': 2, 'kiwi': 1})
And you can subtract them:
>>> fruit_counts - more_fruit_counts
Counter({'pear': 1, 'kiwi': 1})
Note that once a value becomes 0 or negative, it'll be removed from the Counter object.
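That removal is the main difference between the - operator and the subtract method. Here's a short sketch comparing the two on the same fruit data, with difference and in_place as hypothetical names.

```python
from collections import Counter

fruit_counts = Counter(["pear", "kiwi", "pear", "lime"])
more_fruit_counts = Counter(["pear", "lime"])

# The - operator returns a new Counter and drops counts of 0 or less
difference = fruit_counts - more_fruit_counts

# The subtract method changes a Counter in place and keeps every key
in_place = fruit_counts.copy()
in_place.subtract(more_fruit_counts)

print("lime" in difference)
print("lime" in in_place, in_place["lime"])
```

So if you need to keep track of items that have dropped to zero or gone negative, the subtract method is likely the better fit.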
Counter comprehensions
By far my most common use for Counter is passing in a generator expression to count up a specific aspect of each iterable item.
For example, how many users in a list of users have each subscription type:
Counter(
    user.subscription_type
    for user in users
)
Or, counting up each word in a string, while ignoring surrounding punctuation marks:
words = Counter(w.strip(".,!?\"'") for w in text.split())
Those are actually generators passed into the Counter class, but they're like comprehensions: they use a comprehension-like syntax to create a new object (a Counter object).
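The subscription example above assumes a users list exists somewhere. Here's a self-contained sketch of the same idea, where the User dataclass, its fields, and the sample users are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical user model, just enough to demonstrate the counting
    name: str
    subscription_type: str


users = [
    User("Ada", "free"),
    User("Grace", "pro"),
    User("Alan", "free"),
]

# Count one attribute of each user with a generator expression
subscription_counts = Counter(user.subscription_type for user in users)

print(subscription_counts.most_common())
```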
Use Counter for counting occurrences of many items
The next time you need to count how many times a particular item occurs, consider using collections.Counter.