How to Identify Most Frequently Occurring Items in a Sequence with Python?


Problem

You need to identify the most frequently occurring items in a sequence.

Solution

We can use a Counter to keep track of the items in a sequence.

What is a Counter?

The “Counter” is a mapping that holds an integer count for each key. Updating an existing key adds to its count. This object is used for counting instances of hashable objects, or as a multiset.

The “Counter” is one of your best friends when you are performing data analysis.

This object has been present in Python for quite some time, so for many of you this will be a quick review. We will start by importing Counter from collections.

from collections import Counter

A regular Python dictionary raises a KeyError if you look up a key that is not present.

# An empty dictionary (named d so it does not shadow the built-in dict type)
d = {}

# check for a key in an empty dict
d['mystring']

# Error message
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-1e03564507c6> in <module>
      3
      4 # check for a key in an empty dict
----> 5 d['mystring']
      6
      7 # Error message

KeyError: 'mystring'

How can we avoid KeyError exceptions in this situation?

The Counter is a subclass of dict and behaves much like a dictionary. However, if you look up a missing key, it returns zero instead of raising a KeyError.

# define the counter
c = Counter()


# check for the unavailable key
print(f"Output\n{c['mystring']}")

Output

0

# increment the count for the key
c['mystring'] += 1
print(f"Output\n{c}")

Output

Counter({'mystring': 1})

Example

print(f"Output\n{type(c)}")

Output

<class 'collections.Counter'>
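
The definition above says that updating an existing key adds to its count. Here is a minimal sketch of that behaviour using update(). The name inventory and the fruit values are made up for illustration only.

```python
from collections import Counter

# Hypothetical sample data, used only to illustrate update().
inventory = Counter()
inventory.update(['apple', 'pear'])   # new keys start with a count of 1
inventory.update(['apple'])           # 'apple' already exists, so its count grows to 2
inventory['pear'] += 2                # counts can also be adjusted directly

print(f"Output\n{inventory}")
```

Unlike dict.update(), which replaces the stored value, Counter.update() adds to the existing count.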

Most Frequently Occurring Items in a Sequence

Another nice thing about the Counter is that you can pass it a list of objects (or any other iterable) and it will count them for you. This saves us from writing a loop to build the counter.

Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())

Output

Counter({'Peas': 1,
'porridge': 3,
'hot': 1,
'peas': 2,
'cold': 1,
'in': 1,
'the': 1,
'pot': 1,
'nine': 1,
'days': 1,
'old': 1})

The split() call takes the string and splits it into a list of words. With no arguments, it splits on whitespace.

The “Counter” will loop over that list and count all of the words, giving us the counts shown in the output.
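
To see what the Counter saves us from writing, here is a rough sketch of the same tally built by hand with a plain dictionary. The names manual_counts and counter_counts are illustrative and not part of the original example.

```python
from collections import Counter

words = 'Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split()

# Manual approach: a plain dict needs an explicit loop
# and a default value for keys that have not been seen yet.
manual_counts = {}
for word in words:
    manual_counts[word] = manual_counts.get(word, 0) + 1

# Counter performs the same tally in a single call.
counter_counts = Counter(words)

# Both approaches should produce the same word-to-count mapping.
print(f"Output\n{manual_counts == dict(counter_counts)}")
```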

There is more: we can also ask for the most common words in the phrase.

The most_common(n) method returns the n most frequently occurring items, ordered from most to least common.

count = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())
print(f"Output\n{count.most_common(1)}")

Output

[('porridge', 3)]

Example

print(f"Output\n{count.most_common(2)}")

Output

[('porridge', 3), ('peas', 2)]

Example

print(f"Output\n{count.most_common(3)}")

Output

[('porridge', 3), ('peas', 2), ('Peas', 1)]

Notice that it returned a list of tuples. The first part of the tuple is the word, and the second part is its count.
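
Because each entry is a (word, count) tuple, it can be unpacked directly. The short sketch below shows one way to do that. The variable names top_word and top_freq are chosen here for illustration.

```python
from collections import Counter

count = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())

# Each entry is a (word, frequency) tuple, so it can be unpacked in the loop header.
for word, freq in count.most_common(2):
    print(f"{word}: {freq}")

# most_common(1) still returns a list, so index into it to get the single top entry.
top_word, top_freq = count.most_common(1)[0]
print(f"Most common word: {top_word} ({top_freq} times)")
```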

A little-known feature of Counter instances is that they can be easily combined using various mathematical operations.

string = 'Peas porridge hot peas porridge cold peas porridge in the pot nine days old'
another_string = 'Peas peas hot peas peas peas cold peas'

a = Counter(string.split())
b = Counter(another_string.split())


# Add counts
add = a + b
print(f"Output\n{add}")

Output

Counter({'peas': 7, 'porridge': 3, 'Peas': 2, 'hot': 2, 'cold': 2, 'in': 1, 'the': 1, 'pot': 1, 'nine': 1, 'days': 1, 'old': 1})

# Subtract counts
sub = a - b
print(f"Output\n{sub}")

Output

Counter({'porridge': 3, 'in': 1, 'the': 1, 'pot': 1, 'nine': 1, 'days': 1, 'old': 1})
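
Notice that 'peas', 'Peas', 'hot' and 'cold' are missing from the subtraction result: the - operator keeps only items whose resulting count is positive. If you need the zero and negative counts as well, the in-place subtract() method keeps them. Here is a small sketch of the difference. The name diff is illustrative.

```python
from collections import Counter

a = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())
b = Counter('Peas peas hot peas peas peas cold peas'.split())

# The - operator drops any item whose result is zero or negative.
print('peas' in (a - b))

# subtract() modifies the counter in place and keeps zero and negative counts,
# so work on a copy to leave a unchanged.
diff = a.copy()
diff.subtract(b)
print(diff['peas'])   # 2 - 5
print(diff['Peas'])   # 1 - 1
```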

Finally, a Counter can also be used as a multiset: a collection in which each item may appear more than once.

As the outputs above show, it stores every distinct word once, together with its count, instead of storing each occurrence separately.

We can expand the counts back into individual words with elements(). It does not reproduce the original word order. Instead, it repeats each word as many times as its count, so identical words come out together.

Example

print(f"Output\n{list(a.elements())}")

Output

['Peas', 'porridge', 'porridge', 'porridge', 'hot', 'peas', 'peas', 'cold', 'in', 'the', 'pot', 'nine', 'days', 'old']
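
As a quick check on how elements() relates to the stored counts, the sketch below expands the counter and then counts the result again. The name expanded is illustrative.

```python
from collections import Counter

a = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())

# Expand the counts back into individual words.
expanded = list(a.elements())

# The number of expanded words should equal the sum of all counts.
print(len(expanded) == sum(a.values()))

# Counting the expanded words again should reproduce the original Counter.
print(Counter(expanded) == a)
```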

Example

print(f"Output\n{list(a.values())}")

Output

[1, 3, 1, 2, 1, 1, 1, 1, 1, 1, 1]

Example

print(f"Output\n{list(a.items())}")

Output

[('Peas', 1), ('porridge', 3), ('hot', 1), ('peas', 2), ('cold', 1), ('in', 1), ('the', 1), ('pot', 1), ('nine', 1), ('days', 1), ('old', 1)]

Updated on: 10-Nov-2020
