Home > Back-end >  Finding the most common element in a list of lists
Finding the most common element in a list of lists

Time:01-03

I am given a list of lists like this:

pairs = [[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2), (3, 1), (2, 1), (3, 1), (3, 2), (3, 3), (3, 2), (2, 2)], 
         [(2, 2), (2, 1)], 
         [(1, 1), (1, 2), (2, 2), (2, 1)]]

and the desired output is: {(2,2)}.

I need to find the most frequent element(s). It has to return more than one value if there are elements that are repeated just as many times.

I tried solving it with an intersection of three lists, but it prints out {(2,1), (2,2)}, instead of {(2,2)}, since the element (2,2) is repeated two times in the first list.

I saw a few examples with import collections, but I don't understand them so I don't know how to change the code to be suitable for my problem.

I also tried the following:

seen = set()
repeated = set()
for l in pairs:
    for i in set(l):
        if i in seen:
            repeated.add(i)
        if i in repeated:
            repeated.add(i)
        else:
            seen.add(i)

but still doesn't return the correct answer.

CodePudding user response:

As hinted in comments, defaultdict is super helpful.

from collections import defaultdict

We have `pairs', but since you want the most frequent across all lists, we'll flatten it.

pairs = [[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2), (3, 1), (2, 1), (3, 1), (3, 2), (3, 3), (3, 2), (2, 2)], 
         [(2, 2), (2, 1)], 
         [(1, 1), (1, 2), (2, 2), (2, 1)]]

flat_pairs = [tpl for lst in pairs for tpl in lst]

And now we'll create a defaultdict that defaults to zero to hold our counts, and iterate over flat_pairs to add counts to the dict.

counts = defaultdict(lambda: 0)

for tpl in flat_pairs: 
    counts[tpl]  = 1

Or we could skip the flattening stage:

counts = defaultdict(lambda: 0)

for lst in pairs:
    for tpl in lst: 
        counts[tpl]  = 1

We can find the max count value by using max on the values of the dict, and then a list comprehension to get the tuple or tuples with the max count.

max_count = max(counts.values())

max_count_tuples = [tpl for tpl, count in counts.items() if count == max_count]

CodePudding user response:

Alternative solution without using the Counter method from collections:

def get_freq_tuple(data):
    counts = {}
    for pairs in data:
        for pair in pairs:
            counts[pair] = counts.get(pair, 0)   1

    return [pair for pair in counts if counts[pair] == max(counts.values())]

if __name__ == "__main__":
    pairs = [[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2), (3, 1), (2, 1),
              (3, 1), (3, 2), (3, 3), (3, 2), (2, 2)],
             [(2, 2), (2, 1)],
             [(1, 1), (1, 2), (2, 2), (2, 1)]]
    print(get_freq_tuple(pairs))

Output:

[(2, 2)]

Explanation:

  • Count the occurrences of each tuple and store them in a dictionary. The key of the dictionary is the tuple and the value is the occurrence.
  • Filter the tuples in the dictionary by maximum occurrences of the tuples.

Disclaimer:

  • Using Counter method from collections is much more efficient.

References:

CodePudding user response:

collections.Counter() will work...you just need to figure out how to pass all the pairs in the nested list, which you can do with a list comprehension. For example:

from collections import Counter

pairs = [[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2), (3, 1), (2, 1), (3, 1), (3, 2), (3, 3), (3, 2), (2, 2)], 
         [(2, 2), (2, 1)], 
         [(1, 1), (1, 2), (2, 2), (2, 1)]]

counts = Counter(pair for l in pairs for pair in l)
counts.most_common(1)
# [((2, 2), 4)]

If you have a tie, you will need to look through the top choices and pick off the ones that have the same count. You can get the sorted list by looking at counts.most_common().

itertools.groupby is a common way to deal with this. For example, if you had a tie, you could get all the top entries like:

from collections import Counter
from itertools import groupby

pairs = [[(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2), (3, 1), (2, 1), (3, 1), (3, 2), (3, 3), (3, 2), (2, 2)], 
         [(2, 2), (2, 1)], 
         [(1, 1), (1, 2), (2, 2), (2, 1), (3, 2)]]

counts = Counter(pair for l in pairs for pair in l)

count, groups = next(groupby(counts.most_common(), key=lambda t: t[1]))
[g[0] for g in groups]
# [(2, 2), (3, 2)]
  • Related