Home > Enterprise >  Find the longest most common items in multiple lists (not substring)
Find the longest most common items in multiple lists (not substring)

Time:02-18

Let's say we have a list of N lists. For example:

L = [['A','B','C','D','E'], ['A','B','C'],['B','C','D'],['C','D'],['A','C','D']]

I want to find the longest common subsets that occur in this list and the corresponding counts. In this case:

ans = {'A,B,C':2, 'A,C,D':2, 'B,C,D':2}

I think this question is similar to mine, but I am having a hard time understanding the C# code.

CodePudding user response:

I assume that a "common subset" is a set that is a subset of at least two lists in the array.

With that in mind, here's one solution.

from itertools import combinations
from collections import Counter
L = [['A','B','C','D','E'], ['A','B','C'],['B','C','D'],['C','D'],['A','C','D']]

L = [*map(frozenset,L)]
sets = [l1&l2 for l1,l2 in combinations(L,2)]
maxlen = max(len(s) for s in sets)
sets = [s for s in sets if len(s) == maxlen]
count = Counter(s for s in sets for l in L if s <= l)
dic = {','.join(s):k for s,k in count.items()}

Resulting dictionary dic:

{'A,B,C': 2, 'B,C,D': 2, 'A,C,D': 2}
  • Related