Count list member pairs within array


Suppose I have an array containing the following lists:

data = [['a', 'b', 'c'],['a', 'b'],['c']]

What would be the best way to count, for every pair of members, the number of lists in which both appear?

E.g. the result should be:

member_one_is  member_two_is  COUNT
a              b              2
a              c              1
b              c              1

CodePudding user response:

One approach using collections.Counter and itertools.combinations:

from collections import Counter
from itertools import combinations

import pandas as pd

data = [['a', 'b', 'c'], ['a', 'b'], ['c']]

# generate every 2-member combination per sub-list and tally them with Counter
# each sub-list is sorted first so a pair always appears in the same order
counts = Counter(combination for lst in map(sorted, data) for combination in combinations(lst, 2))

# create the DataFrame: one row per pair, with its count
df = pd.DataFrame(data=[[*k, v] for k, v in counts.items()], columns=["member_one_is", "member_two_is", "COUNT"])
print(df)


  member_one_is member_two_is  COUNT
0             a             b      2
1             a             c      1
2             b             c      1

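Sorting matters because combinations preserves input order: without it, ('a', 'b') and ('b', 'a') would be counted as different pairs. If every sub-list is already sorted, you can skip map(sorted, data) and iterate directly over data.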
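
One caveat: if a sub-list can contain the same member more than once, the approach above counts a pair once per repetition rather than once per list. A minimal sketch that deduplicates each sub-list first, assuming a pair should count at most once per list (the helper name count_pairs and the sample data with a repeated 'a' are only for illustration, not part of the original question):

```python
from collections import Counter
from itertools import combinations


def count_pairs(lists):
    """Count, for each unordered pair, how many lists contain both members."""
    counts = Counter()
    for lst in lists:
        # set() drops repeated members; sorted() gives each pair a canonical order
        unique_members = sorted(set(lst))
        counts.update(combinations(unique_members, 2))
    return counts


# hypothetical input: the second list is unsorted and repeats 'a'
data = [['a', 'b', 'c'], ['b', 'a', 'a'], ['c']]
pair_counts = count_pairs(data)

for (first, second), count in sorted(pair_counts.items()):
    print(first, second, count)
```

The resulting Counter can be passed to the same pd.DataFrame construction as above if a table is needed.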

