Let's assume I've got a list containing the following lists:
data = [['a', 'b', 'c'],['a', 'b'],['c']]
What would be the best way to count, for each pair of items, the number of lists in which both items occur?
For example, the result should be:
member_one_is member_two_is COUNT
a b 2
a c 1
b c 1
One approach, using collections.Counter and itertools.combinations:
from collections import Counter
from itertools import combinations
import pandas as pd
data = [['a', 'b', 'c'], ['a', 'b'], ['c']]
# count each pair with collections.Counter; itertools.combinations(lst, 2) yields every pair in a sub-list
# sort each sub-list first so a pair always comes out in the same order, e.g. ('a', 'b') rather than ('b', 'a')
counts = Counter(combination for lst in map(sorted, data) for combination in combinations(lst, 2))
# create the DataFrame
df = pd.DataFrame(data=[[*k, v] for k, v in counts.items()], columns=["member_one_is", "member_two_is", "COUNT"])
print(df)
Output
member_one_is member_two_is COUNT
0 a b 2
1 a c 1
2 b c 1
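The DataFrame is only used for display; the counting itself needs nothing beyond the standard library. Below is a minimal sketch of a plain-Python variant. It assumes, which is not the case in the question's data, that a sub-list might contain the same item more than once. In that case it also deduplicates each sub-list with set(), so that a pair is counted at most once per list, matching "the number of lists they're in". The helper name count_pairs is hypothetical, and sorted() assumes the items are mutually comparable.

```python
from collections import Counter
from itertools import combinations


def count_pairs(lists):
    """Count, for each unordered pair of distinct items, how many lists contain both."""
    counts = Counter()
    for lst in lists:
        # set() drops repeated items so a pair is counted at most once per list;
        # sorted() puts every pair in a canonical order, e.g. ('a', 'b')
        unique_items = sorted(set(lst))
        counts.update(combinations(unique_items, 2))
    return counts


data = [['a', 'b', 'c'], ['a', 'b'], ['c']]
print(count_pairs(data))
# Counter({('a', 'b'): 2, ('a', 'c'): 1, ('b', 'c'): 1})

# a list with a repeated item still contributes each pair only once
print(count_pairs([['a', 'a', 'b']]))
# Counter({('a', 'b'): 1})
```

Without the set() step, ['a', 'a', 'b'] would produce ('a', 'b') twice plus a self-pair ('a', 'a').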
Note that if the sub-lists are already sorted, you can skip map(sorted, data) and iterate directly over data.
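To see why the sorting step matters when the input is not sorted, here is a small sketch using a hypothetical input where the same two items appear in opposite orders. combinations() keeps the order of its input, so ('a', 'b') and ('b', 'a') end up as separate keys unless each sub-list is sorted first.

```python
from collections import Counter
from itertools import combinations

# hypothetical input: the second sub-list is not sorted
data = [['a', 'b'], ['b', 'a']]

# without sorting, the same unordered pair is split across two tuple keys
unsorted_counts = Counter(c for lst in data for c in combinations(lst, 2))
print(unsorted_counts)  # Counter({('a', 'b'): 1, ('b', 'a'): 1})

# sorting each sub-list first gives every pair one canonical key
sorted_counts = Counter(c for lst in map(sorted, data) for c in combinations(lst, 2))
print(sorted_counts)  # Counter({('a', 'b'): 2})
```

If the items cannot be sorted (for example, mixed types), using frozenset(pair) as the key is an alternative way to make the pair order-independent.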