I have a dictionary whose keys are itemsets and whose values are their counts. I want to count how many times each itemset appears in a dataframe (as an exact match). The dataframe has ~10k rows.
Dictionary of 1st itemsets (dict_of_items):
{'apple','banana','pear'}: 0,
{'banana', 'orange', 'squash'}: 0
Dataframe of 2nd itemsets (df):
Index | basket
1 | ['apple','banana','pear']
2 | ['banana']
3 | ['banana', 'orange','squash']
4 | ['apple','banana','pear']
...
Desired output (where the dictionary's values are the actual counts):
{'apple','banana','pear'}: 2,
{'banana', 'orange', 'squash'}: 1
I have tried the in operator and .iterrows(), but the values remain 0, e.g.:
for item in dict_of_items:
    if item in df['basket']:
        dict_of_items[item] = 1
CodePudding user response:
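Issues with the posted solution:
- A dictionary cannot contain sets as keys, since sets are not hashable (use frozenset instead).
- The check
    if item in df['basket']:
  doesn't work: in on a pandas Series tests the index labels rather than the values, and the basket values are lists while item is a set.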
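To make the two issues above concrete, here is a minimal sketch. The two-row Series and the variable names are made up for illustration; it only shows why the original check never matches and is not part of the solution itself.

```python
import pandas as pd

# A plain set is mutable and therefore unhashable, so it cannot be a dict key.
try:
    {{'apple', 'banana', 'pear'}: 0}
    set_key_error = None
except TypeError as exc:
    set_key_error = str(exc)

# A frozenset is immutable and hashable, so it works as a key.
frozen_key = frozenset({'apple', 'banana', 'pear'})
counts = {frozen_key: 0}

# Illustrative stand-in for df['basket'].
basket = pd.Series([['apple', 'banana', 'pear'], ['banana']])

# `in` on a Series looks at the index labels (0 and 1 here), not the values.
label_found = 0 in basket
value_found_by_in = frozen_key in basket

# Comparing the values themselves requires converting each list first.
exact_matches = sum(frozenset(b) == frozen_key for b in basket)

print(set_key_error)
print(label_found, value_found_by_in, exact_matches)
```

So even with frozenset keys, the membership test has to run against the converted values, not against the Series object.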
Code
import pandas as pd
from collections import Counter
# Initialization
dict_of_item = {
    frozenset({'apple', 'banana', 'pear'}): 0,
    frozenset({'banana', 'orange', 'squash'}): 0}
data = {'basket': [['apple', 'banana', 'pear'],
                   ['banana'],
                   ['banana', 'orange', 'squash'],
                   ['apple', 'banana', 'pear']]}
df = pd.DataFrame(data)
# Processing
# Convert each basket list to a frozenset, then count how often each
# frozenset appears in the basket column.
basket_set_count = Counter(df['basket'].apply(frozenset))
# Find intersection of keys in basket_set_count and dictionary of keys
# Use the count from basket_set_count as the number of elements
result = {k:basket_set_count[k] for k in set(basket_set_count.keys()) & set(dict_of_item.keys())}
print(result)
# Output: {frozenset({'pear', 'banana', 'apple'}): 2,
#          frozenset({'orange', 'squash', 'banana'}): 1}
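Note that result only contains itemsets that occur at least once in the column. If itemsets with a count of 0 should stay in the output, as in the dictionary from the question, one possible variation is sketched below. The helper name count_exact_itemsets and the extra 'kiwi' itemset are hypothetical, and a plain list of lists stands in for df['basket'] (iterating over that column yields the same lists).

```python
from collections import Counter


def count_exact_itemsets(itemsets, baskets):
    """Map each itemset to the number of baskets that equal it exactly."""
    # Counter returns 0 for keys it has never seen, so itemsets that
    # never occur keep a count of 0 instead of being dropped.
    basket_counts = Counter(frozenset(b) for b in baskets)
    return {itemset: basket_counts[itemset] for itemset in itemsets}


# Stand-in for df['basket']; with a real dataframe, pass df['basket'] instead.
baskets = [['apple', 'banana', 'pear'],
           ['banana'],
           ['banana', 'orange', 'squash'],
           ['apple', 'banana', 'pear']]

itemsets = [frozenset({'apple', 'banana', 'pear'}),
            frozenset({'banana', 'orange', 'squash'}),
            frozenset({'kiwi'})]  # hypothetical itemset that never occurs

result_with_zeros = count_exact_itemsets(itemsets, baskets)
print(result_with_zeros)
```

Each basket is converted only once, so this stays a single pass over the ~10k rows. One caveat: converting to frozenset discards duplicates and order, so a basket such as ['banana', 'banana'] would count as a match for {'banana'}.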