Counting number of times dictionary keys appear in a dataframe

Time:12-06

I have a dictionary whose keys are itemsets and whose values are their counts. I want to count how many times each itemset appears in a dataframe (as an exact match). The dataframe has about 10k rows.

Dictionary of 1st itemsets (dict_of_items):

{'apple','banana','pear'}: 0, 
{'banana', 'orange', 'squash'}: 0

Dataframe of 2nd itemsets (df):

Index | basket
1     | ['apple','banana','pear']
2     | ['banana']
3     | ['banana', 'orange','squash']
4     | ['apple','banana','pear']
...

Desired output (where the dictionary's values are the actual counts):

{'apple','banana','pear'}: 2, 
{'banana', 'orange', 'squash'}: 1

I have tried using in and .iterrows(), but the values remain 0, e.g.:

for item in dict_of_items:
    if item in df['basket']:
        dict_of_items[item] = 1

Answer:

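Issues with the posted attempt:

  1. A dictionary cannot use sets as keys, because sets are mutable and therefore not hashable; use frozenset instead.
  2. if item in df['basket']: doesn't work: in on a Series checks the index labels, not the values, and even a value comparison would fail because each basket is a list while item is a set.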
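Both issues can be checked directly with a short sketch. The variable names (bad, good, s, counts, and so on) are illustrative and not from the question; the last part shows one way the original loop idea could work once the keys are frozensets, the loop runs over the basket values, and the count is incremented with += instead of being set to 1.

```python
import pandas as pd

# Issue 1: a plain set is unhashable, so it cannot be a dictionary key.
set_key_error = None
try:
    bad = {{'apple', 'banana', 'pear'}: 0}
except TypeError as exc:
    set_key_error = str(exc)  # e.g. "unhashable type: 'set'"

# A frozenset is immutable and hashable, so it works as a key.
good = {frozenset({'apple', 'banana', 'pear'}): 0}

# Issue 2: `in` on a Series tests the index labels, not the values.
s = pd.Series([['apple', 'banana', 'pear'], ['banana']])
index_hit = 0 in s          # True: 0 is an index label
value_hit = 'banana' in s   # False: the values are not searched

# Even when comparing values, a list never equals a set.
list_equals_set = ['apple', 'banana', 'pear'] == {'apple', 'banana', 'pear'}

# Converting a basket to a frozenset gives an order-insensitive, hashable key.
frozen_match = frozenset(s[0]) in good

# The original loop idea, fixed: iterate over the values and use +=.
counts = {frozenset({'apple', 'banana', 'pear'}): 0}
for basket in s:
    key = frozenset(basket)
    if key in counts:
        counts[key] += 1
print(counts)
```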

Code

import pandas as pd
from collections import Counter

# Initialization
dict_of_item = {
    frozenset({'apple','banana','pear'}): 0, 
    frozenset({'banana', 'orange', 'squash'}): 0}

data = {'basket': [['apple', 'banana', 'pear'],
                   ['banana'],
                   ['banana', 'orange', 'squash'],
                   ['apple', 'banana', 'pear']]}

df = pd.DataFrame(data)

# Processing
# Count how often each basket occurs: convert each list to a frozenset
# (hashable and order-insensitive) and count the resulting frozensets.
basket_set_count = Counter(df['basket'].apply(frozenset))

# Keep only the itemsets that are keys of both basket_set_count and dict_of_item,
# taking each itemset's count from basket_set_count.
result = {k:basket_set_count[k] for k in set(basket_set_count.keys()) & set(dict_of_item.keys())}

print(result)
# Output (the order of keys and of elements inside each frozenset may vary):
# {frozenset({'pear', 'banana', 'apple'}): 2,
#  frozenset({'orange', 'squash', 'banana'}): 1}
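Because the comprehension above iterates over the intersection of keys, an itemset from dict_of_item that never occurs in the dataframe is dropped from the result rather than reported with a count of 0. If every key should be kept, a plain Counter lookup is enough, since a Counter returns 0 for a missing key instead of raising KeyError. A sketch using the same sample data; the frozenset({'kiwi'}) key is a hypothetical addition to show a zero count:

```python
from collections import Counter

import pandas as pd

dict_of_item = {
    frozenset({'apple', 'banana', 'pear'}): 0,
    frozenset({'banana', 'orange', 'squash'}): 0,
    frozenset({'kiwi'}): 0,  # hypothetical itemset that never appears
}

df = pd.DataFrame({'basket': [['apple', 'banana', 'pear'],
                              ['banana'],
                              ['banana', 'orange', 'squash'],
                              ['apple', 'banana', 'pear']]})

# Count each distinct basket, keyed by its frozenset.
basket_set_count = Counter(df['basket'].apply(frozenset))

# Look up every dictionary key; Counter yields 0 for itemsets never seen.
result = {k: basket_set_count[k] for k in dict_of_item}
print(result)
```

Note that frozenset equality ignores both order and duplicates, so ['pear', 'apple', 'banana'] and ['apple', 'banana', 'banana', 'pear'] would both count toward {'apple', 'banana', 'pear'}. If an exact match must also respect order, keying on tuple(basket) instead of frozenset(basket) is one alternative.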