I have a dictionary, created from two columns in a dataframe using:
df_dict=df.groupby('column1')['column2'].agg(list).to_dict()
I then used Counter to count how many items were put into each key as values:
df_dict_counts = Counter(df_dict)
This gives me, per key, how many items are present as the key's values, which is great.
But now I want to count the frequency of each item per key and print the counts.
So my dictionary looks like this:
df_dict = {
    'Apples': ['big', 'medium', 'medium', 'medium', 'big', 'small'],
    'Oranges': ['big', 'medium', 'big'],
    'Bananas': ['small', 'small', 'small', 'small', 'big'],
    'Pineapples': ['small', 'big', 'big', 'big']
}
and the output I am aiming for is something like this:
df_dict_counts =
{
'Apples': {'big':2, 'medium':3, 'small':1},
'Oranges': {'big':2, 'medium':1},
'Bananas': {'small':4, 'big':1},
'Pineapples': {'small':1, 'big':3}
}
If you could also help me write 'df_dict_counts' to a .csv file, that would be great!
Thanks!!
CodePudding user response:
You can use Counter to convert each list into a frequency map:
import collections
final = {}
for k, v in df_dict.items():
    final[k] = dict(collections.Counter(v))
print(final)
CodePudding user response:
You picked the right tool with Counter.
But instead of applying it to df_dict itself, you need to apply it to the list stored under each key in df_dict.
Try this:
import collections
df_dict = {
    'Apples': ['big', 'medium', 'medium', 'medium', 'big', 'small'],
    'Oranges': ['big', 'medium', 'big'],
    'Bananas': ['small', 'small', 'small', 'small', 'big'],
    'Pineapples': ['small', 'big', 'big', 'big']
}
count_dict = {}
# Count the items in each key's list, not the dictionary itself
for key, val in df_dict.items():
    count_dict[key] = dict(collections.Counter(val))
print(count_dict)
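Since the question also asks to print the counts, here is a minimal sketch of one way to do that per key. It keeps the Counter objects instead of converting them to plain dicts, so that most_common() is available. The output layout is only an assumption about what "print the count" should look like.

```python
from collections import Counter

# Sample data from the question
df_dict = {
    'Apples': ['big', 'medium', 'medium', 'medium', 'big', 'small'],
    'Oranges': ['big', 'medium', 'big'],
    'Bananas': ['small', 'small', 'small', 'small', 'big'],
    'Pineapples': ['small', 'big', 'big', 'big'],
}

# Keep Counter objects so most_common() can be used when printing
count_dict = {key: Counter(values) for key, values in df_dict.items()}

for key, counter in count_dict.items():
    # most_common() returns (item, count) pairs, highest count first
    summary = ", ".join(f"{item}: {n}" for item, n in counter.most_common())
    print(f"{key} -> {summary}")
```

If plain dicts are needed afterwards, dict(counter) converts each Counter back.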
CodePudding user response:
You can do this with Counter and a dict comprehension. df_dict.items() gives you the keys (Apples, Oranges...) and the values (the lists) whose items you need to count.
from collections import Counter
df_new = {k: dict(Counter(v)) for k, v in df_dict.items()}
Result
{'Apples': {'big': 2, 'medium': 3, 'small': 1},
'Oranges': {'big': 2, 'medium': 1},
'Bananas': {'small': 4, 'big': 1},
'Pineapples': {'small': 1, 'big': 3}}
To save this result to a file (note that json.dumps writes JSON text, even though the file is named .csv):
import json
with open('file.csv', 'w') as convert_file:
    convert_file.write(json.dumps(df_new))
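If the goal is a file with actual comma-separated rows, a rough sketch using the standard csv module could look like the following. The one-row-per-(key, item) layout, the column names, the helper name write_counts_csv and the output filename are all assumptions for illustration, since the question does not say which CSV shape is wanted.

```python
import csv
from collections import Counter

# Sample data from the question
df_dict = {
    'Apples': ['big', 'medium', 'medium', 'medium', 'big', 'small'],
    'Oranges': ['big', 'medium', 'big'],
    'Bananas': ['small', 'small', 'small', 'small', 'big'],
    'Pineapples': ['small', 'big', 'big', 'big'],
}

counts = {k: Counter(v) for k, v in df_dict.items()}


def write_counts_csv(counts, fileobj):
    """Write one row per (key, item) pair; hypothetical helper for this sketch."""
    writer = csv.writer(fileobj)
    writer.writerow(["key", "item", "count"])  # assumed header names
    for key, counter in counts.items():
        for item, n in counter.items():
            writer.writerow([key, item, n])


# newline="" lets the csv module control line endings itself
with open("df_dict_counts.csv", "w", newline="") as f:
    write_counts_csv(counts, f)
```

A long layout like this avoids empty cells when a key lacks some item (Oranges has no 'small', for example), and it can be loaded back with pandas.read_csv if needed.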