I have a Pandas DataFrame, DF:
Column A | Column B | Column C |
---|---|---|
Apple | red | Texas |
Apple | red | California |
Banana | yellow | Indiana |
Banana | yellow | Florida |
I would like to get it in a dictionary in the form:
{ "Apple red" : ['Texas', 'California'], "Banana yellow" : ['Indiana', 'Florida'] }
where Key = concatenation of strings in column A and column B (and)
Value = all corresponding strings from column C (based on groupby) in a list.
I am not sure how to achieve this.
Key Note: It should also work if there are more than 3 columns to be grouped for dictionary's key
CodePudding user response:
Try:
x = dict(
df.groupby(df["Column A"] " " df["Column B"])["Column C"].agg(list)
)
print(x)
Prints:
{'Apple red': ['Texas', 'California'], 'Banana yellow': ['Indiana', 'Florida']}
CodePudding user response:
One option, which should be performant as well, is with a default dictionary:
from collections import defaultdict
out = defaultdict(list)
for a, b, c in zip(df['Column A'], df['Column B'], df['Column C']):
key = a " " b
out[key].append(c)
out
defaultdict(list,
{'Apple red': ['Texas', 'California'],
'Banana yellow': ['Indiana', 'Florida']})