Home > other >  Combine string in a column based on groups in a pandas DataFrame
Combine string in a column based on groups in a pandas DataFrame

Time:10-26

I have a Pandas DataFrame, DF:

Column A Column B Column C
Apple red Texas
Apple red California
Banana yellow Indiana
Banana yellow Florida

I would like to get it in a dictionary in the form:

{ "Apple red" : ['Texas', 'California'], "Banana yellow" : ['Indiana', 'Florida'] }

where Key = concatenation of strings in column A and column B (and)

Value = all corresponding strings from column C (based on groupby) in a list.

I am not sure how to achieve this.

Key Note: It should also work if there are more than 3 columns to be grouped for dictionary's key

CodePudding user response:

Try:

x = dict(
    df.groupby(df["Column A"]   " "   df["Column B"])["Column C"].agg(list)
)

print(x)

Prints:

{'Apple red': ['Texas', 'California'], 'Banana yellow': ['Indiana', 'Florida']}

CodePudding user response:

One option, which should be performant as well, is with a default dictionary:

from collections import defaultdict

out = defaultdict(list)

for a, b, c in zip(df['Column A'], df['Column B'], df['Column C']):
    key = a   " "   b
    out[key].append(c)

out
defaultdict(list,
            {'Apple red': ['Texas', 'California'],
             'Banana yellow': ['Indiana', 'Florida']})
  • Related