I have a DataFrame where the values in the cat column are separated by underscores.
df
fruit cat
0 apple green_heavy_pricy
1 apple heavy_cheap
2 banana yellow
3 pear green
4 banana brown_raw_yellow
...
I want to create an agg column that gathers all unique categories for each fruit. I tried df.groupby("fruit")["cat"].transform("unique").
Expected output:
fruit cat agg
0 apple green_heavy_pricy green_heavy_pricy_cheap
1 apple heavy_cheap green_heavy_pricy_cheap
2 banana yellow yellow_brown_raw
3 pear green green
4 banana brown_raw_yellow yellow_brown_raw
CodePudding user response:
Use a custom lambda function with dict.fromkeys in GroupBy.transform:
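# join the group's values, split into single tags, drop duplicates in first-seen order
f = lambda x: '_'.join(dict.fromkeys('_'.join(x).split('_')))
# alternative: pd.unique also keeps order of appearance
#f = lambda x: '_'.join(pd.unique('_'.join(x).split('_')))
# alternative 2: split each value separately instead of joining first
#f = lambda x: '_'.join(dict.fromkeys(z for y in x for z in y.split('_')))
df['agg'] = df.groupby("fruit")["cat"].transform(f)
print(df)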
fruit cat agg
0 apple green_heavy_pricy green_heavy_pricy_cheap
1 apple heavy_cheap green_heavy_pricy_cheap
2 banana yellow yellow_brown_raw
3 pear green green
4 banana brown_raw_yellow yellow_brown_raw
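For reference, the same idea can be written as a self-contained script with a named function instead of a lambda. This is only a sketch: the DataFrame is rebuilt from the five rows shown in the question, and the helper name merge_tags is made up for illustration. It relies on dict.fromkeys keeping insertion order (guaranteed from Python 3.7), which is what preserves the first-seen order of the tags.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "banana", "pear", "banana"],
    "cat": ["green_heavy_pricy", "heavy_cheap", "yellow", "green", "brown_raw_yellow"],
})


def merge_tags(values):
    """Join all underscore-separated tags of one group, keeping first-seen order.

    merge_tags is a hypothetical helper name, not part of pandas.
    """
    # Split every value of the group into its single tags.
    tags = (tag for value in values for tag in value.split("_"))
    # dict.fromkeys drops duplicates while preserving insertion order.
    return "_".join(dict.fromkeys(tags))


# transform broadcasts the single string returned per group back to every row.
df["agg"] = df.groupby("fruit")["cat"].transform(merge_tags)
print(df)
```

A named function is a little easier to test and reuse than the lambda, and splitting each value separately avoids building one long intermediate string per group.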