Groupby drop duplicates

Time:09-12

I have a DataFrame df where each value in the cat column holds several categories separated by underscores.

df
    fruit   cat
0   apple   green_heavy_pricy
1   apple   heavy_cheap
2   banana  yellow
3   pear    green
4   banana  brown_raw_yellow
...

I want to create an agg column that gathers all unique categories for each fruit. I tried df.groupby("fruit")["cat"].transform("unique").

Expected output:

    fruit   cat                 agg
0   apple   green_heavy_pricy   green_heavy_pricy_cheap
1   apple   heavy_cheap         green_heavy_pricy_cheap
2   banana  yellow              yellow_brown_raw
3   pear    green               green
4   banana  brown_raw_yellow    yellow_brown_raw

CodePudding user response:

Use custom lambda function with dict.fromkeys in GroupBy.transform:

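import pandas as pd

# Join each group's strings, split them into tokens, and de-duplicate in first-seen order
f = lambda x: '_'.join(dict.fromkeys('_'.join(x).split('_')))
# Alternative 1: unique() on a Series also keeps the order of first appearance
#f = lambda x: '_'.join(pd.Series('_'.join(x).split('_')).unique())
# Alternative 2: split each string separately instead of joining first
#f = lambda x: '_'.join(dict.fromkeys(z for y in x for z in y.split('_')))
df['agg'] = df.groupby("fruit")["cat"].transform(f)
print(df)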
    fruit                cat                      agg
0   apple  green_heavy_pricy  green_heavy_pricy_cheap
1   apple        heavy_cheap  green_heavy_pricy_cheap
2  banana             yellow         yellow_brown_raw
3    pear              green                    green
4  banana   brown_raw_yellow         yellow_brown_raw
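
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same dict.fromkeys approach. The sample frame is rebuilt from the five rows shown in the question, and the helper name merge_tokens is just an illustrative choice, not something from the original answer.

```python
import pandas as pd

# Sample data rebuilt from the five rows shown in the question.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "banana", "pear", "banana"],
    "cat": ["green_heavy_pricy", "heavy_cheap", "yellow", "green", "brown_raw_yellow"],
})

def merge_tokens(values):
    """Join the unique underscore-separated tokens of one group.

    dict.fromkeys drops duplicates while keeping first-seen order,
    because dicts preserve insertion order.
    """
    tokens = (token for value in values for token in value.split("_"))
    return "_".join(dict.fromkeys(tokens))

# transform broadcasts the single string returned per group back to every row of that group.
df["agg"] = df.groupby("fruit")["cat"].transform(merge_tokens)
print(df)
```

A named function behaves the same as the lambda, but it is easier to test on its own, for example merge_tokens(["a_b", "b_c"]) gives "a_b_c".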