group by two columns and get unique column values in a list-CodePudding

I'm trying to get the unique values per name1 and name2 of column prod into a further column all_prods. If I just group by one column (for example name1) then I know how to do it but how does it work if I want to group by two columns (name1 and name2)?

df = pd.DataFrame({'name1':['x','x','x','z','z'],'name2':['x','x','x','z','z'],'prod':['c','c','f','f','f']})

df.assign(**{'all_prods': df.name1.map(df.groupby('name1')['prod'].unique())})

CodePudding user response：

Doing transform

df['all_prods'] = df.groupby(['name1','name2'])['prod'].transform(lambda x : len(x)* [x.unique()])
df
Out[6]: 
  name1 name2 prod all_prods
0     x     x    c    [c, f]
1     x     x    c    [c, f]
2     x     x    f    [c, f]
3     z     z    f       [f]
4     z     z    f       [f]

CodePudding user response：

Variant of your original approach:

df.merge(df.groupby(['name1', 'name2'], as_index=False)
           .agg(all_prods=('prod', lambda x: list(dict.fromkeys(x)))))

output:

  name1 name2 prod all_prods
0     x     x    c    [c, f]
1     x     x    c    [c, f]
2     x     x    f    [c, f]
3     z     z    f       [f]
4     z     z    f       [f]