I'm trying to get the unique values per name1
and name2
of column prod
into a further column all_prods
. If I just group by one column (for example name1
) then I know how to do it but how does it work if I want to group by two columns (name1
and name2
)?
df = pd.DataFrame({'name1':['x','x','x','z','z'],'name2':['x','x','x','z','z'],'prod':['c','c','f','f','f']})
df.assign(**{'all_prods': df.name1.map(df.groupby('name1')['prod'].unique())})
CodePudding user response:
Doing transform
df['all_prods'] = df.groupby(['name1','name2'])['prod'].transform(lambda x : len(x)* [x.unique()])
df
Out[6]:
name1 name2 prod all_prods
0 x x c [c, f]
1 x x c [c, f]
2 x x f [c, f]
3 z z f [f]
4 z z f [f]
CodePudding user response:
Variant of your original approach:
df.merge(df.groupby(['name1', 'name2'], as_index=False)
.agg(all_prods=('prod', lambda x: list(dict.fromkeys(x)))))
output:
name1 name2 prod all_prods
0 x x c [c, f]
1 x x c [c, f]
2 x x f [c, f]
3 z z f [f]
4 z z f [f]