I have an example dataframe:
example_df = {
'parent_text': ['Pen', 'Pen', 'Pen', 'Pen', 'Pen', 'Pen', 'Pen'],
'child_desc_text': ['orange' ,'red', 'green', 'brown', 'yellow', 'black', 'silver'],
'cost': [0 ,1.20, 1.98, 0, 0, 0, 0],
'uid': [12705024 ,12705087, 12705093, 12705087, 12705093, 12705093, 12705024]
}
df = pd.DataFrame(example_df)
0 Pen orange 0.00 12705024
1 Pen red 1.20 12705087
2 Pen green 1.98 12705093
3 Pen brown 0.00 12705087
4 Pen yellow 0.00 12705093
5 Pen black 0.00 12705093
6 Pen silver 0.00 12705024
I require cost
col to populated where matched in uid
col (greatest value from matches, else remain 0)
Expected output:
expected_df = {
'parent_text': ['Pen', 'Pen', 'Pen', 'Pen', 'Pen', 'Pen', 'Pen'],
'child_desc_text': ['orange' ,'red', 'green', 'brown', 'yellow', 'black', 'silver'],
'cost': [0 ,1.20, 1.98, 1.20, 1.98, 1.98, 0],
'uid': [12705024 ,12705087, 12705093, 12705087, 12705093, 12705093, 12705024]
}
df = pd.DataFrame(example_df)
0 Pen orange 0.00 12705024
1 Pen red 1.20 12705087
2 Pen green 1.98 12705093
3 Pen brown 1.20 12705087
4 Pen yellow 1.98 12705093
5 Pen black 1.98 12705093
6 Pen silver 0.00 12705024
I have tried np.where & np.select without success
CodePudding user response:
You can replace the cost
column with the maximum cost
of each group by uid
:
df['cost'] = df.groupby('uid')['cost'].agg(max).loc[df['uid']].to_numpy()
Result:
parent_text child_desc_text cost uid
0 Pen orange 0.00 12705024
1 Pen red 1.20 12705087
2 Pen green 1.98 12705093
3 Pen brown 1.20 12705087
4 Pen yellow 1.98 12705093
5 Pen black 1.98 12705093
6 Pen silver 0.00 12705024
CodePudding user response:
You can groupby().transform('first')
to grab the non-zero value:
df['cost'] = (df['cost'].where(df['cost'].ne(0))
.groupby(df['uid']).transform('first')
.fillna(df['cost'])
)
Output:
parent_text child_desc_text cost uid
0 Pen orange 0.00 12705024
1 Pen red 1.20 12705087
2 Pen green 1.98 12705093
3 Pen brown 1.20 12705087
4 Pen yellow 1.98 12705093
5 Pen black 1.98 12705093
6 Pen silver 0.00 12705024