Home > other >  Fill col value with greatest value of matched keys within another col
Fill col value with greatest value of matched keys within another col

Time:09-02

I have an example dataframe:

example_df = {
    'parent_text': ['Pen', 'Pen', 'Pen', 'Pen', 'Pen', 'Pen', 'Pen'],
    'child_desc_text': ['orange' ,'red', 'green', 'brown', 'yellow', 'black', 'silver'],
    'cost': [0 ,1.20, 1.98, 0, 0, 0, 0],
    'uid': [12705024 ,12705087, 12705093, 12705087, 12705093, 12705093, 12705024]
    }
df = pd.DataFrame(example_df)


0   Pen orange  0.00    12705024
1   Pen red     1.20    12705087
2   Pen green   1.98    12705093
3   Pen brown   0.00    12705087
4   Pen yellow  0.00    12705093
5   Pen black   0.00    12705093
6   Pen silver  0.00    12705024

I require cost col to populated where matched in uid col (greatest value from matches, else remain 0)

Expected output:

expected_df = {
    'parent_text': ['Pen', 'Pen', 'Pen', 'Pen', 'Pen', 'Pen', 'Pen'],
    'child_desc_text': ['orange' ,'red', 'green', 'brown', 'yellow', 'black', 'silver'],
    'cost': [0 ,1.20, 1.98, 1.20, 1.98, 1.98, 0],
    'uid': [12705024 ,12705087, 12705093, 12705087, 12705093, 12705093, 12705024]
    }
df = pd.DataFrame(example_df)


0   Pen orange  0.00    12705024
1   Pen red     1.20    12705087
2   Pen green   1.98    12705093
3   Pen brown   1.20    12705087
4   Pen yellow  1.98    12705093
5   Pen black   1.98    12705093
6   Pen silver  0.00    12705024

I have tried np.where & np.select without success

CodePudding user response:

You can replace the cost column with the maximum cost of each group by uid:

df['cost'] = df.groupby('uid')['cost'].agg(max).loc[df['uid']].to_numpy()

Result:

  parent_text child_desc_text  cost       uid
0         Pen          orange  0.00  12705024
1         Pen             red  1.20  12705087
2         Pen           green  1.98  12705093
3         Pen           brown  1.20  12705087
4         Pen          yellow  1.98  12705093
5         Pen           black  1.98  12705093
6         Pen          silver  0.00  12705024

CodePudding user response:

You can groupby().transform('first') to grab the non-zero value:

df['cost'] = (df['cost'].where(df['cost'].ne(0))
              .groupby(df['uid']).transform('first')
              .fillna(df['cost'])
             )

Output:

  parent_text child_desc_text  cost       uid
0         Pen          orange  0.00  12705024
1         Pen             red  1.20  12705087
2         Pen           green  1.98  12705093
3         Pen           brown  1.20  12705087
4         Pen          yellow  1.98  12705093
5         Pen           black  1.98  12705093
6         Pen          silver  0.00  12705024
  • Related