Assign the max value of a group to all rows in the next group

Time:10-22

Given the following dataframe:

col_A col_B
1 1
1 2
1 3
2 4
2 5
2 6
3 7
3 8
3 9

How would I assign the max value of col_B from a group in col_A to all rows in the next (in ascending order) group in col_A? Output desired is as follows:

col_A col_B max_col_B_ from_prev_col_A_group
1 1
1 2
1 3
2 4 3
2 5 3
2 6 3
3 7 6
3 8 6
3 9 6

I figured I might be able to get the max by group, then shift the result forward and merge the values back into the source dataframe in some way. However I can't work out the "some way" bit of that solution :(

Any ideas on how to do this? I'm not wedded to my approach at all - just want the fastest solution as there are 1.5m rows and 80k discrete values in col_A.

Thanks in advance!

Code to reproduce dataframe:

import pandas as pd

df = pd.DataFrame({
    'col_A': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3},
    'col_B': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9},
})

Answer:

Combine merge with groupby shift:

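# Max of col_B per col_A group, shifted down one group (groupby sorts keys ascending)
prev_max = df.groupby("col_A")["col_B"].max().shift(1)
# Name the Series before merging; otherwise the result has col_B_x / col_B_y
prev_max = prev_max.rename("max_col_B_ from_prev_col_A_group")
out = df.merge(prev_max, on="col_A", how="left")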
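
Since speed matters here, one variation that may be worth timing against the merge is to look the shifted maxima up with Series.map. This is only a sketch on the sample data from the question, not a benchmarked result; the variable names group_max and prev_max are made up for illustration. It assumes "next group" means the next col_A value that actually occurs in the data, not col_A + 1.

```python
import pandas as pd

df = pd.DataFrame({
    "col_A": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "col_B": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Max of col_B per col_A group; groupby sorts the group keys ascending by default.
group_max = df.groupby("col_A")["col_B"].max()

# Shift by one position so each key holds the previous key's max.
# The first key has no predecessor and gets NaN, which makes the column float.
prev_max = group_max.shift(1)

# Look up each row's col_A value in the shifted Series instead of merging.
df["max_col_B_ from_prev_col_A_group"] = df["col_A"].map(prev_max)

print(df)
```

On the sample data the rows with col_A == 1 get NaN, the rows with col_A == 2 get 3.0 and the rows with col_A == 3 get 6.0. The map version writes straight into the original dataframe and keeps its index, whereas merge returns a new dataframe with a fresh index.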