Assign the max value of a group to all rows in the next group

Given the following dataframe:

col_A	col_B
1	1
1	2
1	3
2	4
2	5
2	6
3	7
3	8
3	9

How would I assign the max value of col_B from a group in col_A to all rows in the next (in ascending order) group in col_A? Output desired is as follows:

col_A	col_B	max_col_B_ from_prev_col_A_group
1	1
1	2
1	3
2	4	3
2	5	3
2	6	3
3	7	6
3	8	6
3	9	6

I figured I might be able to get the max by group, then shift the result forward and merge the values back into the source dataframe in some way. However, I can't work out the "some way" bit of that solution :(

Any ideas on how to do this? I'm not wedded to my approach at all - just want the fastest solution as there are 1.5m rows and 80k discrete values in col_A.

Thanks in advance!

Code to reproduce dataframe:

import pandas as pd

df = pd.DataFrame({
    'col_A': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3},
    'col_B': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9},
})

CodePudding user response:

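Take the max of col_B per group with groupby, shift it down by one group, and merge it back onto the rows. Naming the shifted series before the merge keeps col_B from being suffixed to col_B_x:

# Max of col_B per col_A group, shifted one group down (groupby sorts keys ascending)
prev_max = (df.groupby("col_A")["col_B"].max().shift(1)
              .rename("max_col_B_ from_prev_col_A_group"))
# Left-join the per-group value onto every row via the series index (col_A)
df.merge(prev_max, left_on="col_A", right_index=True, how="left")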
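
Since speed matters here, one alternative worth timing against the merge is a plain lookup with Series.map. The sketch below is only an illustration on the sample data: the variable name prev_max is made up for the example, and no benchmark has been run on the real 1.5m-row frame. It relies on groupby sorting the group keys in ascending order, which is its default behaviour.

```python
import pandas as pd

df = pd.DataFrame({
    "col_A": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "col_B": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Max of col_B per col_A group, shifted so each group sees the previous group's max.
# prev_max is a hypothetical name; its index is the sorted col_A values.
prev_max = df.groupby("col_A")["col_B"].max().shift(1)

# Look up each row's col_A in the shifted series; no join is needed.
df["max_col_B_ from_prev_col_A_group"] = df["col_A"].map(prev_max)

print(df)
```

The first group has no predecessor, so its rows get NaN and the new column is stored as float.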